Visualisere

Sunday, 18 July 2010 22:41

An introduction to searching

In this article I look at how the work on a search can be split up, and discuss why a good search is important. By dividing the work into “Implementation” and “Presentation”, the developer and the designer can each focus on what they do best, which hopefully gives the users the best possible result.

The importance of a good search

If a customer cannot find the product he is looking for, you lose a sale! If this happens several times, you will soon realize that bad searches can be expensive.

Internal site searches are often divided into three categories, depending on what the user is looking for.

  • Exploratory: You are not quite sure what you are looking for. You are “browsing” and hope to get interesting hits. The search might, for example, offer suggestions and search assistance. A typical search would be to find out more about Java programming or a similar topic.
  • Limited: You know roughly what you need, but you need help narrowing down the results.
  • Exact: You know exactly what you need, but you must be able to enter the information that identifies it. For instance, you have downloaded a preview image from iStockphoto.com and want to purchase the original later on. You then know exactly what you are looking for, and can look it up again by using the file name.

What is a good search?

  • The search must tolerate spelling errors. For example, a search for “devellopment” must yield approximately the same results as “development”.
  • The search must have a helpful page for searches with no hits, so that the user can easily move on. This is not unlike creating a custom 404 error page.
  • The searches must be logged, the logs must be analyzed, and the search must be adjusted when required.
  • The search must have filtering options adapted to the user. Bear in mind that it is easy to offer too many choices.
    (Screenshot: K2 search result)
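
The spelling tolerance in the first point can be sketched with Python's standard library. This is only a minimal illustration: the vocabulary list and the function name `correct_term` are made up for the example, and a real search engine would take its terms from the index and use its own fuzzy matching.

```python
import difflib

# Hypothetical vocabulary; in practice the terms would come from the search index.
VOCABULARY = ["development", "design", "deployment", "java", "search"]


def correct_term(term, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest known term, or the term itself if nothing is close enough."""
    matches = difflib.get_close_matches(term.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else term


print(correct_term("devellopment"))  # prints "development"
```

A misspelled query can then be run against the corrected term, or the correction can be offered to the user as a suggestion.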

Implementation vs. Presentation

An article like this would normally emphasize that a search consists of two separate parts: the implementation of the search engine and the presentation of the search results (including the way the user enters a search string and narrows down the results).

The search can be implemented in many different ways. Some of the defining factors are relevance, performance, and scaling.

Presenting the search is about developing or using an interface that intuitively leads the user to a relevant result. The search result page is not the goal of the search; the goal is the article or articles relevant enough for the user to click on.

In the presentation, people want to be able to search the way they are used to from Google, Bing and Yahoo, to mention a few of the largest search engines.

  • Search for a single word: web trade
  • Search for several words: website, web trade, publishing solutions
  • Search for parts of a sentence: “free website”
  • Wildcards: web*
  • Logical operators: Webdesign OR Web development AND Java
  • Include words in the search result: Web development + Java
  • Exclude words from the search result: Web development - Java
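
To give an idea of what supporting this syntax involves, here is a minimal sketch of a query parser. It is not how any of the search engines mentioned do it. The names `ParsedQuery` and `parse_query` are hypothetical, only straight double quotes are recognized for phrases, and the logical operators OR / AND are deliberately left out.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    phrases: list = field(default_factory=list)   # "free website"
    required: list = field(default_factory=list)  # + java
    excluded: list = field(default_factory=list)  # - java
    prefixes: list = field(default_factory=list)  # web*
    terms: list = field(default_factory=list)     # plain words


# A token is either a double-quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def parse_query(query):
    parsed = ParsedQuery()
    pending = None  # a standalone "+" or "-" applies to the next word
    for phrase, word in TOKEN.findall(query):
        if phrase:
            parsed.phrases.append(phrase.lower())
            pending = None
            continue
        if word in ("+", "-"):
            pending = word
            continue
        sign, pending = pending, None
        if len(word) > 1 and word[0] in "+-":
            # Operator written directly in front of the word, e.g. "-java".
            sign, word = word[0], word[1:]
        word = word.lower()
        if sign == "+":
            parsed.required.append(word)
        elif sign == "-":
            parsed.excluded.append(word)
        elif word.endswith("*") and len(word) > 1:
            parsed.prefixes.append(word[:-1])
        else:
            parsed.terms.append(word)
    return parsed


query = parse_query('"free website" web* development - java')
print(query.phrases, query.prefixes, query.terms, query.excluded)
```

The parsed parts can then be handed to whatever search implementation is used, which keeps the query syntax independent of the engine behind it.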

Some implementations

  • Apache Lucene / Solr: Lucene is a library for indexing and searching. Solr is an enterprise search platform built on Lucene. Solr runs as a servlet in a container such as Jetty, Tomcat or GlassFish. Lucene and Solr are written in Java, but Solr communicates over HTTP (POST/GET) using XML, so the client can be written in any language. Solr is used by digg.com, monster.com, linkedin.com and whitehouse.gov, among others. More users of Lucene / Solr are listed on the showcase wiki.
  • MySQL full-text search: Makes it possible to search faster than with "SELECT name FROM testtable WHERE company LIKE '%mbmedia%';". With a full-text search, we also get the result set back sorted by relevance. Relevance is expressed as a decimal number, where 0 means not relevant. To write a search in PHP, we need to do little more than get the search phrase from the user, build the database query, and then sort / present the results for the user. This method can only be used on MyISAM tables, and on fields of the type CHAR, VARCHAR, or TEXT. You can read more about full-text search on the MySQL website.
  • Sphinx: Supports MySQL, PostgreSQL and ODBC-compatible databases. Consists of an indexer, a search daemon and a command-line search tool. Sphinx is written in C++ and runs on Linux, UNIX, Microsoft Windows, and Mac OS X. Sphinx is used by craigslist.org, diskusjon.no, thepiratebay.org, and joomla.org, among others.
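
As an illustration of the MySQL alternative, the sketch below builds a full-text query using MySQL's MATCH ... AGAINST syntax. It is written in Python rather than PHP, and the function name `build_fulltext_query` is hypothetical. The table and column names are placeholders taken from the LIKE example above, and the sketch assumes that a FULLTEXT index already exists on the searched column.

```python
def build_fulltext_query(phrase, limit=10):
    """Return (sql, params) for a MySQL natural-language full-text search.

    Assumes a FULLTEXT index on the placeholder column `company`, e.g.
    ALTER TABLE testtable ADD FULLTEXT (company);
    """
    phrase = phrase.strip()
    if not phrase:
        raise ValueError("empty search phrase")
    sql = (
        "SELECT name, MATCH (company) AGAINST (%s) AS relevance "
        "FROM testtable "
        "WHERE MATCH (company) AGAINST (%s) "
        "ORDER BY relevance DESC "
        "LIMIT %s"
    )
    # The phrase is passed as a parameter, never pasted into the SQL string.
    return sql, (phrase, phrase, limit)


sql, params = build_fulltext_query("mbmedia")
print(sql)
print(params)
```

With a database driver that uses `%s` placeholders, the pair would typically be passed on as `cursor.execute(sql, params)`. Rows with a relevance of 0 are filtered out by the WHERE clause, and the rest come back with the most relevant first.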

A few ways to present search results

  • A few selected search boxes, with their accompanying advanced functions and search results. Not unexpectedly, there are many ways to approach search presentation.
    (Screenshots: simple search, advanced search and search results at Google and iStockphoto; simple search and search results at Joomla)

  • How much of the content do you wish to show? For internal searches it might be a good idea to show enough for the user to decide whether the result is right or wrong. For external search engines like Google, Bing and Yahoo, showing too much text / information in the search result may mean that the user does not click on the result even when it is correct, because the search result has already provided the required information. This leads to fewer visits to the article author’s page, and some people might conclude that the search engine is stealing information from the article’s original website.
    (Screenshot: Google showing weather information directly in the search result)


  • For the sake of user friendliness, search results are often listed in rows, with one row per hit. We can make the presentation even more user-friendly by grouping important information in columns that run straight through the result rows. This makes it easier for users to scan for important parameters such as price. Two examples are shown below, and many people will probably find the second one easier to follow and to scan. In the first example it is difficult to compare parameters, because important information has not been arranged in separate columns.
    (Screenshot: JED search results. Row-based.)
    (Screenshot: Komplett search results. Row-based, with columns to emphasize important parameters.)
  • For internal searches it is important, and also possible, to adapt the presentation to the existing content and users. Take advantage of this, and do not simply present the most generic interface at hand.
  • The title / description in the search result must be relevant to the resulting article.
  • You can optimize searches that are repeated often.
  • Do certain searches stand out by constantly returning no hits? If so, you can add content that meets people’s needs. For example, if you have many searches (with zero hits) for “how to improve my internal search”, you could write an article about just that. The next people searching for the same thing will then actually get the answers they need.
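
Finding the searches that keep returning no hits is mostly a matter of counting in the search log. The sketch below assumes the log is available as pairs of query and hit count. That format, the sample data and the function name `top_zero_hit_queries` are all made up for the illustration.

```python
from collections import Counter


def top_zero_hit_queries(log, n=3):
    """Return the n most frequent queries that gave no hits.

    `log` is an iterable of (query, hit_count) pairs.
    """
    counts = Counter(
        " ".join(query.lower().split())  # normalize case and whitespace
        for query, hits in log
        if hits == 0
    )
    return counts.most_common(n)


# Hypothetical log entries.
search_log = [
    ("how to improve my internal search", 0),
    ("Java", 12),
    ("How to improve my  internal search", 0),
    ("solr", 0),
]

print(top_zero_hit_queries(search_log, n=1))
# prints [('how to improve my internal search', 2)]
```

The queries at the top of this list are the best candidates for new content or for adjustments to the search.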

Summing up

Originally, this article was intended to be about Apache Lucene, and our experiences with setting it up and using it on an example data set. During this process, I found out that this was the wrong place to start, and that it would be better to write an “introduction to search” article first.

Implementation and presentation lead up to the final result, which is each individual user’s experience of your search. By dividing the development into separate parts, the designer can work with the users and meet their needs at an early stage, by means of prototypes, mockups and sketches.

I hope to return with more articles on this subject, including the planned write-up of our experiences with Apache Lucene and other possible implementations that we are currently working with.

Last Updated on Tuesday, 21 September 2010 19:44
 
