Ideally, a search engine would read the user’s mind. Shy of that, a search engine should provide the user with an efficient process for expressing an information need and then provide results relevant to that need.
From an information scientist’s perspective, these are two distinct problems to solve in the information seeking process: establishing the user’s information need (query elaboration) and retrieving relevant information (information retrieval).
When open-domain search engines (i.e., web search engines) went mainstream in the late 1990s, they did so by glossing over the problem of query elaboration and focusing almost entirely on information retrieval. More precisely, they addressed the query elaboration problem by requiring users to provide reasonable queries and search engines to infer information needs from those queries. In recent years, there has been more explicit support for query elaboration, most notably in the form of type-ahead query suggestions (e.g., Google Instant). There have also been a variety of efforts to offer related queries as refinements.
But even with such support, query elaboration typically yields an informal, free-text string. All vocabularies have their flaws, but search engines compound the inherent imprecision of language by not even trying to guide users to a common standard. At best, query suggestion nudges users towards more popular (and hopefully more effective) queries.
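To make that nudge concrete, here is a toy sketch of type-ahead suggestion, assuming nothing more than a log of past queries with frequency counts. The log below is invented for the example; real systems like Google Instant are of course far more sophisticated.

```python
# Toy type-ahead suggestion: rank prefix matches by popularity.
# The query log and counts are invented for illustration.
query_log = {
    "digital cameras": 1200,
    "digital camera reviews": 450,
    "digital picture frames": 300,
}

def suggest(prefix: str, k: int = 3) -> list[str]:
    """Return the k most popular logged queries extending the prefix."""
    matches = [(q, n) for q, n in query_log.items() if q.startswith(prefix)]
    matches.sort(key=lambda qn: qn[1], reverse=True)
    return [q for q, _ in matches[:k]]

print(suggest("digital c"))
# ['digital cameras', 'digital camera reviews']
```

Popularity is a proxy for effectiveness here, which is exactly the limitation: the suggestion still terminates in an informal string.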
In contrast, consider closed-domain search engines that operate on curated collections, e.g., the catalog search for an ecommerce site. These search engines often provide users with the opportunity to express precise queries, e.g., “black digital cameras for under $250”. Moreover, well-designed sites offer users faceted search interfaces that support progressive query elaboration through guided refinements.
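As a minimal sketch of what such guided refinement might produce, assume the engine represents the elaborated query as a plain dictionary of facet constraints. The facet names below (category, color, price_max) are illustrative, not any particular engine’s schema.

```python
# Each guided refinement adds one facet constraint to the structured query.
def refine(query: dict, facet: str, value) -> dict:
    """Return a copy of the query with one more facet constraint applied."""
    return {**query, facet: value}

# Progressive query elaboration through guided refinements:
query = {"category": "digital cameras"}
query = refine(query, "color", "black")
query = refine(query, "price_max", 250.00)
# {'category': 'digital cameras', 'color': 'black', 'price_max': 250.0}
```

The point is that the user ends up with an unambiguous, machine-readable expression of the information need, rather than a free-text string.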
Many (though not all) closed-domain search engines have an advantage over their open-domain counterparts: they can rely on manually curated metadata. The scale and heterogeneity of the open web defies human curation. Perhaps we’ll reach a point when automatic information extraction offers quality competitive with curation, but we’re not there yet. Indeed, the lack of good, automatically generated metadata has been cited as the top challenge facing those who would implement faceted search for the open web.
What can we do in the meantime? Here is a simple idea: use a closed-domain search engine to guide users to precise queries, and then apply the resulting queries to the open web. In other words, mash up the closed and open collections.
Of course, this is easier said than done. It is not at all clear whether or how we can apply a query like “black digital cameras for under $250” to a collection that is not annotated with the necessary metadata. But we can certainly try. And our ability to perform information retrieval from structured queries will improve over time; in fact, it may even improve more quickly if we can start to assume that users are being guided to precise, unambiguous queries.
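Continuing the dictionary sketch from above, the crudest possible translation simply flattens the facet constraints back into keywords. This is a hypothetical placeholder for whatever mapping an open-web engine would actually use, not a proposal for one.

```python
def to_web_query(structured: dict) -> str:
    """Naively flatten a structured query into a free-text keyword string
    for an open-domain engine that lacks the underlying metadata."""
    parts = []
    if "color" in structured:
        parts.append(structured["color"])
    if "category" in structured:
        parts.append(structured["category"])
    if "price_max" in structured:
        parts.append(f"under ${structured['price_max']:.0f}")
    return " ".join(parts)

print(to_web_query({"category": "digital cameras",
                    "color": "black",
                    "price_max": 250.00}))
# black digital cameras under $250
```

The translation is lossy by construction, which is one reason result quality would be variable; but the structured query itself remains an unambiguous record of what the user meant.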
Even though result quality would be variable, such an approach would at least eliminate a source of uncertainty in the information seeking process: the user would be certain of having a query that accurately represented his or her information need. That is no small victory!
I fear, however, that users might not respond positively to such an interface. Given the certainty that a query accurately represents his or her information need, a user is likely to have higher expectations of result quality than without that certainty. Retrieval errors are harder to forgive when the query elaboration process eliminates almost any chance of misunderstanding. Even if the results were more accurate, they might not be accurate enough to satisfy user expectations.
As an HCIR evangelist, I am saddened by this prospect. Reducing uncertainty in any part of the information seeking process seems like it should always be a good thing for the user. I’m curious to hear what folks here think of this idea.