Why are we still talking about search?

Google is obviously very successful, due in large part to its search service. Despite the company's less-than-stellar social networking track record, search is an activity that maps well to the social world. We search for things: our keys, a present, a holiday, our next partner, a restaurant to take a friend to, our next role, ourselves. Google doesn't give us exactly what we want, but that's OK because we don't know exactly what we want either. It's useful to be thrown a few social curve-balls.

Although most of us use Google or Bing (Yahoo, Ask etc. are all but gone now), new search engines such as Blekko still surface regularly, with varying points of differentiation and significant funding.

What if, rather than vague searches (inevitably) generating prioritized yet potentially vague results (M:M, in terms of question:answer), we could ask a specific question and receive a generally accepted, precise result (or, if the question had recently been addressed, a link to that result) that we and others could refer to later to justify our logic? What if we wanted an actual answer (1:1)?

Consider: "Which is the largest State in the US?" The question is vague (by land or population?), and Google is mainly matching keywords rather than actually understanding the question (and of course we get many results, so the full exchange is M:M). In fairness, Google have been delivering factual answers to some queries for five years, but this only works for some queries (not this one). Regardless, Google's first result correctly tells us Alaska and also gives us the square mileage and a comparison with other States. We aren't guaranteed the same result when we repeat the query in future (so accountability goes out of the window), but the answer is OK for most purposes. If we change the question to "Which is the largest Republican State in the US?" we still get an answer, but it takes a few minutes to cross-reference the links manually. If we ask "Which is the largest, happiest Republican State in the US?" we simply don't get a usable answer.

Someone living in Alaska is worried about a perceived rise in Russian immigration and wants to know whether it has risen during their adult lifetime. There are no obvious figures on the Web. They could tweet or email the Governor and, if enough others have asked broadly the same question, he might post a response on his blog (if he has one) once his research team have arrived at some defensible statistics for him to refer to (assuming they can find any).

This simple/current scenario has challenges and inconsistencies on both question and answer sides. Here are a few:

1) People have to feel strongly enough about the issue to find time to interact: search for similar questions and answers, write their email, and monitor the Governor's Twitter stream or blog for an answer. Many don't want to put themselves forward unless others feel the same. Conversely, enough people have to phrase the question in a similar way to kick-start the answer process.
2) The answer (if one is given) is not explicitly tied to the question. It could be given on his blog, in a subsequent TV appearance, in a press release or in a speech.
3) There is no disambiguation or prioritization of questions, so there is so much noise that the Governor and his team simply don't have time to answer.
4) The answer could be presented in different ways, perhaps to achieve political ends or simply because statistics are underused by Government. For example, Russian immigration taken over a twenty-year time-frame may show no year-on-year increase, while over a two-year time-frame it may tell a very different story.
5) There is little opportunity for to-and-fro in the question and answer thread: blending anecdotal evidence, redefining terms.
6) It's a naive voter asking a question of a team of well-researched people. The odds are that the answer will be soft soap, as no independent experts are involved in the process to moderate it.
7) Statistics could come from Federal or State government, commercial, charity or other bodies. Also, there are unlikely to be any checks and balances, e.g. does the US figure for incoming people match the Russian figure for outgoing people?
8) Any presented figures will be difficult to understand (unless an intuitive and consistent graphical approach is used).
9) Lawyers may need to be consulted to define the term “immigrant”.
10) There is no accepted “end” to either the question or answer.
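Point 4 comes down to simple arithmetic: the average annual change depends entirely on which window you measure. The minimal Python sketch below uses invented, purely hypothetical figures (not real immigration data) to show how the same series can read as flat over twenty years and as sharply rising over two. The function name `average_annual_change` and the `arrivals` series are illustrative assumptions.

```python
def average_annual_change(series, start_year, end_year):
    """Mean change per year in a yearly count between two years (inclusive endpoints)."""
    span = end_year - start_year
    return (series[end_year] - series[start_year]) / span

# HYPOTHETICAL annual arrivals, invented for illustration only:
# a slow decline of 10 per year from 2000 to 2018, then a sharp two-year rise.
arrivals = {year: 1000 - 10 * (year - 2000) for year in range(2000, 2019)}
arrivals[2019] = 910
arrivals[2020] = 1000

# Twenty-year window: the series ends where it started, so no net increase.
print(average_annual_change(arrivals, 2000, 2020))  # 0.0
# Two-year window: the same data shows a rise of 90 per year.
print(average_annual_change(arrivals, 2018, 2020))  # 90.0
```

Both figures are arithmetically correct; which window gets presented is a choice the questioner cannot see, which is why a consistent, interactive time scale matters.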

In business, we need to commit resources to a business plan, indemnify ourselves against litigious customers, make our point in meetings and provide defensible input to decision making. Consumers largely search for things (M:M) while companies have to query things (1:1), although, as with social networking, there's some overlap (M:1) in the middle. Companies lose out because they maintain a blinkered view, missing critical information not captured by their own processes. Consumers lose out because they spend so much time searching for things (and the Internet is so rich) that, as with travelling, the journey becomes the destination.

Google Search Appliance has done very well in the enterprise. There is a need for a one-ring-to-rule-them-all "get things" function. Also, our work and home lives are merging, and when the answer you present is (close enough to be) unanimously accepted and can be referenced in any future analysis of your decision, it will always stand. Once you don't have to think about the logistics of getting answers, you can concentrate on actually doing things with them (as a consumer or in the enterprise). To do this, quite simply, we need a single answer: a 1:1 model that works for everyone, as search does. We need a Q&A service.

We have "I'm Feeling Lucky", of course (M:1), but this is an indulgent lucky dip that actually loses Google revenue. The other way around (1:M) doesn't work, since it entails stating the question definitively, e.g. using something like QBE or NLQ (both of which are hard to do right), yet still getting a range of results. What other services are currently available? Facebook released Facebook Questions earlier this year. It is slated to be integrated into Community Pages (which already include Wikipedia content). It is developing but is still very social and free-text. Wikipedia is an encyclopaedia rather than a Q&A service. Yahoo Answers has more of a wiki/free-text approach; it is probably the most widely used but is also known for being random and open to abuse. PeerPong is expert-focussed. Straying a little off-topic, for product reviews there are Hunch, DooYoo, Epinions and Mahalo (whose Answers service is a bit wider in focus than just products, though still very social). There are mobile Q&A services such as Mosio and ChaCha. Quora (founded by ex-Facebook CTO Adam D'Angelo) appears to stack up against Facebook Questions and again is very social. Social Q&A in general is big. StackExchange is a platform for developing your own social Q&A service. In the enterprise, Opzi is attempting to be a corporate Quora. MSFT shut down their offering last year. Qhub runs a niche hosted Q&A service. Some social services such as Hunch have repositioned themselves as recommendation services. Many just cater to niches, e.g. StackOverflow for development. OSQA is a free open-source option. Several dot-com bubble casualties have refocused on discussion-based Q&A, e.g. Ask has refocused on a mobile Q&A service, as has Startups.com. There are other start-ups too.

All of these services are in some way niche, most require an account (which will put off many) and most provide multiple answers (1:M), although collectively they will probably end reliance on reference libraries (and, on the product side, on magazines like "Which?"). They helpfully hand you a piece of the jigsaw rather than solving the jigsaw for you. As has previously been said, "None of these sites are Google-killers. In fact they make Google stronger because the questions and answers often will be indexed – extending Google's reach into the tail". Quora in particular has recently been heralded as a killer Q&A service, but asking it "What are the most effective ways to engage news audiences?" returns a bunch of people's opinions, whereas we could have a single list, in order of effectiveness, produced from actual operational data and curated by a Journalist or Statistician. If there were any dissension around the term "effectiveness", it could be hammered out through social interaction. There wouldn't be a need for a discussion. As with all Quora questions, you need to read all the responses to get a full answer, and even then it's just the view of the subset of people who responded. Another example: "Is RockMelt better than Flock?"

If the industry is framing Web 2.0 as the Social web and Web 3.0 as the Semantic web, it seems churlish not to leverage the powers of both in a Q&A service. The benefits of great data integration using common terms (Semantic) and crowd-sourcing (Social) in a Q&A service are obvious. By the same token, some answers will always require a professional or semi-professional element (Expert). For example, someone asking about Alaska's closeness to Russia (surprisingly, 2.5 miles) might get a quick answer, but it will take an expert to plan how to get from one to the other by that particular route. Expert knowledge also comes into play on the presentation side: knowing which facts to present to answer questions most effectively. For example, the largest State question above, given to another search engine, also returns Alaska, but with a significantly smaller square mileage. A Geographer knows this is because the second result excludes water. There are other components but, in the main, both questions and answers have social, semantic and expert elements.
Back to the Alaska question (“Which is the largest, happiest Republican State in the US?”). This requires both a semantic element (to determine the largest State) and a social element (who is happy?). If we make this “Which is the largest, happiest Republican State in the US liable to switch to Democratic?” then there is an expert element too.
On to the proposed Q&A service. It needs to check a few boxes right off the bat:

1) Social. It would have to be based upon a popular, real-time social system. Despite perceived issues with investing in its ecosystem, Twitter is arguably better suited than Facebook (or any other) for this purpose, but the service also needs to amplify and expand the reach of questions: it should present persistent questions that have undergone a process of disambiguation (both automated, using semantic algorithms, and manual, through a public process of "backing" existing questions). Questions also need to be prioritized based on their number of backers. Something like Kommons, Replyz or maybe Formspring or Open Media could work here. Amplify is maybe too discussion-focussed. The focus here is on quantifying the importance of a question and the validity of an answer directly (i.e. directly to the tweeter). Social media monitoring like Crimson Hexagon could offer supporting services, but such tools typically rely upon sentiment analysis, whereas we really need direct engagement through voting or other mechanisms.
2) Semantic. We all need to be sure, at the very least, that we are talking about the same data. In short, it will need an index. Google have Rich Snippet functionality, but this is not an approach most site owners have adopted. Google also acquired Metaweb earlier this year for this purpose, but something more open like Sindice would really work, provided organisations have an incentive to cooperate (integration, ontologies, micro-formats etc.). Providing and articulating that incentive is by far the toughest piece of this. Then we want an interface on the results produced by the index that allows us to de-select terms we are not interested in, so that the system knows for next time, and also to publish the results of our tinkering and link them to the question or answer threads. Something like Sig.ma would do this.
3) Expert. We need data sets that are both selected and curated by subject matter experts. Curated answers can often be the most useful. Google acquired Aardvark earlier this year for this purpose, but something like Wolfram Alpha or Qwiki would work. Involving Wolfram Alpha would bring other benefits too: computational power (through Mathematica) to aggregate on the fly, leading NLQ support to translate questions, updatable widgets to support the presentation side, and the semi-celebrity name of Stephen Wolfram to attract academic curators. All Wolfram Alpha content would need to be indexed, and we would probably need to extend the publication side so that returned facts can be deselected as required (this doesn't currently appear to be available).
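The disambiguation and backing process in point 1 can be sketched very roughly. The Python below assumes questions arrive as (text, backers) pairs, and it swaps in a crude word-overlap (Jaccard) similarity as a stand-in for the semantic algorithms the post has in mind; the 0.6 threshold and all function names are hypothetical. Near-duplicate phrasings are folded into the first-seen question, their backers are pooled, and the result is ranked by total backers.

```python
import re

def tokens(text):
    """Lower-cased set of alphabetic words; a crude stand-in for semantic analysis."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Word-set overlap from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def merge_and_rank(questions, threshold=0.6):
    """Fold near-duplicate questions together, pooling their backers, and
    return (question, total_backers) pairs, most-backed first.

    `questions` is a list of (text, backers) pairs; the first-seen phrasing
    of each cluster is kept as its canonical text (a hypothetical choice).
    """
    clusters = []  # each entry: [canonical_text, canonical_tokens, total_backers]
    for text, backers in questions:
        words = tokens(text)
        for cluster in clusters:
            if jaccard(words, cluster[1]) >= threshold:
                cluster[2] += backers  # treat as backing the existing question
                break
        else:
            clusters.append([text, words, backers])  # genuinely new question
    ranked = [(c[0], c[2]) for c in clusters]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

questions = [
    ("Has Russian immigration to Alaska risen in the last 20 years?", 12),
    ("Which is the largest State in the US?", 5),
    ("Has Russian immigration to Alaska risen over the last 20 years?", 30),
]
for text, backers in merge_and_rank(questions):
    print(backers, text)
```

With these inputs the two phrasings of the immigration question merge (their word overlap is 9/11, about 0.82) and rank first with 42 backers. A real service would need proper semantic matching, since word overlap alone would happily merge questions that differ only in a crucial term such as "Republican".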

We are talking about potentially lots of social and semantic data needing to be agreed and presented. We'll need simple data visualization with a basic level of interactivity so that, in the Alaska immigration example above, the user can slide the time scale to cover his adult lifetime. Something like Many Eyes, Hohli, StatPlanet or an improved Google Charts should do here. Also, we will want to store all our questions and answers forever so that we can reference them. A partnership with a very large, EMC-like network storage provider will be required.

The resulting composite solution would be the most advanced Q&A service in the world. Q&A is so important and multi-faceted that it needs multiple solutions to work together. Its influence, if accepted, would be profound.

The service would drive out real answers (reference and topical) through a combination of social, expert and semantic approaches, and would actually improve itself referentially, since bringing social attention to the data would force organisations to improve their data. Usability-wise, it wouldn't hamper ad-hoc users with a scary QBE-type interface or blind them with spreadsheets of data. Conversely, it wouldn't be so simple that it misses the point of what people want. It would be a natural extension of tweet activity (retweet, favourite, reply etc.). Opportunities for monetization would be rife: not least advertising to the solution's wider collaborative user base, increased opportunities for interactivity and greater relevancy. New forms of credibility and relevancy would also become available. Journalism would shift to sourcing stories from datasets.

The traditional search space, e.g. "Alaska Jobs", is already being eroded by location-based services. Google is certainly well placed to build a killer Q&A service, as is Facebook with its obvious social advantages and semantic/graph plans. Google is confusing, though: on the one hand, they have the right investments and infrastructure, but on the other they still appear search/data/click driven, even imperiously so ("…designers…need to learn how to adapt their intuition"). They also have issues with innovation.

The big search players generally recognise that current search solutions do not meet the needs of the user, but they don't have obvious solutions. There is an opportunity for innovation and for weaning people off this style of information seeking. There are smaller players that, if they cooperated now, could produce a more potent, compelling, flexible and lasting solution than any single player. Their biggest problem would be finding creative incentives for organisational involvement on the semantic side. Let's hurry though: it's hard going searching for things. We've got work to do now. We have questions.
