Thursday, June 26, 2008

Bill Gates Speaks Up For Semantics

Bill Gates’ criticism that pure keyword search is “just syntax and not semantics and has limits no matter how much you build those things up” came on the heels of a heated conversation I took part in at the Semantic Technology 2008 conference in San Jose. The session title was: Will Semantics Give Web Search a Face-lift?

It was clear from the outset that very different notions of semantics were in use, so a lively discussion ensued during the Q&A, where the panelists compared their own approaches to taking search to the next level. Since everyone belongs to a different school of thought, we agreed to disagree. Fernando Pereira, Research Director at Google, assumes that semantics can be captured from the use and formatting of language; ironically, he later stated that Wittgensteinian (meaning is use) or Fregean (meaning can be reduced to formal logic) approaches are futile. Google’s approach uses classic statistical machine-learning methods (robust, in the sense of a brick being a robust tool for switching off a light, but as we know non-scalable), so we know that there is no “semantics” focus. Peter Mika, a recent hire at Yahoo!, on the other hand, talked about their new SearchMonkey interface that is, inter alia, to be fed by RDF markup. Obviously, hakia’s position is rather different.

Keyword Co-occurrence Statistics as Semantics (Google)

Fernando killed the light with a rock.

If meaning is co-occurrence for you, then this sentence will be a possible answer to queries about people dying in avalanches. The structure of a webpage will not help you much here, either. Seemingly relevant words, not disambiguated as to their actual senses in the given context, will easily mislead you.
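To make the point concrete, here is a toy sketch of co-occurrence scoring. It is only an illustration under loud assumptions: the `CO_OCCURS` table and the function names are invented for this post, and nothing here is meant to describe how Google actually ranks pages.

```python
import re

# Hypothetical co-occurrence table: words assumed to show up often near each
# query term in some corpus. The entries are invented for illustration.
CO_OCCURS = {
    "avalanche": {"killed", "rock", "snow", "buried"},
    "lamp": {"light", "switch", "bulb"},
}

def tokens(text):
    """Lowercase the text and return its alphabetic words as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))

def cooccurrence_score(query, sentence):
    """Count sentence words that are query words or co-occur with them."""
    expanded = set()
    for word in tokens(query):
        expanded.add(word)
        expanded |= CO_OCCURS.get(word, set())
    return len(expanded & tokens(sentence))

sentence = "Fernando killed the light with a rock."
print(cooccurrence_score("avalanche deaths", sentence))  # 2 ("killed", "rock")
print(cooccurrence_score("lamp", sentence))              # 1 ("light")
```

With this toy table, the avalanche query scores higher than the lamp query, although the sentence is about a lamp. The words match; their senses were never checked.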

Syntax as Semantics (Powerset)

It should also be mentioned that Ron Kaplan, CSO of Powerset, made a few statements from the audience, including the very telling one that Powerset believes in the “syntactic web”, which pointedly illustrates his belief that you can get to meaning from surface syntax.

If meaning is syntax, then for you the sentence above is not distinguishable from this one:

Ron killed the program with a memory leak.

The surface structures of the sentences are identical, and some words even overlap, but killing a light is different from killing a program (not to mention killing an animate being), and the ‘rock’ is an instrument in the first case, while the ‘memory leak’ is a cause in the second. Syntax does not grant you access to any of these important differences in meaning.
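A small sketch shows how little a surface pattern sees. The regular expression below is a deliberately crude stand-in for a syntactic parser (an assumption for illustration, not Powerset's technology), and the role table is written by hand from the description above.

```python
import re

# One surface template: "<subject> killed the <object> with a <phrase>."
PATTERN = re.compile(r"^(\w+) killed the (\w+) with a ([\w ]+)\.$")

def surface_parse(sentence):
    """Fill the template's slots, or return None if the sentence does not fit."""
    match = PATTERN.match(sentence)
    if match is None:
        return None
    return {
        "subject": match.group(1),
        "object": match.group(2),
        "with_phrase": match.group(3),
    }

# Roles taken from the text and entered by hand; the pattern cannot derive them.
HAND_LABELED_ROLE = {"rock": "instrument", "memory leak": "cause"}

a = surface_parse("Fernando killed the light with a rock.")
b = surface_parse("Ron killed the program with a memory leak.")

print(sorted(a) == sorted(b))                # True: same slots in both sentences
print(HAND_LABELED_ROLE[a["with_phrase"]])   # instrument
print(HAND_LABELED_ROLE[b["with_phrase"]])   # cause
```

Both sentences fill exactly the same slots. The difference between an instrument and a cause lives only in the hand-made table, that is, outside the syntax.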

Semantic-Web Markup as Semantics (Yahoo!)

If, on the other hand, you believe in semantic-web-style markup as the solution, then the author of the sentence will have to add tags that clarify that a lamp was switched off, hopefully in the same way that another user has tagged this sentence:

Peter used his usual brick to turn off the lamp.
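As a rough sketch of what this asks of authors, the two sentences can be modeled as hand-tagged sets of (subject, predicate, object) triples. The vocabulary (`ex:action`, `ex:SwitchOff`, and so on) is invented here; it is not SearchMonkey's format or any published RDF schema.

```python
# Hypothetical annotations that two different authors might have added by hand.
ANNOTATED = {
    "Fernando killed the light with a rock.": {
        ("event1", "ex:action", "ex:SwitchOff"),
        ("event1", "ex:target", "ex:Lamp"),
        ("event1", "ex:instrument", "ex:Rock"),
    },
    "Peter used his usual brick to turn off the lamp.": {
        ("event2", "ex:action", "ex:SwitchOff"),
        ("event2", "ex:target", "ex:Lamp"),
        ("event2", "ex:instrument", "ex:Brick"),
    },
}

def find(predicate_objects):
    """Return sentences whose triples contain every (predicate, object) pair."""
    wanted = set(predicate_objects)
    hits = []
    for sentence, triples in ANNOTATED.items():
        present = {(pred, obj) for _, pred, obj in triples}
        if wanted <= present:
            hits.append(sentence)
    return hits

# Both sentences are found, but only because both authors chose the same tags.
print(find([("ex:action", "ex:SwitchOff"), ("ex:target", "ex:Lamp")]))

# A query using a different but equally plausible tag finds nothing.
print(find([("ex:action", "ex:TurnOff")]))  # []
```

The lookup itself is trivial. All the work sits with the authors, who must tag every sentence and happen to agree on the vocabulary.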

Semantics as Semantics (hakia)

If, finally, you have access to semantics, your constraints on the different senses of ‘kill’, ‘light’, and ‘rock’ will get you to the meaning automatically, and you will serve the sentence above only as an answer to queries about methods to switch off lamps, and not pollute your results with it otherwise. For more examples, you can read my prior blog posts.
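The idea of sense constraints can be sketched with a toy lexicon. Everything in it (the sense names, the semantic types, the `interpret_kill` function) is an assumption made up for this illustration and is not hakia's ontology or implementation.

```python
# Toy lexicon: each noun lists the semantic types of its possible senses.
NOUN_SENSES = {
    "light": ["device", "radiation"],
    "program": ["software", "broadcast"],
    "rock": ["physical_object", "music_genre"],
    "memory leak": ["defect"],
}

# Each sense of 'kill' constrains the type of its object and of its with-phrase,
# and says which role the with-phrase plays.
KILL_SENSES = [
    {"meaning": "switch off", "object": "device",
     "with": "physical_object", "role": "instrument"},
    {"meaning": "terminate", "object": "software",
     "with": "defect", "role": "cause"},
    {"meaning": "cause to die", "object": "animate",
     "with": "physical_object", "role": "instrument"},
]

def interpret_kill(obj, with_phrase):
    """Return (meaning, role) for the first sense whose constraints are met."""
    for sense in KILL_SENSES:
        object_ok = sense["object"] in NOUN_SENSES.get(obj, [])
        with_ok = sense["with"] in NOUN_SENSES.get(with_phrase, [])
        if object_ok and with_ok:
            return sense["meaning"], sense["role"]
    return None

print(interpret_kill("light", "rock"))           # ('switch off', 'instrument')
print(interpret_kill("program", "memory leak"))  # ('terminate', 'cause')
```

In this sketch, no author had to tag anything. The constraints on the senses pick "switch off" for the light and "terminate" for the program, and no sense with an animate object fits either sentence, so neither would be served for queries about deaths.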

Where we currently are in search is nice, but there is much room for improvement. Non-semantic methods have reached their ceiling. Carefully tested and appropriate semantic methods, based on understanding natural language, will get us to the next stage. We are phasing these in, beta release by beta release, and will show you the difference between real semantics and yesterday’s attempts at avoiding semantics. Stay tuned!
