Google's Knowledge Vault and AI

We have such high hopes pinned on semantic search because, from the user's side of the screen, it feels like the ultimate kind of magic: we ask questions we really want answered and it gives us answers we can trust.

Behind the scenes that magic is underpinned by the struts and pulleys of the 4Vs (Volume, Velocity, Variety and Veracity) and the constant appetite for data it can verify and interlink.

The problem here is two-fold. First, semantic search can only scale at the speed of human interaction and engagement. The bot aspect of it is restricted to mapping and cataloguing the data that is generated by human activity. To achieve that it relies on community-supported tools (Google+, Freebase and Wikipedia). Second, the veracity element of our activities is problematic. Beyond the fact that Google needs to be able to understand and analyze sentiment, it also needs to be able to tell fact from fiction, and that is not quite as easy to achieve as you might think.

Semantic Search’s Oracular Powers

Of course, the moment you begin to crack the Veracity issue you suddenly have a mother lode of information at your fingertips that no one else has ever had before. Search can then become both trustworthy and predictive in nature, with documents on the web given a trust weighting based upon the mapping of inter-related facts.

The catch? You guessed it: speed. First, the web scales incredibly fast in terms of the data that’s put into it every second of every day. Second, the mapping of crowdsourced interactions and engagement is necessarily slow: humans cannot be on the web 24/7. We do not go everywhere. We do not check everything. If anything, our web activities tend, over time, to become somewhat circumscribed, falling into regular patterns that mirror our general behavior. This makes us predictable.

That’s great as far as predictive services, like Google Now, are concerned but not so cool when Google relies on us to expand its data mapping.

Enter Google’s Knowledge Vault

In a paper on the subject, Google researchers Xin Luna Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang write in the abstract that: “The Knowledge Vault is substantially bigger than any previously published structured knowledge repository, and features a probabilistic inference system that computes calibrated probabilities of fact correctness…”

Essentially, this is a scaled-up Google version of the fact-checking algorithm IBM has patented. At its core are two components that are both simple and ground-breaking: a knowledge fusion capability and an evaluation protocol. The first takes existing facts within graphs (like the Knowledge Graph, for instance) and uses bots, instead of humans, to cross-reference and evaluate associated data points.
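To make the idea of knowledge fusion a little more concrete, here is a minimal Python sketch of one simple way to turn several pieces of evidence about the same fact into a single probability. It is an illustration only and not Google's actual system: the `fuse` function, the prior and the extractor scores are all assumptions made up for the example, and the sketch further assumes that each extractor's evidence is independent of the others (a naive Bayes style combination in log-odds space).

```python
import math
from typing import Sequence


def logit(p: float) -> float:
    """Convert a probability into log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Convert log-odds back into a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(prior: float, extractor_probs: Sequence[float]) -> float:
    """Combine a prior belief with per-extractor probabilities for one fact.

    Hypothetical helper: assumes every extractor started from the same prior
    and that their evidence is conditionally independent.
    """
    eps = 1e-6  # keep probabilities away from 0 and 1 so logit stays finite

    def clip(p: float) -> float:
        return min(max(p, eps), 1.0 - eps)

    prior = clip(prior)
    score = logit(prior)
    for p in extractor_probs:
        # Each extractor shifts the log-odds by how far it moved from the prior.
        score += logit(clip(p)) - logit(prior)
    return sigmoid(score)


if __name__ == "__main__":
    # Two extractors each 80% sure, starting from a neutral 50% prior.
    print(round(fuse(0.5, [0.8, 0.8]), 4))
    # Two extractors that disagree symmetrically cancel each other out.
    print(round(fuse(0.5, [0.8, 0.2]), 4))
```

Agreement between sources pushes the fused probability above what any single source claims, while conflicting sources pull it back towards the prior. That is the intuition behind cross-referencing data points rather than trusting any one of them.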

Because, unlike humans, bots do not sleep, can go everywhere and can be online all the time, the entire process of creating an index of semantic entities speeds up dramatically. In short, this is semantic search on steroids. For a start, this makes the predictive power of our personal digital assistants viable.

It then also promises to show us aspects of the world that are now closed to us because the effort involved is unwarranted for as yet uncertain returns. Take for example the question “How has the gender gap changed in French politics over time?” Google’s Knowledge Vault bots were able to answer this quickly by combining a knowledge base called YAGO (which holds information on the gender of every French politician) with data from French newspaper Le Monde (where names and mentions could be extracted from the text). This suddenly allows us to see how French politics evolves along gender lines and begin to analyze the impact that has on French society as a whole.
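The mechanics of that kind of answer can be sketched in a few lines of Python: look up each name extracted from the text in a knowledge base, then count by year. Everything in the sketch is a placeholder. The names, years and counts are invented toy data rather than anything taken from YAGO or Le Monde, and `female_share_by_year` is a hypothetical helper, not part of any real tool.

```python
from collections import defaultdict

# Toy stand-in for a knowledge base of politician -> gender facts (invented data).
GENDER_KB = {
    "Politician A": "female",
    "Politician B": "male",
    "Politician C": "female",
}

# Toy stand-in for mentions extracted from newspaper text: (year, name).
MENTIONS = [
    (1990, "Politician B"),
    (1990, "Politician B"),
    (1990, "Politician A"),
    (2010, "Politician A"),
    (2010, "Politician C"),
    (2010, "Politician B"),
    (2010, "Unknown Person"),  # not in the knowledge base, so it is skipped
]


def female_share_by_year(mentions, gender_kb):
    """Return {year: share of resolved mentions that refer to women}."""
    counts = defaultdict(lambda: {"female": 0, "male": 0})
    for year, name in mentions:
        gender = gender_kb.get(name)
        if gender not in ("female", "male"):
            continue  # ignore names the knowledge base cannot resolve
        counts[year][gender] += 1
    return {
        year: c["female"] / (c["female"] + c["male"])
        for year, c in sorted(counts.items())
    }


if __name__ == "__main__":
    for year, share in female_share_by_year(MENTIONS, GENDER_KB).items():
        print(year, f"{share:.2f}")
```

Neither data source can answer the question on its own. The knowledge base knows who is who and the text knows who was talked about and when, so it is the join between the two that produces the trend.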

Semantic Search on Steroids

With that kind of tireless, always-on data mapping, the creation of the structured data indices that power semantic search becomes a matter of resources (i.e. time, computing power and storage capacity) rather than one of having to entice the online population to become more active and more adventurous in their online surfing.

One benefit is that wearable devices that can provide an augmented reality experience are suddenly also viable. The Knowledge Vault, in short, is an acceleration of Google’s efforts at semantic search, achieved by pooling together existing knowledge databases and interlinking them. This then makes the ‘knowledge gaps’ that much more visible and easier to fill in.

This is how this will affect your marketing:

  • Have clear, detailed connections on your Google+ Profile “About” section linking to all your other social media profiles across the web.
  • Have really detailed data about you in all other social media profiles.
  • Be real and detailed in your interactions and engagement across the web.
  • Deliver real value in everything you do because that leads to connections and that leads to relationships. Relationships, in turn, lead to mentions, citations and engagement and these help in fact extraction (about who you are and what you do) from across the web.
  • Cross-reference activities. If you are active in a social good activity, mention the context, link back to your business and vice versa. Compartmentalizing information only leads to fragmentation of your digital identity and lower trust scores.
  • Don’t forget your website. If your website is not up to date, active and detailed, you are creating a black hole instead of a data hub around which your digital presence can revolve.

In a presentation on where all this is taking us, Kevin Murphy from Google Research shared the fact-extraction mechanism Google uses to pull out facts from data across the web:

Google Fact Extraction from the Web

Google is in the process of implementing a five-step plan that will take semantic search to an entirely new level:

  • From strings to things
  • Reading the web
  • Asking the web
  • Asking people
  • Open issues

How each of these will affect your marketing is discussed in this article here. For now it is enough to know that semantic search is going global, bigger, better and faster than before; that data density (i.e. putting in as much detail as possible) is key to everything you do; and that, yes, everything is connected.


Knowledge Vault: A Web-scale Approach to Probabilistic Knowledge Fusion
From Big Data to Big Knowledge – Google Research