Five Steps To A Fully Semantic Web

Google's Knowledge Vault requires five steps
Semantic search is the biggest technological transition you are likely to encounter this century. Because it requires the correct classification and categorization of data, it becomes the lens through which we see the world around us more clearly and in greater detail than ever before.

Google is deeply committed to making search scale, first across the web and then across the world. This is only logical. What search does is provide facts, and the online world is an abstraction, of sorts, representing specific aspects of the offline one. It stands to reason, then, that a clearer understanding of the latter will lead to a much more nuanced understanding of the former.

Search, as it stands right this moment, understands the online world way better than it does the offline one. This is about to change.

To achieve this change Google has initiated five distinct steps:

Step 1. From Strings to Things: This is Google’s Semantic Search, and it is where Hummingbird comes in. Essentially, this is the transition of search from treating what you seek as part of a probability matrix that matches your search query to actually understanding the meaning and intent of that query and serving the search results that best answer it. Google set out to achieve this through a variety of methods, including the social signals and interactions of people across the social web, the digital identity service that is Google+, and the claiming of ownership of content from across the web that was behind authorship.

Google used crowdsourced information from trusted sources like Freebase and Wikipedia to help it accumulate the facts it needed to build its Knowledge Graph. In the official Google blogpost that introduced the Knowledge Graph there is a paragraph that states that “Google’s Knowledge Graph isn’t just rooted in public sources such as Freebase, Wikipedia and the CIA World Factbook. It’s also augmented at a much larger scale—because we’re focused on comprehensive breadth and depth.”

Google had, for some time, been experimenting with ways to pull facts from documents across the web and provide direct answers to questions in the Google Q&A service that was discontinued in July 2014. The knowledge and experience accumulated from that service has not been lost.
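The difference between matching strings and understanding things can be shown with a toy example. This is a minimal sketch, not Google's implementation: the alias table, the triple store and the function names (resolve_entity, answer) are all hypothetical and exist only to illustrate the idea that several different search strings can point to one entity with known facts attached.

```python
# Toy knowledge graph: (entity, predicate) -> value. Illustrative data only.
TRIPLES = {
    ("abraham_lincoln", "type"): "Person",
    ("abraham_lincoln", "office"): "16th President of the United States",
    ("abraham_lincoln", "born"): "1809",
}

# Different surface strings that all refer to the same "thing".
ALIASES = {
    "abraham lincoln": "abraham_lincoln",
    "abe lincoln": "abraham_lincoln",
    "president lincoln": "abraham_lincoln",
}


def resolve_entity(query):
    """Return the entity id whose alias appears in the query, or None."""
    text = query.lower()
    # Try longer aliases first so the most specific name wins.
    for alias in sorted(ALIASES, key=len, reverse=True):
        if alias in text:
            return ALIASES[alias]
    return None


def answer(query, predicate):
    """Look up a fact about the entity mentioned in the query."""
    entity = resolve_entity(query)
    if entity is None:
        return None
    return TRIPLES.get((entity, predicate))


print(answer("When was Abe Lincoln born?", "born"))
print(answer("president lincoln birth year", "born"))
```

Both queries use different words, yet they resolve to the same entity and therefore to the same fact. A pure string match would treat them as unrelated.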

Step 2. Reading the web: At the very heart of this lies a project that has been labelled internally as Google Brain. It’s really a means of using machine learning to map and then understand the relational value of the connections between individual data points. It sounds like an abstract maths problem, but that is actually how we understand the world and get smarter ourselves. So Google really is building an AI (or several) and putting it to work, first within the company and then externally.

Reading the web uses content scraping techniques to extract information from every page Google has indexed plus all the additional information that is being generated on a daily basis (photographs, videos, comments). This mines an entire collection of facts that Google then assesses for accuracy and filters through “prior knowledge” (i.e. trusted databases of known facts) to further increase the accuracy of the facts it mines.
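The idea of filtering mined facts through “prior knowledge” can be sketched in a few lines. This is only an illustration under stated assumptions: the trusted table, the prior values (0.95, 0.05, 0.5), the 0.7 threshold and the function names are all made up for the example, and the fusion rule simply treats the extractor's confidence and the prior as independent evidence. Nothing here is claimed to be how Google actually scores facts.

```python
# Trusted "prior knowledge": (subject, predicate) -> known value. Illustrative only.
TRUSTED = {("France", "capital"): "Paris"}


def prior(subject, predicate, value):
    """Hypothetical prior: high if the trusted source agrees, low if it
    disagrees, neutral (0.5) if it says nothing about this fact."""
    known = TRUSTED.get((subject, predicate))
    if known is None:
        return 0.5
    return 0.95 if known == value else 0.05


def fuse(extractor_conf, prior_prob):
    """Combine two probability estimates, treating them as independent evidence."""
    agree = extractor_conf * prior_prob
    disagree = (1 - extractor_conf) * (1 - prior_prob)
    total = agree + disagree
    return agree / total if total else 0.5


def filter_facts(extracted, threshold=0.7):
    """Keep only the mined facts whose fused score reaches the threshold."""
    kept = []
    for subject, predicate, value, conf in extracted:
        score = fuse(conf, prior(subject, predicate, value))
        if score >= threshold:
            kept.append((subject, predicate, value))
    return kept


# Facts "mined" from pages, each with the extractor's own confidence.
extracted = [
    ("France", "capital", "Paris", 0.6),
    ("France", "capital", "Lyon", 0.6),
    ("Exampleville", "population", "1200", 0.8),
]
print(filter_facts(extracted))
```

The two claims about the capital of France start with the same extractor confidence, but only the one that agrees with the trusted source survives. The third fact is unknown to the trusted source, so it stands or falls on the extractor's confidence alone.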

When you have a lot of accurate data at your fingertips a surprising effect takes place: you can make predictions for the future that are a lot more accurate. Recurring patterns reveal themselves, certainly, but you also get to see trigger points. It’s a little like knowing your cat will go up to your dog when he’s sleeping and slap him and run away with him chasing her. Accurately predicting that scenario not only requires knowing recurring patterns in cat and dog behavior but also having access to the proximity trigger points required to make it happen in the first place.

Google Now is a classic (if still somewhat limited) example of this kind of ability. The predictive algorithms of YouTube, which preload videos we are likely to watch so we can have a smooth viewing experience on mobile, are another.

Step 3. Asking the web: Not everything online is easy to find and understand. There are many missing attributes even after steps 1 and 2 (above) have been taken. To help fill in the blanks, Google implements web-based questions and answers. The thing to remember about the web is that you always get an answer, so the key is not in asking the question but in how you frame it and how many questions you ask. This is like Douglas Adams’s famous answer to the meaning of life, 42, where the answer was known but the question itself was missing. Google search (and the queries entered there) is one example of a source for a Q&A system.

The approach was described best in a Microsoft Research paper, which states that “Web-based question answering systems typically employ rewriting procedures for converting components of questions into sets of queries posed to search engines, along with techniques for converting query results into one or more answers.”

By selectively applying key concepts of information retrieval, information extraction, machine learning, and natural language processing (NLP), Google gets to the point where a ‘simple’ question can be answered from a corpus of documents on the web.
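The rewrite-then-extract approach quoted above can be sketched as follows. This is a deliberately tiny illustration, assuming a single question shape (“Who/What/Where is X?”) and a hard-coded list of snippets standing in for search results. The function names and the sample snippets are hypothetical, and real systems use far richer rewriting rules and answer extraction.

```python
import re
from collections import Counter


def rewrite(question):
    """Rewrite 'What is X?' into the declarative phrase 'X is'.
    Questions that do not fit the pattern are returned unchanged."""
    m = re.match(r"(?i)^(who|what|where) (is|was) (.+?)\??$", question.strip())
    if not m:
        return [question]
    _, verb, rest = m.groups()
    return [f"{rest} {verb}"]


def extract_answers(rewrites, snippets):
    """Count the word that follows each rewritten phrase in the snippets."""
    votes = Counter()
    for phrase in rewrites:
        pattern = re.compile(re.escape(phrase) + r"\s+(\w+)", re.IGNORECASE)
        for snippet in snippets:
            for word in pattern.findall(snippet):
                votes[word.capitalize()] += 1
    return votes


# Stand-in for snippets returned by a search engine (made up for this example).
snippets = [
    "Many people know that the capital of France is Paris.",
    "The capital of France is Paris, a city on the Seine.",
    "Some mistakenly think the capital of France is Lyon.",
]

queries = rewrite("What is the capital of France?")
votes = extract_answers(queries, snippets)
print(queries)
print(votes.most_common(1))
```

The answer that most snippets agree on wins the vote, which is why the number of questions asked, and how they are framed, matters more than any single result.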

Step 4. Asking People: Google’s Knowledge Graph has a feedback link that allows people to correct mistakes it makes. Google creates a user contribution history file that allows an algorithm to assess the probability of a contribution being right or wrong, based on past history. This is similar to how many of its other products work. Google also uses other strategies, like quizzes shown to specific types of people who, by choosing to take them, supply further answers. The critical point with this approach is figuring out which people to ask what.
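A contribution history can be turned into a probability in many ways. The sketch below assumes one simple option, a smoothed accuracy score, and then weights competing corrections by it. The history format, the user names and the function names are hypothetical, and this is not a description of Google's actual algorithm.

```python
from collections import defaultdict


def reliability(correct, total):
    """Laplace-smoothed accuracy: a contributor with no history starts at 0.5."""
    return (correct + 1) / (total + 2)


def best_correction(suggestions, history):
    """Pick the suggested value with the highest summed contributor reliability.

    suggestions: list of (user, proposed_value)
    history:     user -> (correct_contributions, total_contributions)
    """
    scores = defaultdict(float)
    for user, value in suggestions:
        correct, total = history.get(user, (0, 0))
        scores[value] += reliability(correct, total)
    return max(scores, key=scores.get)


# Made-up history: alice is usually right, bob is usually wrong, carol is new.
history = {"alice": (9, 10), "bob": (1, 10)}
suggestions = [("alice", "1809"), ("bob", "1810"), ("carol", "1810")]
print(best_correction(suggestions, history))
```

Two people propose the same value here, yet the single contributor with a strong track record outweighs them. That is the point of keeping a history: it matters who is answering, not only how many.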

Step 5. Open issues: These are far from solved. Google knows that new entities and new relations are being created all the time. Discovering them, verifying them and cataloguing them correctly is a task that’s not yet solved. In addition, there is the difficulty of sorting real from fictional contexts: Abraham Lincoln, the 16th president of the United States, is also Abraham Lincoln, Vampire Hunter. There is the difficulty of extracting information that is only implicitly stated in web pages (like narratives that allude to facts). And there is the task of constantly assessing the trustworthiness of sources.

The Venn Diagram of Semantic Search

There is an overlap of practices and methods employed in the five steps above which, in their totality, should theoretically filter out low-quality, low-confidence content automatically and create high-trust facts that can then be used to understand the world better. The end game, should Google succeed in getting there, will be a knowledge base of facts the likes of which we have never seen before, powering services that will range from digital personal assistants to predictive services of all different types.

Along the way we are going to see all sorts of hiccups as privacy concerns and fears about the technology begin to crystallize. This is brand-new territory and we really have not formulated the questions to ask, much less the answers we need to hear. 

What You Need to Know

All this is grand, of course. So, if you have made it this far, you now need to know what you should be doing to benefit from all this change. A few things:

  • If you are a webmaster you now really need to be everywhere, not just on your website.
  • Being everywhere will soak up tremendous amounts of time so be clever with your content. Create the kind of content that so totally resonates with your audience that they become your best advocates, willing to amplify your digital footprint across the web.
  • Be data-rich in what you do. Put in as much information as possible about everything.
  • Use every product available to you: Google Maps, Google+, Twitter, Facebook, Gmail, LinkedIn, YouTube, SoundCloud, Pinterest and so on. Join all these by cross-referencing them where possible.
  • Make your website your hub. Fully. All roads should point to where you best control your content.
  • Find your audience. Your real audience. Don’t just use the web to market to everyone hoping to statistically reach the right audience. That way of marketing is now long gone. Use the web to find those you really want to connect with and make them passionate advocates that will bring you more customers.
  • Be real. Stop marketing altogether. If by marketing you understand the shaping of an online presence that assumes a persona, forget it. Be you. Communicate your set of values. Find those who are into them. Become a community of sorts. Grow together.
  • Use video. This is usually forgotten until an ad needs to be created. People are always nervous in front of a camera. Well, that’s fine. Video is incredibly powerful in getting others to know who you are. Use it. Connect.