Language is a representational code in semantic search

Language is nothing more than a form of representational code. It allows us to create a shared framework of concepts, ontologies and hierarchies and talk about them and the entities that make them up in a manner that makes total sense.

So, in a very real way, language allows us to codify our perception of reality and share it to such a degree that we can function in the world that the language describes. Because the world is information, language is key to our ability to shape it. Search, in turn, is all about discovering, indexing and then retrieving information, which is why it has become the way through which we uncover more of the world than would otherwise have been possible for us to physically access.

It is fair to say that search and language are intimately linked. Semantic search mirrors the way we create, understand, retrieve and share concepts in our own brain which, in turn, makes search the interface at which the real world and our ability to adequately describe it meet.

When Google first introduced semantic search it entered the world of language as code, which means that search queries now had to be understood in terms of concept, knowledge, prior action and intent, all filtered through context. The problem with this is that language is imprecise and constantly evolving, which is why context is important.

In the four years (to date) since Google Hummingbird was introduced the one complaint I constantly hear is how imprecise search is on specific search queries. I get it and it’s a complaint that is not going to go away any time soon. Understand that we expect more and more of search as we use it differently (a little more about this in a moment) and we also do more and more with it as we use it everywhere. As our connectivity is increasing and our shared cultural experiences provide a common framework of communication, we develop ever more nuanced ways to communicate feelings, ideas and thoughts (consider how the internet used a stock image to create an entire shared history of a couple’s imagined love triangle that went viral).

Despite constant advances in machine learning and indexing, the fact that search is struggling to cope is evidenced by the very real concern we have, at the moment, with fake news. The reason search is always a little behind lies in the complexity of the crucible created by technology, users and the world at large (what I sometimes call the semantic search sandwich).

All three are intricately linked. As technology can do more (i.e. in this case, index more of the world’s information, better), we use it to do more, constantly pushing against the boundaries of its capabilities. The more we do with it, the more we change the world we operate in, creating more information and altering the very environment we function in and which search indexes.

Consider this simplified example: in the not too distant past, all I would have had to do to get my website to appear high in search was place some keywords in the website URL, use some keywords in the H1 and H2 header tags, strategically place some keywords in the body text of the pages I wanted to serve in search, and create a few links on external websites, pointing to mine, that featured the keywords I wanted the page to rank for.

Google search looked at keywords and keywords represented nothing beyond themselves. In that old model the representation of the world in the words had been disconnected. All that mattered were the words themselves. This is not the case any longer.
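
To make the old model concrete, here is a toy sketch of literal keyword scoring. It is not Google's algorithm: the function name `keyword_score`, the field names, the weights and the two sample pages are all invented for illustration. It only shows how a scorer that treats words as nothing beyond themselves rewards repetition over meaning.

```python
import re


def normalise(text):
    """Lower-case the text and collapse anything non-alphanumeric into spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def keyword_score(page, keyword):
    """Count literal keyword matches per field, multiplied by a made-up weight."""
    # Hypothetical weights: URL and links count for more than body text.
    weights = {"url": 3, "headings": 2, "body": 1, "inbound_anchor_text": 2}
    needle = normalise(keyword)
    score = 0
    for field, weight in weights.items():
        haystack = normalise(page.get(field, ""))
        score += weight * haystack.count(needle)
    return score


# A page stuffed with the exact phrase.
stuffed_page = {
    "url": "example.com/cheap-shoes",
    "headings": "Cheap shoes | Buy cheap shoes",
    "body": "Cheap shoes here. Our cheap shoes are the best cheap shoes.",
    "inbound_anchor_text": "cheap shoes",
}

# A page that covers the same need without repeating the phrase.
helpful_page = {
    "url": "example.com/footwear-guide",
    "headings": "How to find affordable footwear",
    "body": "A guide to buying inexpensive, well-made trainers and boots.",
    "inbound_anchor_text": "footwear buying guide",
}

print(keyword_score(stuffed_page, "cheap shoes"))
print(keyword_score(helpful_page, "cheap shoes"))
```

The stuffed page outscores the helpful one simply because the scorer has no notion that "affordable footwear" and "cheap shoes" describe the same thing.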

In learning about the world, search inevitably has to learn to speak every language man speaks. Now, this is not quite as hard as you might think it is, and when Google’s machine learning algorithms achieved zero-shot translation they proved that the process is accelerating and, at times, leapfrogging expectations.

But it’s not just search that’s changing. Searchers are too, and so is their behavior (the thing about using search differently I mentioned earlier). In the past searchers had to learn the ‘language’ of search in order to get anywhere (we learnt about keywords and, some of us, about search operators). We had to filter through websites that spammed search in order to find what we wanted and, when the initial search failed, we had to try and guess which keywords the owner of a website that might hold the information we sought would have used. We did, in short, all the heavy lifting because our tech couldn’t.

We live in different times. Today, we use search on mobile devices using voice, for example. And we talk in vague concepts rather than keywords fully expecting answers on Google search most of the time. We expect search to know roughly what we want based on either our past behavior or that of others in similar contexts and deliver it to us seamlessly. We get frustrated when information is slow getting to us or of the wrong kind.

Similarly, webmasters have had to wise up. They had to decide on how to structure their websites and the information they put on them. They learnt (hopefully) to create bodies of work that were thematically linked. Their content answered human needs instead of acting as a signpost for search engines. They too, in short, started to behave differently.

All of which brings us to Greeklish: a bona fide language (of sorts) which uses transliteration to create a common understanding of Greek amongst people who may speak Greek but not write it (and I admit to using Greeklish countless times with my Greek friends).

Consider what’s at play here. In writing Greeklish, Latin characters are used to recreate sounds that represent agrammatical versions of words which then need to be understood in their correct spelling and context (much like a red traffic light means: “Stop! Danger! Wait here until cleared.” to any motorist, anywhere) so that a specific concept can be communicated accurately to people who are native Greek speakers.
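
A rough sketch of why this is harder than a letter-for-letter swap: a single Latin character such as "i" can stand for several Greek spellings of the same sound, so the reader (or the machine) has to pick the spelling that is a real word. The mapping table, the two-word lexicon and the function names below are assumptions made up for this example. They are nowhere near a complete transliteration scheme and say nothing about how Gboard actually does it.

```python
import itertools
import unicodedata

# Hypothetical, deliberately incomplete mapping: each Latin letter lists the
# Greek spellings it might stand for in Greeklish.
CANDIDATES = {
    "a": ["α"], "d": ["δ"], "e": ["ε", "αι"], "f": ["φ"],
    "i": ["ι", "η", "υ", "ει", "οι"], "k": ["κ"], "l": ["λ"],
    "m": ["μ"], "n": ["ν"], "o": ["ο", "ω"], "p": ["π"],
    "r": ["ρ"], "s": ["σ"], "t": ["τ"],
}

# Tiny stand-in for a dictionary of correctly spelled Greek words.
LEXICON = ["καλημέρα", "φίλος"]


def strip_accents(word):
    """Remove accent marks so accented and unaccented spellings compare equal."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def greeklish_to_greek(word):
    """Return the lexicon words that some candidate spelling of `word` matches."""
    # Letters missing from the table are passed through unchanged.
    options = [CANDIDATES.get(ch, [ch]) for ch in word.lower()]
    spellings = set()
    for combo in itertools.product(*options):
        spelling = "".join(combo)
        if spelling.endswith("σ"):
            # Greek uses a different sigma form at the end of a word.
            spelling = spelling[:-1] + "ς"
        spellings.add(spelling)
    return [entry for entry in LEXICON if strip_accents(entry) in spellings]


print(greeklish_to_greek("kalimera"))
print(greeklish_to_greek("filos"))
```

Here "kalimera" expands into ten possible spellings and only one of them is a word in the lexicon. A real system would need context as well, since some candidate spellings are valid words with different meanings.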

Google Gboard now has a Greeklish setting

By cracking transliteration Google’s understanding of language has made one more inroad in concept-building. Better concept-building means greater understanding of nuanced language which also means a clearer grasp of search queries in instances where there is some ambiguity about the intent or the context (as in “Coca Cola Light”).

Disambiguation between Coca Cola Light and a Coca Cola Light
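
As a toy illustration of context resolving that ambiguity, the sketch below scores two possible readings of the phrase by counting cue words in the rest of the query. Everything in it is an assumption for the example: the two senses (the drink versus a light fitting carrying the brand), the cue words and the function name `disambiguate`. Real search systems use far richer signals than word overlap.

```python
# Hypothetical cue words for two readings of the same phrase.
SENSES = {
    "drink": {"calories", "can", "bottle", "sugar", "drink", "taste"},
    "lamp": {"a", "buy", "neon", "lamp", "bulb", "vintage", "sign"},
}


def disambiguate(query, phrase="coca cola light"):
    """Pick the sense whose cue words overlap most with the rest of the query."""
    text = query.lower()
    # Everything in the query except the ambiguous phrase is treated as context.
    context = set(text.replace(phrase, " ").split())
    scores = {sense: len(context & cues) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)
    # With no matching context at all, the query stays ambiguous.
    return best if scores[best] > 0 else None


print(disambiguate("calories in Coca Cola Light"))
print(disambiguate("buy a vintage Coca Cola Light"))
print(disambiguate("Coca Cola Light"))
```

The bare phrase returns no answer, which is the point: without context there is nothing to tip the balance either way.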

What Does All This Mean for Search?

In a way I am only reiterating things I have said many times before. Entities are key. Concepts are central. Context is crucial. Content that in some way does not address the needs of any of these three (or, ideally, all three) is thin content that is likely to cause more problems for your website’s ranking in search than it is intended to solve.

Machine learning is playing a larger part in search in terms of concept mapping in language, which means that we are approaching the velocity necessary for search to respond to semantic changes with a smaller lag than it does at the moment. This means that false positives (websites that are flagged as spammy or suspect when they are not) will decrease, content match to search queries will increase and even fake news, to a small extent, will drop.

If you’re running a business and using the web to attract customers you really cannot afford to be haphazard in your content creation or be relaxed in your website structure, confident that a smarter search will make heads or tails of it (it won’t). The smart thing to do now and always is to create content that is always valuable and strive to create clarity of reason and intent (your “why” and “how”) through that content.

