How Semantic Technologies Work (and how your business can benefit from them)
It’s easier to explain how something works when it no longer does. The reason for this lies in an obvious fact. When everything works as it should we forget about the effects and tend to focus on the mechanics. Because the system in question delivers what it promises we take its function for granted. As a result the “what” is conveniently overlooked and we focus on the “how”.
Let it break down at any point, however, and suddenly we become acutely aware of what it actually does. Email is terrific in the way it breaks up messages at the point of origin, transmits the fragments over the internet’s pipes and then reassembles the message at the receiver’s end. It is amazing until it stops. Then we suddenly realize just how huge a chunk of our business relies on emails getting through to us immediately.
It’s the same with cognitive computing and semantic technologies, terms that are increasingly interchangeable. When employed correctly, cognitive computing (which employs Machine Learning) takes masses of raw data and turns it into usable information by assessing the importance of each piece in relation to all the other pieces around it and then weighing the importance of a cluster of connected data in relation to all the other, similar clusters found on the web. The result is that answers are produced that closely approximate what a person would be able to provide had he had access to all the world’s information and a brain the size of a planet.
Not As Easy As It Sounds
What sounds easy to explain is hard to do. For a start, the algorithms that do all this have an accepted fail rate that, in the best-case scenario, is around 5% globally. But the global accuracy picture does not take into account what happens when the data required to cross-check and cross-reference the veracity of the connections is not there.
To illustrate what I mean, consider what happens when I turn up at a conference on Big Data and call myself a Data Scientist. Because I play to stereotypes and want to live up to expectations, I have the impressive name badge, the clipboard and the slightly odd professorial attire. To clinch the deal I also have a presentation running behind me and have paid 50 friends to turn up and tell everyone who I am.
In that environment I am a data point. My attire and presentation are my primary footprint and my 50 paid friends are my connections. Anyone entering that environment has no reason to suspect I am lying and no good reason to challenge me on what I am purporting to be.
But a Data Scientist is not a point of data that works in a vacuum. You would expect to at least find a business I am working with that independently verifies my expertise and title. A publication or two. A book maybe. At least one paper. Other publications, excerpts, comments, interviews and appearances that indicate that yes, I am who I say I am and I do what I say I do.
Should there be a doubting Thomas in the audience (and in this case he plays the role of a search engine bot) all he has to do is Google my name to find all the connections, reviews of my books, citations and mentions.
This is what cognitive computing does when it comes to information. Not only does a spider of some description check to see the complexity and veracity of the immediate web that the presence of interlinked data has created but it then checks to see its history across a much wider spectrum of information.
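The conference example can be turned into a toy sketch. The code below is only an illustration under stated assumptions: the `Mention` type, the `independent` flag and the `independent_sources` function are hypothetical names invented here, and real systems infer independence from many signals rather than reading it off a label. It simply shows why 50 paid friends count for nothing once the check moves beyond the closed room.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One source vouching for a claim (hypothetical structure)."""
    source: str
    independent: bool  # False when the subject controls or pays the source


def independent_sources(mentions):
    """Count the distinct sources that are independent of the subject."""
    return len({m.source for m in mentions if m.independent})


# Inside the conference: 50 paid friends, no independent corroboration.
conference = [Mention(f"paid_friend_{i}", False) for i in range(50)]

# On the wider web: the same friends plus a few independent sources.
wider_web = conference + [
    Mention("employer", True),
    Mention("book_review", True),
    Mention("journal_paper", True),
]

closed_world_count = independent_sources(conference)
open_world_count = independent_sources(wider_web)
```

In this sketch the closed-world count is zero however many friends are in the room, while the wider web adds the employer, the review and the paper.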
The 4Vs Rule
Data has a life that is governed by the four Big Data concepts known as the 4Vs:
- Volume
- Velocity
- Variety
- Veracity
Taken as a whole, the 4Vs represent a living, breathing piece of data (or datum to be a little pedantic) which, once we get past the metaphorical phase, suggests that the data actually has impact. People are interested in it. It has relative importance and therefore it has some degree of existential truth (which is where the Veracity component comes in).
Lacking that (which is what happens in my closed-world example above) holes develop in the capacity of an algorithm to truly understand what is happening. Its assessment of the situation may show that it is a case where trustworthiness may be questionable but beyond that it cannot really suggest anything.
The weakness here is in the conjecture. While humans can very quickly draw on their understanding of society, its structures and its possible pitfalls to suggest a motive when evidence of trustworthiness is overtly absent, an algorithm can only present the next ‘best’ answer it has available, and that is rarely good enough.
How Does Google Map Semantic Connections?
Google used to use Google+ and the web at large to track individual posts, link them to websites and personal profiles, map sentiment in comments and compare it all with past profile activity and textbook ‘signature’ styles to see what is real, what is not and what is somewhere in between. It continues to do this across the wider web using machine learning technology to provide it with the only cost-effective means to do so.
Given the ability of computers to do everything faster and better and their capacity to never forget it is easy to imagine that there is an always-on, omniscient mega-machine keeping tabs on everything and everybody and assigning some kind of ever evolving numerical value to everything. Clearly, this is not the case.
The reason lies in both the amount of information that is released every moment on the web and the computational power required to keep tabs on it all. Even a company as big as Google requires some kind of shortcut to make sense of it all, and those shortcuts lie in trusted entities. The problem is that it takes a long time to develop trusted entities that are in the same class as, say, Wikipedia or the New York Times. With time this problem will become a little smaller, though the amount of fresh data released on the web will only grow.
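One way to picture trusted entities acting as a shortcut is to let trust flow outward from a small seed set along links, fading with every hop. The sketch below is a simplified illustration and not a description of how Google actually works: the `trust_from_seeds` function, the `decay` value and the example domains are all assumptions made up for this example.

```python
from collections import deque


def trust_from_seeds(links, seeds, decay=0.5):
    """Spread trust from seed entities along outgoing links.

    Each hop away from a seed multiplies trust by `decay`. The
    breadth-first order means every node is reached by its shortest
    path first, so it keeps the highest trust available to it.
    """
    trust = {seed: 1.0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for target in links.get(node, []):
            if target not in trust:
                trust[target] = trust[node] * decay
                queue.append(target)
    return trust


# Hypothetical link graph: who links to whom.
links = {
    "wikipedia.org": ["example-author.com"],
    "example-author.com": ["small-blog.example"],
    "unknown.example": ["spam.example"],
}

trust = trust_from_seeds(links, seeds=["wikipedia.org"])
```

Anything a seed never reaches, such as `spam.example` here, simply has no entry in `trust`. That is the point of the shortcut: only a handful of entities need to be vetted by hand, and everything else is judged by its distance from them.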
We Are The Final Link
The final link in the very long chain of processes that make information true or false on the web is us. Ultimately our activities, shortcuts and transparency become key to maintaining veracity across the web, and while we may not quite be at the point where everyone is accountable for their actions and feels responsible for what they post, by degrees we will get there. This is particularly true as the divide between online and offline is being continuously bridged, first by our mobile devices and now by the advent of virtual reality and augmented reality connections.
What Marketers and Businesses Need to Know
There is good news in all this for both marketers and businesses. If you’ve already got a copy of SEO Help then you’re ahead of the game and are already reaping the benefits. If you haven’t, however, you need, at the very least, to do the following:
- Create data density in your online presence that at least matches your offline one.
- Find an audience. That means that on the web you need to engage. Do not just broadcast.
- Define your identity. If a guy selling cronuts can do it, anybody can.
- Think like a publisher. In Google Semantic Search I explained how none of us now has a choice. Just like opening up a shop forces you to become an expert on window displays, color psychology and lighting, operating on the web requires you to know what works in terms of text, pictures and video.
- Be personable. If your ‘voice’ and identity do not come across then people are unlikely to want to engage with a blunt, corporate sounding machine.
- Be real. Acknowledge faults and weaknesses and work to set them right.
These are minimum requirements and each takes a lot of effort to get right. But then again, something that requires hardly any effort at all is unlikely to hold much value in the eyes of the beholder, which means it will not really get you anywhere.