Data is only as good as its reliability

Information retrieval (more commonly referred to as search) has always revolved around data. The way it is indexed, categorized, segmented, re-classified, served, repurposed, abstracted and used reflects, exactly, the end user’s experience with it.

Unprocessed data generates unsatisfactory responses to search queries. Likewise, most ‘frustration’ with search (beyond its obvious fragmentation) comes down to there not being enough data available to accurately understand and fulfil a particular search query.

Google, like every other digital company, tries to minimize that frustration by capturing as much data as is humanly possible and serving it in response to its understanding of a specific intent. There are two really important things to understand here. First, responses to search queries are answers to a perceived intent, and perception is formed through memory, experience and knowledge. Second, intent is signalled through human behavior.

The way perception and intent work at the search engine interface is no different, in principle, from the way they work in a human-to-human interaction. The machine learning environment of search, however, introduces its own subtle issues because there is no human nuance involved. What we have instead is an algorithm acting as a rational agent: it perceives its environment through sensors, looks at the actions it has already executed, and chooses the next action that it expects will deliver the best possible outcome.

Smart systems have an Achilles heel that springs from their machine intelligence and can, at times, render them dumb.

Consider the case of using Google Maps. Our use of a map is itself a search query: we’re looking for a place and a fast route. Using the principles guiding perception and intent, the smart algorithm that delivers routes to us on Google Maps reads the available data and tries to give us the fastest route, not just in terms of distance but also in terms of time. Because it relies purely on data for that, it can be fooled.

Simon Weckert, an artist, walked the streets of Berlin tugging a red wagon behind him. On that wagon were 99 Android phones. For those of you who see significance in the number, the red color of the wagon, Weckert’s nationality and Nena’s 1983 global hit song 99 Luftballons: remember how perception works, as described above. Millennials need the reference explained before they can see it, because Nena is usually outside their memories and experience and maybe, even, their knowledge.

Weckert’s phones, with the Google Maps app open, broadcast a signal that Google interpreted as a traffic jam and used to update its map accordingly. As Weckert said in his Motherboard interview: “…I’m able to generate virtual traffic which will navigate cars on another route.”

Data Is A Representation

In search, data is representational. This means it exists in parallel with the real-world objects and situations it depicts but is also distinct from them. Its arrangement, classification and interpretation add a further layer of abstraction that needs to be deciphered.

In Simon Weckert’s experiment, the data from the 99 Android phones transmitting to the Google Maps app delivered Volume, Variety and Velocity, with Veracity (in this case) being taken as a given. The result was a hacked app that produced real-world impact. Google Maps, treating this as a real traffic jam, could re-route users and create traffic jams elsewhere in the city.
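
To make the point concrete, here is a minimal Python sketch of how a traffic estimator that takes veracity as a given could be fooled, and how one crude check might catch it. This is not Google's algorithm, which is not public. The function names, the 10 km/h threshold, the minimum report count and the 5 metre spread limit are all assumptions made purely for illustration.

```python
from statistics import median

# Hypothetical thresholds, chosen only for illustration.
JAM_SPEED_KMH = 10.0   # median speed below this counts as a jam
MIN_REPORTS = 20       # ignore streets with too few reports


def looks_like_jam(speeds_kmh):
    """Naive estimator: many slow reports on one street means a jam."""
    if len(speeds_kmh) < MIN_REPORTS:
        return False
    return median(speeds_kmh) < JAM_SPEED_KMH


def looks_like_jam_checked(reports):
    """Same estimator with one crude veracity check added.

    Each report is (speed_kmh, x_m, y_m), with the position in metres.
    Real cars in a jam are spread along the street, so reports that all
    sit within a few metres of each other are treated as suspect.
    """
    speeds = [r[0] for r in reports]
    if not looks_like_jam(speeds):
        return False
    xs = [r[1] for r in reports]
    ys = [r[2] for r in reports]
    spread_m = max(max(xs) - min(xs), max(ys) - min(ys))
    return spread_m > 5.0  # hypothetical limit: a wagon is about a metre long


# 99 phones in a wagon: walking pace, all within one metre of each other.
wagon = [(4.0, 100.0 + i * 0.01, 50.0) for i in range(99)]
# 30 cars queued along roughly 300 metres of street.
queue = [(6.0, i * 10.0, 50.0) for i in range(30)]

print(looks_like_jam([r[0] for r in wagon]))  # True: the naive version is fooled
print(looks_like_jam_checked(wagon))          # False: the cluster looks suspect
print(looks_like_jam_checked(queue))          # True: a plausible real queue
```

The data in both cases is accurate. What changes is whether the interpretation layer asks if the data plausibly represents what it is assumed to represent.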

Human trust in data is, in most cases, implicit despite the fact that we should be running our own rudimentary checks and controls. Human trust in machines is explicit. Both of these can be problematic.

The Takeaways For Marketers and Analysts

Consider the paradox: the data in Weckert's experiment didn't lie. It wasn't even unreliable. As a matter of fact, it worked flawlessly, doing exactly what it was supposed to do. The only problem was that it had been subverted: what it showed in the digital realm, by way of interpretation, had been disconnected from what was really happening in the real world.

Data only tells part of the story. For instance, data that shows our use of a map reveals our intent to travel somewhere, and a result that provides the fastest route is welcome. But searching for a product online is much harder to quantify without a lot more data to provide context. If you’re serious about using search and data-driven, evidence-based marketing to further your brand you need to:

  • Understand what the data really tells you
  • Verify by spot-checking to make sure there is no ‘glitch in the machine’
  • Acknowledge imprecisions in data and its interpretation when they occur
  • Use the data you generate in your marketing to address the three points above
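
The spot-check in the second point can be as simple as pulling a random sample of records and comparing them against something you can confirm independently. The Python sketch below is one way to do that under stated assumptions: `spot_check`, `verify` and `within_ten_percent` are hypothetical names, the page-visit figures are made up for illustration, and the 10% tolerance is arbitrary.

```python
import random


def spot_check(records, verify, sample_size=20, seed=None):
    """Return the share of sampled records that fail independent verification.

    records: list of data points as reported by your analytics.
    verify:  hypothetical callable returning True when a record matches
             what you can confirm by other means (a manual check, a second tool).
    """
    if not records:
        return 0.0
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    failures = sum(1 for record in sample if not verify(record))
    return failures / len(sample)


# Illustrative data: reported visits per page vs. a manually confirmed count.
reported = [
    {"page": "/a", "visits": 120},
    {"page": "/b", "visits": 95},
    {"page": "/c", "visits": 4000},
]
confirmed = {"/a": 118, "/b": 97, "/c": 410}


def within_ten_percent(record):
    """Accept a record if it is within 10% of the confirmed figure."""
    truth = confirmed[record["page"]]
    return abs(record["visits"] - truth) <= 0.1 * truth


rate = spot_check(reported, within_ten_percent, sample_size=3, seed=1)
print(f"{rate:.0%} of sampled records disagree with the manual check")
```

A non-zero rate does not say which source is wrong. It only flags that the data and the world it is meant to represent have drifted apart somewhere, which is the cue to look closer.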

Keep in mind: data drives search and search is marketing.
