David Amerland

semantic search

  • Google Handwriting Input Changes Device Use

    Google handwriting and predictive search

    Google allows you to use your finger as an input device 

    It’s no secret that the easier something is to use, the more we use it, and the more we use it, the more diversified its use becomes in our lives. Google’s researchers understand this principle well, which is why the addition of handwriting recognition in 82 languages to Android devices is such a big deal. 

    Digital keyboards are fiddly, and when typing on a small screen the potential for error is actually pretty high (one reason I find the SwiftKey predictive keyboard so useful). Google now does away with some of that by allowing input to be handwritten (yeah, messy as that may sound). 

     

    Google Handwriting examples in different languages

     

    For handwriting to be more than a gimmick as an input method, some very specific requirements have to be met: 

    • Language modelling that can scale (Google has released its N-gram models to the world for anyone to use in research and development).
    • Accurate Optical Character Recognition (OCR) capabilities, an area where Google has long had a good track record.
    • Massive computation capability (and here Google is still ahead of any competitor)
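    To see why a scalable language model matters, consider a recognizer that cannot decide between two readings of the same scribble. The sketch below is a minimal word-bigram model with add-one smoothing, trained on a tiny hypothetical corpus; the function names and the data are illustrative assumptions, not Google’s N-gram models or its handwriting API. The preceding word is what breaks the tie.

```python
import math
from collections import Counter

def train_bigrams(sentences):
    """Count word bigrams, and how often each word precedes another, in a toy corpus."""
    histories, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()  # <s> marks the sentence start
        vocab.update(words)
        histories.update(words[:-1])  # every word that has a successor
        bigrams.update(zip(words, words[1:]))
    return histories, bigrams, vocab

def score(prev_word, candidate, histories, bigrams, vocab):
    """Add-one (Laplace) smoothed log P(candidate | prev_word)."""
    numerator = bigrams[(prev_word, candidate)] + 1
    denominator = histories[prev_word] + len(vocab)
    return math.log(numerator / denominator)

def pick_reading(prev_word, candidates, histories, bigrams, vocab):
    """Return the recognizer candidate the language model considers most likely."""
    return max(candidates, key=lambda c: score(prev_word, c, histories, bigrams, vocab))

# Hypothetical training text; a real model is trained on vastly more data.
corpus = [
    "semantic search is just search",
    "voice search changes how we search",
    "handwriting input helps semantic search",
]
histories, bigrams, vocab = train_bigrams(corpus)

# The shape recognizer cannot tell "seared" from "search"; the previous word can.
print(pick_reading("semantic", ["seared", "search"], histories, bigrams, vocab))  # → search
```

    Smoothing keeps unseen word pairs such as "semantic seared" from scoring zero, so the model ranks candidates rather than rejecting them outright.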

    Cool as all this may be (and it is pretty cool), you also need one additional element that is a make-or-break deal: utility. If you have to be near Wi-Fi or use roaming data to access a tool the moment you need it, chances are you won’t. 

    Google has solved this by making the capability device-native, so it works on your phone or tablet without a connection, while also being cloud-capable (if you want to take advantage of the cloud and Google’s servers). 

    The end result is pretty neat. The recognition speed, which governs how quickly your messy, caricature finger-writing on the phone’s screen is converted into neatly typed words and letters, is customizable from the app settings. And in my case, since I find it next to impossible to write the @ symbol, Google figured out what it was and produced the email address I was sending to in next to no time. 

    Google Handwriting input 

    Google has applied algorithms from Google Correlate to its handwriting models, which use statistical predictive analysis based on end-user data to work out which letters are being written in a given language (and there are 82 of them!). 
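    The sources credit nearest-neighbor search from Google Correlate. As a rough, hypothetical illustration of nearest-neighbor matching applied to handwriting (not Google’s actual recognizer), the sketch below resamples a finger stroke to a fixed number of points, normalizes its position and size, and returns the label of the closest stored template. The templates and the 16-point resolution are assumptions made for the example.

```python
import math

def resample(points, n=16):
    """Resample a stroke to n points spaced evenly along its path."""
    cumulative = [0.0]  # running path length at each original point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cumulative.append(cumulative[-1] + math.hypot(x1 - x0, y1 - y0))
    total, out, j = cumulative[-1], [], 0
    for i in range(n):
        target = total * i / (n - 1)
        while j < len(points) - 2 and cumulative[j + 1] < target:
            j += 1  # advance to the segment containing the target distance
        seg = cumulative[j + 1] - cumulative[j]
        t = 0.0 if seg == 0 else (target - cumulative[j]) / seg
        (x0, y0), (x1, y1) = points[j], points[j + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def normalize(points):
    """Shift to the origin and scale into a unit box, so size and position don't matter."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    scale = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [((x - min_x) / scale, (y - min_y) / scale) for x, y in points]

def distance(a, b):
    """Mean point-to-point Euclidean distance between two equal-length strokes."""
    return sum(math.hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(a, b)) / len(a)

def classify(stroke, templates):
    """Nearest neighbor: return the template label closest to the stroke."""
    probe = normalize(resample(stroke))
    return min(templates, key=lambda label: distance(probe, normalize(resample(templates[label]))))

# Hypothetical single-stroke templates
templates = {
    "1": [(0, 0), (0, 10)],
    "7": [(0, 0), (10, 0), (3, 10)],
    "L": [(0, 0), (0, 10), (6, 10)],
}
print(classify([(50, 100), (52, 140), (51, 180)], templates))  # wobbly, larger "1"
print(classify([(10, 10), (10, 60), (40, 60)], templates))     # bigger "L"
```

    Resampling and normalizing first is what lets a shaky, oversized stroke still land nearest to a small, clean template.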

    The Semantic Search Question

    We’re getting to the point where “semantic search” is no longer a separate thing and we should really just be talking about search. To pull all this off, Google relied on nothing less than its ability to link up developments in different areas, powered primarily by search-related algorithms. All of this revolves around personalization and the development of technology that simply does the things we want, rather than leaving us to figure out what to do with it. 

    Device Usage is Increasing

    There are several important takeaways here. Device usage is increasing, and with it, we can surmise, so does the data that Google acquires. This means a better understanding of intent in searches, of query use (just as Voice Search changed the way we input queries, handwritten text will have an impact), of the context of search queries and of device use in relation to intent. All of this “better” all-round stuff really means just one thing: search, marketing and branding (which increasingly overlap in their effect) are now governed by the basics of human psychology rather than by the capabilities afforded by yet another set of tools. 

    This should also point the way forward. If you want your website and your online business to appear in relevant searches, you need to understand what drives your potential audience to engage with you in the first place and position your content to meet exactly that. 

    It ain’t rocket science. 

    Sources

    Translation-inspired OCR
    Nearest neighbor search in Google Correlate
    Google Handwriting Input in 82 languages on your Android mobile device

     

  • How Semantic Technologies Work (and how your business can benefit from them)

    Semantic Technologies still not quite complete at verification level

    It’s easier to explain how something works when it no longer does. The reason for this lies in an obvious fact. When everything works as it should, we forget about the effects and tend to focus on the mechanics. Because the system in question delivers what it promises, we take its function for granted. As a result, the “what” is conveniently overlooked and we focus on the “how”. 

    Let it break down at any point, however, and suddenly we become acutely aware of what it actually does. Email, which breaks messages up at the point of origin, transmits the fragments through the internet’s pipes and then reassembles them at the receiver’s end, is amazing until it stops working. Then we suddenly realize just how much of our business relies on emails getting through to us immediately. 
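    The email analogy can be made concrete with a deliberately simplified model. In the sketch below a message is split into numbered chunks, delivered in random order and reassembled by sequence number; if a single chunk goes missing, nothing can be rebuilt. This is an illustrative assumption about packet-style delivery in general, not how SMTP or TCP are actually implemented, and the function names are hypothetical.

```python
import random

def fragment(message: bytes, size: int):
    """Split a message into (sequence number, chunk) pairs, a stand-in for packets."""
    return [(seq, message[i:i + size]) for seq, i in enumerate(range(0, len(message), size))]

def reassemble(packets, expected_count):
    """Rebuild the message from packets that may arrive in any order.

    Returns None when any packet is missing: the moment we notice that email has "stopped".
    """
    received = dict(packets)
    if set(received) != set(range(expected_count)):
        return None
    return b"".join(received[seq] for seq in range(expected_count))

message = b"Your invoice is attached. Please confirm receipt."
packets = fragment(message, 8)
random.shuffle(packets)  # the network may deliver chunks out of order

print(reassemble(packets, len(packets)) == message)  # → True
print(reassemble(packets[1:], len(packets)))         # → None (one chunk lost)
```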

    It’s the same with cognitive computing and semantic technologies, terms that are increasingly interchangeable. When employed correctly, cognitive computing (which relies on Machine Learning) takes masses of raw data and turns it into usable information. It assesses the importance of each piece in relation to all the other pieces around it, and then weighs the importance of each cluster of connected data in relation to all the other, similar clusters found on the web. The result is answers that closely approximate what a person could provide if they had access to all the world’s information and a brain the size of a planet. 
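    The two-stage weighting described above (each piece against its neighbors, then each cluster against other clusters) can be sketched on a toy graph. Here, purely as an assumption for illustration, a node’s importance is its number of connections relative to the best-connected node in its cluster, and a cluster’s weight is its share of all links in the graph. Real cognitive-computing systems learn far richer signals; the entities and scoring rules below are hypothetical.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Undirected adjacency sets from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def clusters(graph):
    """Connected components, found with breadth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

def weigh(graph):
    """Score nodes within their cluster, then clusters against the whole graph."""
    total_links = sum(len(neighbors) for neighbors in graph.values()) / 2
    node_score, cluster_weight = {}, {}
    for component in clusters(graph):
        degrees = {node: len(graph[node]) for node in component}
        best = max(degrees.values())
        node_score.update({node: d / best for node, d in degrees.items()})
        cluster_links = sum(degrees.values()) / 2
        cluster_weight[tuple(sorted(component))] = cluster_links / total_links
    return node_score, cluster_weight

graph = build_graph([
    ("Eiffel Tower", "Paris"), ("Paris", "France"), ("Eiffel Tower", "Gustave Eiffel"),
    ("Paris", "Louvre"), ("cronut", "bakery"),
])
node_score, cluster_weight = weigh(graph)
print(node_score["Paris"], node_score["France"])  # Paris is the hub of its cluster
for members, weight in cluster_weight.items():
    print(members, round(weight, 2))
```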

    Not As Easy As It Sounds

    What sounds easy to explain is hard to do. For a start, the algorithms that do all this have an accepted failure rate that, in the best-case scenario, is around 5% globally. But that global accuracy figure does not take into account what happens when the data required to cross-check and cross-reference the veracity of the connections is not there. 

    To illustrate what I mean, consider what happens when I turn up at a conference on Big Data and call myself a Data Scientist. Because I play to stereotypes and want to live up to expectations, I have the impressive name badge, the clipboard and the slightly odd professorial attire. To clinch the deal, I also have a presentation running behind me and have paid 50 friends to turn up and tell everyone who I am. 

    In that environment I am a data point. My attire and presentation are my primary footprint and my 50 paid friends are my connections. Anyone entering that environment has no reason to suspect I am lying and no good reason to challenge me on what I claim to be. 

    But a Data Scientist is not a data point that exists in a vacuum. At the very least, you would expect to find a business I work with that independently verifies my expertise and title. A publication or two. A book, maybe. At least one paper. Other publications, excerpts, comments, interviews and appearances that indicate that yes, I am who I say I am and I do what I say I do. 

    Should there be a doubting Thomas in the audience (and in this case he plays the role of a search engine bot), all he has to do is Google my name to find all the connections, reviews of my books, citations and mentions. 

    This is what cognitive computing does when it comes to information. Not only does a spider of some description check the complexity and veracity of the immediate web that the interlinked data has created, it then checks that data’s history across a much wider spectrum of information. 
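    The closed-world conference from the example above can be expressed as a small graph check. The sketch below, a hypothetical heuristic rather than any search engine’s real method, counts a claimant’s direct connections and then how many of those connections also lead somewhere outside the claimant’s own circle. Fifty paid friends who only know me and each other produce plenty of local links but no independent corroboration.

```python
from collections import defaultdict

def link(graph, a, b):
    """Add an undirected connection between two entities."""
    graph[a].add(b)
    graph[b].add(a)

def corroboration(graph, claimant):
    """Return (direct links, direct links that also connect outside the claimant's circle)."""
    circle = graph[claimant]
    closed_world = circle | {claimant}
    independent = sum(1 for node in circle if graph[node] - closed_world)
    return len(circle), independent

# Closed world: 50 paid friends who only know me and each other
conference = defaultdict(set)
for i in range(50):
    link(conference, "me", f"friend{i}")
    link(conference, f"friend{i}", f"friend{(i + 1) % 50}")

# Open web: fewer links, but each one leads somewhere else
web = defaultdict(set)
for source, elsewhere in [("publisher", "bookstore"), ("university", "journal"), ("reviewer", "newspaper")]:
    link(web, "author", source)
    link(web, source, elsewhere)

print(corroboration(conference, "me"))  # → (50, 0)
print(corroboration(web, "author"))     # → (3, 3)
```

    Raw link counts favor the impostor; only the second number, which looks beyond the immediate circle, separates the two cases.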

    The 4Vs Rule

    Data has a life that is governed by the Big Data concepts of: 

    • Volume
    • Velocity
    • Variety
    • Veracity

    Taken together, the 4Vs represent a living, breathing piece of data (or datum, to be a little pedantic) which, once we get past the metaphor, suggests that the data actually has impact. People are interested in it. It has relative importance and therefore some degree of existential truth (which is where the Veracity component comes in). 

    Lacking that (which is what happens in my closed-world example above), holes develop in an algorithm’s capacity to truly understand what is happening. Its assessment may flag the situation as one where trustworthiness is questionable, but beyond that it cannot really suggest anything. 

    The weakness here lies in conjecture. While humans can very quickly draw on their understanding of society, its structures and its possible pitfalls to suggest a motive when evidence of trustworthiness is conspicuously absent, an algorithm can only present the next ‘best’ answer it has available, and that is usually not good enough. 

    How Does Google Map Semantic Connections?

    Google used to use Google+, and the web at large, to track individual posts, link them to websites and personal profiles, map sentiment in comments and compare it all with past profile activity and textbook ‘signature’ styles to see what is real, what is not and what is somewhere in between. It continues to do this across the wider web, with machine learning technology providing the only cost-effective means of doing so. 

    Given the ability of computers to do everything faster and better, and their capacity to never forget, it is easy to imagine an always-on, omniscient mega-machine keeping tabs on everything and everybody and assigning some kind of ever-evolving numerical value to everything. Clearly, this is not the case. 

    The reason lies both in the amount of information released on the web every moment and in the computational power required to keep tabs on it all. Even a company as big as Google needs some kind of shortcut to make sense of it all, and those shortcuts lie in trusted entities. The problem is that it takes a long time to develop trusted entities in the same class as, say, Wikipedia or the New York Times. With time this problem will shrink a little, though the amount of fresh data released on the web will only grow. 
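    One published way of using trusted entities as a shortcut is TrustRank-style propagation: trust starts at a small set of hand-picked seed sites and flows outward along links, decaying at each step. It is named here only as an illustrative technique, not as what Google does. The sketch below is a biased PageRank iteration; the site names, the damping factor of 0.85 and the iteration count are assumptions for the example.

```python
def propagate_trust(outlinks, seeds, damping=0.85, iterations=50):
    """Spread trust from seed sites along outbound links (biased PageRank iteration)."""
    nodes = set(outlinks) | {t for targets in outlinks.values() for t in targets}
    seed_share = {n: (1 / len(seeds) if n in seeds else 0.0) for n in nodes}
    trust = dict(seed_share)
    for _ in range(iterations):
        incoming = {n: 0.0 for n in nodes}
        for source, targets in outlinks.items():
            for target in targets:
                incoming[target] += trust[source] / len(targets)  # split trust evenly
        # Part of the trust always returns to the seeds; the rest follows links.
        trust = {n: (1 - damping) * seed_share[n] + damping * incoming[n] for n in nodes}
    return trust

# Hypothetical link graph
outlinks = {
    "wikipedia": ["local-news", "university"],
    "nytimes": ["local-news"],
    "local-news": ["bakery"],
    "university": [],
    "bakery": [],
    "link-farm": ["spam-a", "spam-b"],
    "spam-a": ["link-farm"],
    "spam-b": ["link-farm"],
}
trust = propagate_trust(outlinks, seeds={"wikipedia", "nytimes"})
for site in sorted(trust, key=trust.get, reverse=True):
    print(f"{site:12} {trust[site]:.3f}")
```

    The link farm links heavily to itself yet ends up with zero trust, because no trusted seed links into it; the bakery earns a little trust only because a trusted source vouches for the local news site that mentions it.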

    We Are The Final Link

    The final link in the very long chain of processes that determine whether information on the web is true or false is us. Ultimately, our activities, shortcuts and transparency become key to maintaining veracity across the web, and while we may not quite be at the point where everyone is accountable for their actions and feels responsible for what they post, by degrees we will get there, particularly as the divide between online and offline is continuously bridged, first by our mobile devices and now by the advent of virtual reality and augmented reality connections. 

    What Marketers and Businesses Need to Know

    There is good news in all this for both marketers and businesses. If you’ve already got a copy of SEO Help, then you’re ahead of the game and already reaping the benefits. If you haven’t, however, you need at the very least to do the following: 

    • Give your online presence a data density that at least matches your offline one.
    • Find an audience. That means that on the web you need to engage. Do not just broadcast.
    • Define your identity. If a guy selling cronuts can do it, anybody can.
    • Think like a publisher. In Google Semantic Search I explained how none of us now has a choice. Just as opening a shop forces you to become an expert on window displays, color psychology and lighting, operating on the web requires you to know what works in terms of text, pictures and video.
    • Be personable. If your ‘voice’ and identity do not come across, people are unlikely to want to engage with what reads like a blunt, corporate-sounding machine.
    • Be real. Acknowledge faults and weaknesses and work to set them right. 

    These are minimum requirements and each takes a lot of effort to get right. Then again, something that requires hardly any effort at all is unlikely to hold much value in the eyes of the beholder, which means it will not really get you anywhere. 

     

  • Rage Against the Machine

    Barry Nuttall making his stand against the local council in Hull
  • Reading Between The Lines of Data

    Metadata and the sniper mind marketing
  • Understanding The Impact of Semantic Search

    The value of metadata in data analysis

© 2018 David Amerland. All rights reserved