David Amerland

semantic search

  • Google allows you to use your finger as an input device 

    It’s no secret that the easier something is to use, the more we use it. The more we use it, the more diversified its use becomes in our lives. This is a principle that Google researchers understand well, which is why the addition of handwriting recognition in 82 languages to Android devices is such a big thing. 

    Digital keyboards are fiddly, and when typing on a small screen the potential for error is actually pretty high (one reason I find the SwiftKey predictive keyboard so useful). Google now does away with some of that by allowing input to be handwritten (yeah, messy as that may sound). 

     

    Google Handwriting examples in different languages

     

    For handwriting to be more than a gimmick as an input method, there are some very specific requirements: 

    • Language modelling that can scale (Google has released its N-gram models to the world for anyone to use in research and development)
    • Accurate Optical Character Recognition (OCR) capabilities. Google has long had a good track record here.
    • Massive computation capability (and here Google is still ahead of any competitor)

    Cool as all this may be (and it is pretty cool), you also need one additional element that is make-or-break: utility. If you have to be near a Wi-Fi hotspot or use roaming data to access a tool the moment you need it, chances are you won’t. 

    Google has solved this by making the capability device-native, so it works on your phone or tablet on its own, while also being cloud-capable (if you want to take advantage of the cloud and Google’s servers). 

    The end result is pretty neat. The recognition speed, which determines how quickly your messy, caricature finger-writing on your phone’s screen is converted into neatly typed words and letters, is customizable from the app settings. And in my case, as someone who finds it next to impossible to write the @ symbol, Google worked out what it was and produced the email address I was sending to in next to no time. 

    Google Handwriting input 

    Google has applied algorithms from Google Correlate to handwriting models that use statistical predictive analysis, based on end-user data, to work out which letters are being written in a given language (and there are 82 of them!). 
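    The post does not spell out how statistical language modelling helps a recognizer decide which letters are being written. One common textbook approach, sketched here under stated assumptions and not as Google’s actual method, is to combine the recognizer’s shape score for each candidate string with a score from an N-gram language model and keep the best total. The tiny training corpus, the shape scores and every function name below are hypothetical.

```python
import math
from collections import Counter

def train_bigrams(corpus: list[str]) -> tuple[Counter, Counter]:
    """Count character bigrams (and their left contexts), padding words with ^ and $."""
    bigrams: Counter = Counter()
    contexts: Counter = Counter()
    for word in corpus:
        padded = f"^{word}$"
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts

def lm_log_prob(word: str, bigrams: Counter, contexts: Counter,
                vocab_size: int, alpha: float = 1.0) -> float:
    """Add-alpha smoothed log-probability of a word under the bigram model."""
    padded = f"^{word}$"
    return sum(
        math.log((bigrams[(a, b)] + alpha) / (contexts[a] + alpha * vocab_size))
        for a, b in zip(padded, padded[1:])
    )

def rescore(shape_scores: dict[str, float], bigrams: Counter, contexts: Counter,
            vocab_size: int, lm_weight: float = 1.0) -> str:
    """Return the candidate maximising shape score + lm_weight * language-model score."""
    return max(
        shape_scores,
        key=lambda w: shape_scores[w]
        + lm_weight * lm_log_prob(w, bigrams, contexts, vocab_size),
    )

# Hypothetical training text and recognizer output (log-scores), for illustration only.
corpus = ["hello", "help", "held", "hell", "yellow"]
bigrams, contexts = train_bigrams(corpus)
vocab_size = len({c for w in corpus for c in f"^{w}$"})
shape_scores = {"hello": -2.0, "hcllo": -1.8}  # the shape model alone slightly prefers "hcllo"
best = rescore(shape_scores, bigrams, contexts, vocab_size)
print(best)
```

    The shape model on its own would pick the misread “hcllo”, but the character pair “hc” never occurs in the training text, so the language model pulls the total score down and “hello” wins.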

    The Semantic Search Question

    We’re getting to the point where “semantic search” is no longer a separate thing and we should really just be talking about search. To pull all this off, Google relied on nothing less than its ability to link up developments in different areas, powered primarily by search-related algorithms. All of this revolves around personalization and the development of technology that simply does what we want, rather than leaving us to figure out what to do with it. 

    Device Usage is Increasing

    There are several important takeaways here. Device usage is increasing, and with it, we can surmise, so is the data that Google acquires. This means a better understanding of intent in searches, of query use (just as Voice Search changes the way we input queries, so will handwritten text have an impact), of the context of search queries and of device use in relation to intent. All of this “better” all-round stuff really means just one thing: search, marketing and branding (which increasingly overlap in their effect) are now governed by the basics of human psychology rather than by the capabilities afforded by yet another set of tools. 

    This should also point the way forward. If you want your website and your online business to appear in relevant searches, you need to understand what drives your potential audience to engage with you in the first place and position your content to meet exactly that. 

    It ain’t rocket science. 

    Sources

    Translation-inspired OCR
    Nearest neighbor search in Google Correlate
    Google Handwriting Input in 82 languages on your Android mobile device

     

  • It’s easier to explain how something works when it no longer does. The reason for this lies in an obvious fact. When everything works as it should we forget about the effects and tend to focus on the mechanics. Because the system in question delivers what it promises we take its function for granted. As a result the “what” is conveniently overlooked and we focus on the “how”. 

    Let it break down at any point, however, and suddenly we become acutely aware of what it actually does. Email, which breaks messages up at the point of origin, transmits the fragments over the internet’s pipes and then reassembles them at the receiver’s end, is amazing until it stops working. Then we suddenly realize just how huge a chunk of our business relies on emails getting through to us immediately. 

    It’s the same with cognitive computing and semantic technologies, terms that are increasingly interchangeable. When applied correctly, cognitive computing (which uses machine learning) takes masses of raw data and turns it into usable information. It assesses the importance of each piece in relation to all the other pieces around it and then weighs the importance of a cluster of connected data against all the other, similar clusters found on the web. The result is answers that closely approximate what a person would provide if they had access to all the world’s information and a brain the size of a planet. 
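    As a rough illustration of scoring each piece of data in relation to the pieces around it, here is a minimal sketch of PageRank, a well-known published algorithm for ranking nodes in a link graph. It stands in for the general idea only; it is not a description of how Google’s cognitive computing systems work, and the graph and node names are made up.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a directed link graph."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets a small baseline share (the "random jump").
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A node passes its importance on, split evenly across its links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank

# Hypothetical graph: three pages point at "hub", which points back at "a".
links = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))
```

    Here “hub” ends up with the highest score because three pages point at it, even though it links out to only one. The scores always sum to 1, so a node can only gain importance relative to the others around it.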

    Not As Easy As It Sounds

    What sounds easy to explain is hard to do. For a start, the algorithms that do all this have an accepted failure rate that, in the best-case scenario, is around 5% globally. But the global accuracy figure does not take into account what happens when the data required to cross-check and cross-reference the veracity of the connections is not there. 

    To illustrate what I mean, consider what happens when I turn up at a conference on Big Data and call myself a Data Scientist. Because I play to stereotypes and want to live up to expectations, I have the impressive name badge, the clipboard and the slightly odd professorial attire. To clinch the deal I also have a presentation running behind me and have paid 50 friends to turn up and tell everyone who I am. 

    In that environment I am a data point. My attire and presentation are my primary footprint and my 50 paid friends are my connections. Anyone entering that environment has no reason to suspect I am lying and no good reason to challenge me on what I claim to be. 

    But a Data Scientist is not a point of data that works in a vacuum. You would expect to at least find a business I am working with that independently verifies my expertise and title. A publication or two. A book maybe. At least one paper. Other publications, excerpts, comments, interviews and appearances that indicate that yes, I am who I say I am and I do what I say I do. 

    Should there be a doubting Thomas in the audience (and in this case he plays the role of a search engine bot) all he has to do is Google my name to find all the connections, reviews of my books, citations and mentions. 

    This is what cognitive computing does when it comes to information. Not only does a spider of some description check the complexity and veracity of the immediate web of interlinked data, it then checks the history of that data across a much wider spectrum of information. 

    The 4Vs Rule

    Data has a life that is governed by the Big Data concepts of: 

    • Volume
    • Velocity
    • Variety
    • Veracity

    Taken together, the 4Vs describe a living, breathing piece of data (or datum, to be a little pedantic) which, once we get past the metaphor, suggests that the data actually has impact. People are interested in it. It has relative importance and therefore some degree of existential truth (which is where the Veracity component comes in). 

    Lacking that (which is what happens in my closed-world example above), holes develop in the capacity of an algorithm to truly understand what is happening. Its assessment may flag that trustworthiness is questionable, but beyond that it cannot really suggest anything. 

    The weakness here lies in conjecture. While humans can very quickly draw on their understanding of society, its structures and its possible pitfalls to suggest a motive when evidence of trustworthiness is conspicuously absent, an algorithm can only present the next ‘best’ answer it has available, and that is rarely good enough. 

    How Does Google Map Semantic Connections?

    Google used to use Google+ and the web at large to track individual posts, link them to websites and personal profiles, map sentiment in comments and compare it all with past profile activity and textbook ‘signature’ styles to see what is real, what is not and what is somewhere in between. It continues to do this across the wider web, using machine learning because that is the only cost-effective means of doing so. 

    Given the ability of computers to do everything faster and better, and their capacity to never forget, it is easy to imagine an always-on, omniscient mega-machine keeping tabs on everything and everybody and assigning some kind of ever-evolving numerical value to everything. Clearly, this is not the case. 

    The reason lies in both the amount of information released every moment on the web and the computational power required to keep tabs on it all. Even a company as big as Google needs shortcuts to make sense of it all, and those shortcuts lie in trusted entities. The problem is that it takes a long time to develop trusted entities in the same class as, say, Wikipedia or the New York Times. With time this problem will shrink a little, though the amount of fresh data released on the web will only grow. 
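    The idea of using trusted entities as a shortcut can be illustrated with trust propagation seeded from a handful of trusted nodes, in the spirit of the published TrustRank approach. This is a hypothetical sketch, not Google’s implementation: the seed names, the graph and the propagate_trust function are assumptions. It also mirrors the conference example above: a self-styled expert whose only connections are paid friends, who link only to each other, receives no trust at all, because nothing trusted points into that closed cluster.

```python
def propagate_trust(links: dict[str, list[str]], seeds: set[str],
                    damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Spread trust outward from hypothetical seed entities along links."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    # All starting trust sits on the trusted seeds.
    seed_mass = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    trust = dict(seed_mass)
    for _ in range(iterations):
        # Each round, part of the trust is re-injected at the seeds only.
        new_trust = {n: (1.0 - damping) * seed_mass[n] for n in nodes}
        for node in nodes:
            targets = links.get(node, [])
            for target in targets:
                # Trust flows only along outgoing links, split evenly.
                new_trust[target] += damping * trust[node] / len(targets)
        trust = new_trust
    return trust

# Hypothetical web: trusted sources vouch for a real expert; a closed cluster vouches for itself.
links = {
    "wikipedia": ["publisher", "university"],
    "nytimes": ["publisher"],
    "publisher": ["real_scientist"],
    "university": ["real_scientist"],
    "real_scientist": [],
    "self_styled_expert": ["friend1", "friend2"],
    "friend1": ["self_styled_expert", "friend2"],
    "friend2": ["self_styled_expert", "friend1"],
}
trust = propagate_trust(links, seeds={"wikipedia", "nytimes"})
for name, score in sorted(trust.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {score:.3f}")
```

    Because trust only flows along links from the seeds, a closed, self-referential cluster stays at zero however densely its members link to each other. Nodes with no outgoing links simply let their share leak away in this sketch; that is one design choice among several.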

    We Are The Final Link

    The final link in the very long chain of processes that determine whether information on the web is true or false is us. Ultimately our activities, shortcuts and transparency become key to maintaining veracity across the web. We may not quite be at the point where everyone is accountable for their actions and feels responsible for what they post, but by degrees we will get there, particularly as the divide between online and offline is continuously bridged, first by our mobile devices and now by the advent of virtual reality and augmented reality connections. 

    What Marketers and Businesses Need to Know

    There is good news in all this for both marketers and businesses. If you’ve already got a copy of SEO Help then you’re ahead of the game and already reaping the benefits. If you haven’t, however, you need, at the very least, to do the following: 

    • Give your online presence a data density that at least matches your offline one.
    • Find an audience. That means that on the web you need to engage. Do not just broadcast.
    • Define your identity. If a guy selling cronuts can do it, anybody can.
    • Think like a publisher. In Google Semantic Search I explained why none of us now has a choice. Just as opening a shop forces you to become an expert on window displays, color psychology and lighting, operating on the web requires you to know what works in terms of text, pictures and video.
    • Be personable. If your ‘voice’ and identity do not come across then people are unlikely to want to engage with a blunt, corporate sounding machine.
    • Be real. Acknowledge faults and weaknesses and work to set them right. 

    These are minimum requirements and each takes a lot of effort to get right. But then again something that requires hardly any effort at all is unlikely to hold much value in the eyes of the beholder which means it will not really get you anywhere. 

     

  • SEO Help: 20 Practical Steps to power your content creation, marketing and branding in the new AI world of Google search.

    SEO Help for the semantic search and artificial intelligence age

    The digital space is now pervasive, noisy and fragmented. Being noticed by those who should be your customers has become really hard.

    Search is a constantly changing activity. Yet its primary focus is always the same: how to serve content to online searchers that is accurate and relevant. That is also the ‘secret’ to success in search. If the content you have created can be indexed well enough to accurately gauge its relevance, then its chances of being seen by those who are looking for it are really good. If the content itself is also considered to somehow be “better” than similar content, then its chances of being seen rise exponentially.

    Great content that is accurate and relevant enjoys higher visibility in search. How to make sure that the content you create is considered “better”, and that it is both accurate and relevant in terms of your branding, is exactly what this book is about.

    The very first edition of “SEO Help” came out in 2010. Since then, across three updated editions, “SEO Help” has won Book Authority’s “Best SEO Book of All Time” award and become a search engine optimization classic that has helped countless webmasters understand what they need to do to increase the online success of their business.

    This edition is no different. While it is true that search and search behavior are changing rapidly in response to technological innovation, the fundamentals that drive an online surfer to use a search engine to look for something haven’t changed. And while it is also true that search technology has changed drastically since 2010, the fundamentals that make a search engine deliver a particular result in response to a search query have also remained steadfast.

    SEO Help: 20 Practical Steps to power your content creation, marketing and branding in the new AI world of Google Search is a detailed guide to those fundamentals. It tells you what to do, when and how, to make sure that every item of content you create, whether text, video, podcast or graphic, works in your favor.

    In this edition you will also learn:

    • How the increasing use of artificial intelligence (AI) and machine learning affects search, marketing and branding (and how to take advantage of it all).
    • What the fragmentation of search means to your brand and your business and how to make the most of what you currently do.
    • What to do to make your brand stand out from the crowd without increasing the output of your content creation efforts.
    • What to do to increase trust in your brand and the content you create in a time of negative news stories and fake news.
    • How to better use Google’s Knowledge Graph (KG) to increase the trustworthiness of your digital presence.
    • Why marketing and branding cannot be separated from search and your business’ SEO practices.
    • How to leverage the fragmented social media landscape to your advantage.
    • How to future-proof your business against constant changes in search.
    • The true impact of Google’s mobile index on your digital business.
    • What feasible shortcuts exist in search marketing and branding.

    As before, each chapter is thin on theory and heavy on the practical steps you need to take. And, as before, each chapter ends with a full practical-steps guide you should be implementing to make sure your business stays viable.

    The title is out in January 2020, but I have 500 copies available at a 30% discount off its $19.99 price for those who pre-purchase now, on a first-come, first-served basis.

    That's just $13.99 with FREE shipping.

  • In a world where everything is data, navigating to the right place, finding the right answer or matching the right pair (of anything) is always a search problem. Data only makes sense when it is networked, connected, indexed, analyzed, assessed, abstracted, categorized, organized and presented in relation to other data.

    The process is an endless rinse-and-repeat cycle where the metadata surfaced becomes semantically dense enough to become data in its own right, allowing further metadata to be extracted from it. 

    Let’s get practical. Apply all the theoretical abstraction above to the usual “Morning!” greeting between neighbors. The depth of the relational connection between them (are they good friends, or are they merely being civil to each other?) will reveal itself in the warmth or candour of that one single word. Is one of them distracted, lost in thought? Depressed? Angry? Clipped tones, trailing endings, a pitch so low it is barely audible or so high it sounds like a whine can all be used to analyze emotions. Is the sound harsh? Is the word spoken fast, almost like an expletive, or are the syllables long and drawn out? The difference could reveal whether there is enmity or hidden aggression in the relationship, or whether it’s a casual, social connection with no other overtones. 

    We’ve only used one word and that’s before we begin to analyze whether there is a male/female interaction involved or whether a regional or national accent comes into play. 

    This is exactly the kind of semantic analysis Google does with speech in order to improve its understanding of spoken queries in search. Because speech is data, possessing it also allows knowledge to accumulate about how speech breaks down into discrete units that can be analyzed for content, context and importance, and then classified. This gives Google the ability to reverse-engineer the process and create human-like computer speech that uses inflexion, pitch, rhythm and speed to denote warmth, friendliness and openness. 

    There are several important takeaways here: 

    • In a data-centric world search is everywhere, even if we do not actively call it search or have a sense of it as such.
    • Everything that has an effect is information. Information is data. Data is subject to analysis and classification. That includes relatively ethereal things like emotion and intent.
    • Once metadata accumulates it becomes substantial enough to be subject to further analysis and classification so it becomes data which gives rise to further metadata.
    • The process of labelling, classification and refinement can continue ad infinitum unless it reaches a boundary where the benefits no longer justify the cost of another iteration.
    • Data always has value. Its value is always contextual. 

    As Google’s machine learning gets better and better its voice recognition and voice synthesis capabilities will exponentially improve. Machine learning is closely linked to exponential growth because of the way training sets of data are sampled and the algorithms are then recalibrated. Exponential growth, as the graph below illustrates, has a latency period after which change accelerates dramatically. In practical terms this means that once machine learning gets past a tipping point it begins to produce good results at an accelerated rate.     
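    To make the “latency period” concrete, the short sketch below generates a curve that doubles at every step. The doubling rate and the number of steps are arbitrary assumptions chosen for illustration; they do not model any real machine learning system.

```python
def exponential_curve(start: float, rate: float, steps: int) -> list[float]:
    """Return start * (1 + rate) ** t for t = 0, 1, ..., steps."""
    return [start * (1 + rate) ** t for t in range(steps + 1)]

# Assumed parameters: start at 1 and double (rate = 100%) at each of 10 steps.
curve = exponential_curve(start=1.0, rate=1.0, steps=10)

# Share of the total increase that happens during the first half of the run.
first_half_share = (curve[5] - curve[0]) / (curve[-1] - curve[0])
print(curve)
print(f"first half: {first_half_share:.1%} of the total increase")
```

    Only about 3% of the total increase (31 out of 1,023) happens in the first half of the run; the rest arrives in the second half, which is what the apparent acceleration looks like in numbers.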

    Exponential Growth in Machine Learning Accuracy

    Getting to the Very Core of Reality

    Marketing has never quite been about being real. It has always been seen as the means through which a stimulus is created which is then satisfied by the product or service that is being marketed. But that is, to put it mildly, manipulation. It plays on desires, needs and fears to create a false sense of urgency that will lead to a purchase before the potential buyer has had the chance to research anything, think things through or change her mind. 

    Semantic search promised to change all of this by creating entities which are based on identity. This generates data that needs to be classified and validated.

    Machine learning makes all of this faster and less costly which means that more and more can be done without increasing operating costs. 

    Fire hydrant voice search query

    Search queries posed in natural language can be processed and matched against real world concepts and objects without going through the traditional ‘translation’ phase where we try to think what specific search terms might possibly describe those objects. The search query “Red cylindrical object used to fight fire” returns, without any hesitation, “fire hydrant” on Voice Search.
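    A toy way to picture matching a description to a concept, rather than to exact search terms, is to compare the words in a query with short descriptions of candidate entities. The mini knowledge base and the word-overlap (Jaccard) scoring below are assumptions for illustration only; production voice search relies on far more sophisticated language understanding.

```python
import re

# Hypothetical mini knowledge base: entity name -> short description.
ENTITY_DESCRIPTIONS = {
    "fire hydrant": "red cylindrical street fixture used by firefighters to fight fire with water",
    "fire extinguisher": "portable red canister used to put out small fires",
    "post box": "red cylindrical street box used to collect letters",
}

def tokens(text: str) -> set[str]:
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity: shared words divided by all distinct words."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

def best_entity(query: str, descriptions: dict[str, str]) -> str:
    """Return the entity whose description overlaps most with the query."""
    return max(descriptions, key=lambda name: jaccard(query, descriptions[name]))

query = "Red cylindrical object used to fight fire"
print(best_entity(query, ENTITY_DESCRIPTIONS))
```

    The query shares the most words with the hydrant description (6 of 13 distinct words), so “fire hydrant” wins even though the query never contains the word “hydrant”.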

    One of the areas where this is most evident is voice search and voice interaction. Without a keyboard to type a query into, we get no drop-down autosuggestions from Google. We also cannot always remember what we searched for two queries earlier, so the very concept of search terms (or even keywords) becomes redundant. 

    The approach has two very significant effects: 

    • Natural language description frequently supplants exact search terms and even a search methodology.
    • It often does not feel like search: Google Now, Waze, Google Maps, YouTube, Gmail and Google Photos are examples where search technology is active in the background. 

    The video below on Google Voice and how it is put together beautifully explains some of the concepts:  

    What it really means is that everything a business, a brand or a person does online and offline now really matters. This concept of “data density” was first broached in SEO Help, which was designed very specifically to address issues of identity, brand values and entity formation as part of a business’ or a brand’s day-to-day activities. 

    Because everything is data and everything is beginning to be understood and indexed, creating the semantically rich data density required to succeed in search has to be part of an incremental, sustained and sustainable process that weds brand identity and core values with brand marketing activities and brand voice. Of course, in a semantic web everyone and everything is, from a practical point of view, a brand.  

     

© 2019 David Amerland. All rights reserved