Semantic search will index video in a contextual manner

There is a direct link between developments in search and content creation formats. When content was primarily text, search did a great job of indexing it. When pictures came to play more and more of a role, search again developed, this time to understand visual images better.

We are now in the age of video. From discussions over format (vertical or horizontal) to questions about what kind of content works best as video, moving images are becoming more and more a part of what we use to communicate with, engage and find the audience we hope to attract. 

Google researchers rightly say that video is a series of still frames linked by time. The ‘problem’ with video is that it is computationally intensive to process and understand, and because it adds the component of time it introduces a very fluid parameter that does not scale easily across the board (is a short video ‘better’ than a long one? How can computing power be allocated when every video has a different length? How can still frames be analyzed well enough to make predictions over time?). 
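To get a feel for the "every video has a different length" issue, here is a minimal sketch in Python (my own illustration using OpenCV, not Google's pipeline, and the function name is made up) of a common workaround: sample a fixed number of frames spread evenly across the clip, so a video of any duration yields an input of the same size.

```python
# Minimal sketch: uniform frame sampling from a variable-length video.
# Assumes OpenCV (cv2) and NumPy are installed; not Google's actual code.
import cv2
import numpy as np

def sample_frames(video_path, num_frames=30):
    """Return `num_frames` frames spread evenly across the whole clip."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Pick evenly spaced frame indices, whatever the video's length.
    indices = np.linspace(0, max(total - 1, 0), num_frames, dtype=int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames

# Usage: frames = sample_frames("match_point.mp4")  # hypothetical file
```

However a video is sampled, the result is a fixed-size set of still frames that can then be analyzed and linked through time.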

We don’t normally think about it much ourselves, but when we watch a video the sequence of still images presented to us is stitched into motion by our brains in a way that also predicts the plausibility of what is happening, so that, for instance, a tennis ball hit over the net in a flat trajectory does not change color from yellow to red and does not suddenly turn into a vertical lob. Either of these, if experienced, would tell us that the video we are watching is a fabrication rather than a depiction of something real. 

This requires a few things from us: 

  • Knowledge about the world
  • A sense of time
  • Awareness of context

These three things are what separate our adult brains from those of babies and young children. They also, usually, separate people looking at video from computers.

In a ground-breaking paper to be presented at the 2015 Computer Vision and Pattern Recognition (CVPR) conference in Boston, Google researchers believe they have been able to crack the problem, both from an object-recognition, computer-vision point of view and from a computational-power one. 

The computer equivalent of the three things mentioned above comes through: 

  1. The use of convolutional neural networks (CNNs) to sample the video frames and draw from them inferences about what the video being analyzed is about.
  2. The parameters computed per video are then shared through time across the algorithms analyzing it. This gives the computer a sense of time that is tailored to each video, whatever its length (a rough code sketch of how points 1 and 2 fit together follows this list).
  3. Sampling video frames and using the sample to accurately predict the expected motion in them, called optical flow (the context element that governs what happens in each situation we view, according to our expectations, as humans). 
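To make the first two points a little more concrete, here is a minimal sketch, assuming PyTorch and a recent torchvision are available. It is not the architecture from the Google paper, just the general recipe: a convolutional network turns each sampled frame into a feature vector, and a recurrent (LSTM) layer carries those features through time before a single classification is made for the whole clip.

```python
# Minimal sketch: per-frame CNN features fed through an LSTM over time.
# Illustrative only; the class and parameter names are made up for this example.
import torch
import torch.nn as nn
import torchvision.models as models

class FrameSequenceClassifier(nn.Module):
    def __init__(self, num_classes=10, hidden_size=256):
        super().__init__()
        cnn = models.resnet18(weights=None)                         # per-frame feature extractor
        self.features = nn.Sequential(*list(cnn.children())[:-1])   # drop the final FC layer
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden_size,
                            batch_first=True)                       # shares state across frames (time)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, frames):                              # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        x = self.features(frames.flatten(0, 1))             # (batch*time, 512, 1, 1)
        x = x.flatten(1).view(b, t, -1)                     # (batch, time, 512)
        out, _ = self.lstm(x)                               # context carried through time
        return self.classifier(out[:, -1])                  # classify from the last time step

# Usage: score a clip of 30 sampled 224x224 frames (random data here).
# model = FrameSequenceClassifier(num_classes=10)
# logits = model(torch.randn(1, 30, 3, 224, 224))
```

The third point, optical flow, essentially adds a second stream of the same kind of analysis, run over the motion between consecutive frames rather than over the frames themselves.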

You can see some of the 30-second snippets Google’s smart computers sampled, and the classifications they gave them, in the short video below: 

https://www.youtube.com/watch?v=oDRl3-X1KkI

What This Means For Semantic Search

There are a few things that are key here. First, entities create indexing shortcuts for Google. The importance of entities in semantic search cannot be stressed enough. Second, video will soon be part of what is considered when ranking websites, alongside all other content. Third, video, done properly, becomes a very easy way to determine Authority and Expertise in relation to the content being published (for instance, the video above, properly understood, becomes a strong corroborative element for the expertise of this web page on semantic search and video indexing). 

Something of a tangent, but entirely related: none of this would have been possible had the neural networks used not been able to work out the relational interconnections between the sampled elements of each video frame. To understand this better, consider that instead of analyzing an entire video frame by frame, the neural networks sample a frame, analyze the entities within it, understand the context within the frame, and then link that understanding to the next frame and the next. The evolving values of each entity become a 3D representational framework of the entire video. 
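A toy way to picture those "evolving values" (the numbers below are entirely made up, purely for illustration): if each sampled frame yields a score for each entity detected in it, stacking those scores over time gives a frames-by-entities grid, and the way each entity's column rises and falls is what carries the story of the clip.

```python
# Toy illustration: per-frame entity scores stacked through time.
# The entities and numbers are invented for this example.
import numpy as np

entities = ["person", "tennis ball", "racket", "net"]

# One row per sampled frame, one column per entity (hypothetical scores).
frame_scores = np.array([
    [0.9, 0.1, 0.2, 0.7],   # ball barely visible
    [0.9, 0.6, 0.5, 0.7],   # ball tossed up
    [0.8, 0.9, 0.8, 0.6],   # racket strikes the ball
    [0.8, 0.7, 0.6, 0.7],   # ball crosses the net
    [0.9, 0.2, 0.3, 0.7],   # ball leaves the frame
])

# Each column traces one entity through time.
for name, column in zip(entities, frame_scores.T):
    print(f"{name:12s} {column}")
```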

Before you begin to think “that’s a lossy way of doing it” consider that: 

…anatomical and physiological studies in monkeys suggest that visual signals are fed into at least three separate processing systems. One system appears to process information mainly about shape; a second, mainly about color; and a third, movement, location, and spatial organization.

Source: Brainfacts.org

How Does This Affect Your Marketing?

Content is key. Video has usually been used as an adornment, a glorified visual break-up of solid text. It will now become possible to have web pages that contain a single video and a small description, and have them rank for their content in search queries whose context they satisfy. 

In Google Semantic Search I mentioned how business owners, and those working for brands, now have to think of themselves as writers, photographers, videographers and publishers. With each passing month this is becoming more and more of a reality. You can, at this stage, feel that the mountain of work you have to do as part of your job just got bigger, or you can smile and think that here are more opportunities to integrate content into your daily workflow and make it work for you in search.

As before (or as always), think about presenting something real to humans first, rather than to search. Consider what fires you up and how you might make it work by projecting it across your work. Consider how it fits in with your brand values and message (get SEO Help to help you in all this, if you haven’t already). 

Remember, search and marketing are a fluid, ever-shifting environment. The only constants are your passion, drive, identity and values. Your only real advantage is the ingenuity you can bring to bear upon all this. 

Sources

Beyond Short Snippets: Deep Networks for Video Classification (Google Research)
Beyond Short Snippets: Deep Networks for Video Classification (pdf)
How Google cracked the image translation problem
Visual searching and how it affects your marketing in a semantic web
Semantic image search: how computers learn to see
