About the above: How Facebook AI engineers are using computer vision to help machines turn the bits and bytes of image data into a full understanding of texture and context: high-level semantics, like smiling people; where an image fits in the world around it; and what they think needs to happen to improve it.
“From the biological science point of view, computer vision aims to come up with computational models of the human visual system. From the engineering point of view, computer vision aims to build autonomous systems which could perform some of the tasks which the human visual system can perform (and even surpass it in many cases).”
That definition, by Dr. Thomas S. Huang, Professor Emeritus at the University of Illinois at Urbana-Champaign, reflects the history of this approach to artificial intelligence. Computer vision has its roots in signal processing, “the analysis, synthesis and modification of signals”. As the technology evolved, so did our ability to read different kinds of signals. In the 1960s, Woodrow Wilson Bledsoe used the RAND tablet, a graphical input device, to record the coordinates of facial features marked by hand on photographs, an early step toward what is now known as facial recognition.
Generally, computer vision enables ML and NLP to process different types of data. Computer vision and ML are needed to remind the machine that it has previously seen a face. Add NLP, and you have the ability to read, understand and analyze legal contracts. It is also the backbone of all three extended reality technologies: virtual, augmented and mixed reality.
In the late 2000s and early 2010s, computer vision was the technology that helped drive child pornography onto the dark web. As the number of images proliferated on the web, internet companies complained there was nothing they could do, almost treating the problem as a cost of doing business. A number of researchers and scientists resisted that lazy thinking and developed technology to identify skin and to analyze motion and metadata, giving companies real-time tools to review images instantly. They gave it to the web companies. Soon after, amazingly, the problem stopped. Today, social media companies complain similarly about hate speech and other problems. They should learn more from the solutions, not the excuses, of the past. Otherwise, it is reasonable to think a solution will be legislated upon them.
Fast-forward to today. It is difficult to envision the depth of experience humans and machines have built up as a foundation for computer vision. An untold number of pictures have been collected, from what early humans painted on cave walls to the billions of images we upload every day with smartphones.
In terms of raw data to teach machines to see, we have built an impressive archive. Even so, the brain of a six-month-old has orders of magnitude more power to see, identify and understand. Your child will never forget your face. Your high-end smartphone did it yesterday.
Four of the most common tools of computer vision are recognition, motion analysis, scene reconstruction and image restoration. Let’s look at each:
Recognition: the field of recognition helps machines determine whether an image, document, or frame of live action contains a specific object or feature, or shows a specific activity. It covers a wide range of applications. Here are some of the most common:
Object recognition allows machines to understand objects, based on shape, color, pattern, etc. With that information, computers can classify and extract features from what they see.
Optical character, word and intelligent word recognition allow machines to recognize printed words on a document, read handwritten words on a character-by-character basis, or read large passages of handwritten material as a whole.
Optical mark recognition allows machines to detect and interpret specific human markings on a document, such as filled-in bubbles on a form or checked boxes on a survey.
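To make optical mark recognition a little more concrete, here is a minimal sketch in Python. It assumes a scanned form has already been turned into a grid of grayscale values (0 is black ink, 255 is white paper) and that we already know where each answer bubble sits. The function names, the tiny sample scan and the thresholds are all hypothetical, chosen only for illustration; a production system would also have to straighten and align the scan first.

```python
# Hypothetical sketch: decide which answer bubbles on a scanned form are filled in.
# Assumption: the scan is a list of rows of grayscale values (0 = ink, 255 = paper).

def fill_ratio(image, top, left, height, width, dark_threshold=128):
    """Return the fraction of pixels in a rectangle that are darker than the threshold."""
    dark = 0
    for row in image[top:top + height]:
        for value in row[left:left + width]:
            if value < dark_threshold:
                dark += 1
    return dark / (height * width)

def read_marks(image, regions, min_fill=0.5):
    """Map each region name to True when enough of it is inked to count as a mark."""
    return {
        name: fill_ratio(image, *box) >= min_fill
        for name, box in regions.items()
    }

# A tiny 4x8 "scan": the left bubble is inked, the right one is blank.
scan = [
    [255,  20,  30, 255, 255, 250, 255, 255],
    [255,  10,  15, 255, 255, 255, 240, 255],
    [255,  25,  12, 255, 255, 255, 255, 255],
    [255, 255, 255, 255, 255, 255, 255, 255],
]
bubbles = {"A": (0, 1, 3, 2), "B": (0, 5, 3, 2)}  # (top, left, height, width)
marks = read_marks(scan, bubbles)
print(marks)
```

The idea is simply that a marked bubble contains far more dark pixels than an empty one, so counting them is often enough to tell the two apart.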
Amazon Go is the newest and best example of what recognition can do. The stores are a collision of recognition, motion analysis, sensor data and machine learning. Plus snacks, drinks and phone cables.
Each Go outlet is a small retail store where you walk in, fire up your app, shop and leave. Notice I didn’t say stand in a checkout line and pay. A web of sensors and cameras tracks each customer, records where they go, scans what they put in their bag, and charges their preferred Amazon payment method just before they walk out the door. Check out the video:
Motion analysis: this technology is about teaching machines to estimate motion in a sequence of images. The sequence is processed to produce information based on the apparent motion in the picture. That includes motion detection and tracking the motion of a specific object or objects over time. Self-driving cars are completely dependent on motion analysis and recognition.
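For readers who want to see what "apparent motion" looks like in practice, here is a minimal sketch in Python. It assumes each video frame is a small grid of brightness values and uses two simple ideas: frame differencing to detect that something moved, and following the centre of the bright pixels to track where it went. All names, frames and thresholds are hypothetical; real systems, especially in self-driving cars, rely on far more robust methods.

```python
# Hypothetical sketch: detect and track motion across tiny grayscale frames.
# Assumption: each frame is a list of rows of brightness values (0 to 255).

def changed_pixels(prev_frame, next_frame, threshold=30):
    """Motion detection: positions whose brightness changed by more than the threshold."""
    changed = []
    for r, (prev_row, next_row) in enumerate(zip(prev_frame, next_frame)):
        for c, (before, after) in enumerate(zip(prev_row, next_row)):
            if abs(after - before) > threshold:
                changed.append((r, c))
    return changed

def bright_centroid(frame, min_brightness=128):
    """Tracking helper: the (row, col) centre of the bright pixels, or None if there are none."""
    points = [
        (r, c)
        for r, row in enumerate(frame)
        for c, value in enumerate(row)
        if value >= min_brightness
    ]
    if not points:
        return None
    rows = sum(r for r, _ in points) / len(points)
    cols = sum(c for _, c in points) / len(points)
    return (rows, cols)

def make_frame(col):
    """Build a 3x5 frame with one bright object in the middle row at the given column."""
    frame = [[0] * 5 for _ in range(3)]
    frame[1][col] = 200
    return frame

frames = [make_frame(1), make_frame(2), make_frame(3)]  # object drifts to the right

moved = changed_pixels(frames[0], frames[1])
path = [bright_centroid(f) for f in frames]
steps = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(path, path[1:])]
print(moved, path, steps)
```

Here the object moves one column to the right per frame, so each step in `steps` is zero rows down and one column across.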
AirMouse: this revolutionary product, from AI startup TwentyBN, is a touchless pointing device that uses computer vision to track finger movement in the air. Behold:
Scene reconstruction—this application of computer vision teaches machines to capture the shape and appearance of real objects, usually in 3D. Scene reconstruction is a major technology that enables virtual and mixed reality.
Image restoration: image restoration is as simple as the name suggests. It’s teaching machines how to bring corrupted or degraded photos or images closer to their original state or specs. Much of the value here is in de-noising images, like X-rays or pictures of bad guys. Last year, NVIDIA engineers figured out how to use grainy training images to teach a machine how to remove image grain. Here’s the result:
NVIDIA is also working on using AI to restore color to black-and-white images and video.
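To give a feel for de-noising, here is a minimal sketch in Python of a classical technique, the median filter. This is not NVIDIA's learning-based approach; it is a much older and simpler idea, shown only to illustrate what "removing grain" means. It assumes the image is a small grid of grayscale values, and the function name and sample image are hypothetical.

```python
# Hypothetical sketch: remove "salt and pepper" specks with a 3x3 median filter.
# Assumption: the image is a list of rows of grayscale values (0 to 255).

def median_filter(image):
    """Replace each pixel with the middle value of its neighbourhood (up to 3x3)."""
    height, width = len(image), len(image[0])
    restored = []
    for r in range(height):
        new_row = []
        for c in range(width):
            # Collect the pixel and its neighbours, clipping at the image border.
            neighbours = sorted(
                image[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
            )
            # Middle value (the upper middle when the count is even).
            new_row.append(neighbours[len(neighbours) // 2])
        restored.append(new_row)
    return restored

# A flat gray image with one white speck and one black speck.
noisy = [
    [100, 100, 100, 100],
    [100, 255, 100, 100],
    [100, 100,   0, 100],
    [100, 100, 100, 100],
]
clean = median_filter(noisy)
print(clean)
```

Because a single speck is always outvoted by its neighbours, both specks disappear and the image comes back as uniform gray. The trade-off is that a median filter can also blur fine detail, which is one reason learned approaches are attractive.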
Computer vision allows a machine to “see” shape, detail, motion, recreate things previously viewed and restore images to a prior state. How does that fit within your agency’s client base? How does that fit within your department or company?
Look at your product, service, agency or clientele. Look at the touchpoints your customers use to move through the various processes of interacting with you. Can automating the process to see and understand details make things easier and more efficient? What happens if a customer could upload a picture of a problem or situation? Internally, can computer vision help streamline, economize or improve the quality of what you make? The key to using computer vision to focus on a segment of your customer base is looking at your business through new lenses, so to speak.
Cost leadership: according to Michael Porter, a cost leadership strategy may include “the pursuit of economies of scale, proprietary technology and other factors in order to take advantage.” Is there something you do on behalf of your clients that gives you an opportunity to create a low-cost, stand-alone tool? Is there an opportunity within your company to exploit?
Differentiation: Porter defines differentiation as a firm seeking to be unique in its industry along some dimensions that are widely valued by buyers. If you combine computer vision with a neural net, you can train a machine to factor context and sentiment into image or video analysis. How could that improve focus groups, field testing, customer personas and content creation? What happens if you use that to turbo-charge personalization? How could that make you stand out in your industry?
Computer vision is a tool. Just like using a hammer, computer vision will help in some situations and have little value in others. Review your data, listen to your people, talk to customers—both happy and angry. They will likely lead you to a place where you see an opportunity.
Think about where an unblinking eye, or the ability to see, identify, read, understand, remember, confirm and verify would be useful.
Think about where time-lapse and analysis would be useful. Monitoring. Patterns and trends, comings and goings.
Think about where immersion could add value. Would it be useful to mount a 360-degree camera in an area to give viewers a better sense of what’s happening? How high up should it be?
Turn the eye inward. Are there internal processes where computer vision could aid with quality control? Porter’s strategy isn’t external only. Can a computer vision project align with business strategy?
Privacy is a top concern for everyone. Just because a computer sees you doesn’t mean it should be allowed to recognize you. Even though we are approaching a level of untold personalization, how communicators use computer vision will determine whether your customers accept or reject it.
20 Billion Neurons (20BN) is a German company that built Millie, a life-sized and context-aware digital companion. Millie is used in applications where being able to see and understand expressions and gestures is key: wayfinder, store greeter or brand promoter. Take a look at Millie. Event agencies, do you have clients who regularly produce large events where Millie could help?
March 4, 2016
Artificial Intelligence, Computer Vision