Today we announced that the Google Translate app now does real-time visual translation of 20 more languages. So the next time you’re in Prague and can’t read a menu, we’ve got your back. But how are we able to recognize these new languages?
In short: deep neural nets. When the Word Lens team joined Google, we were excited for the opportunity to work with some of the leading researchers in deep learning. Neural nets have gotten a lot of attention in the last few years because they’ve set all kinds of records in image recognition. Five years ago, if you gave a computer an image of a cat or a dog, it had trouble telling which was which. Thanks to convolutional neural networks, not only can computers tell the difference between cats and dogs, they can even recognize different breeds of dogs. Yes, they’re good for more than just trippy art: if you’re translating a foreign menu or sign with the latest version of Google’s Translate app, you’re now using a deep neural net. And the amazing part is it can all work on your phone, without an Internet connection. Here’s how.
Step by step
First, when a camera image comes in, the Google Translate app has to find the letters in the picture. It needs to weed out background objects like trees or cars, and pick up on the words we want translated. It looks for blobs of similarly colored pixels that sit near other, similar blobs. Those are possibly letters, and if they’re near each other, they form a continuous line we should read.
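To make the blob idea concrete, here is a minimal sketch in Python. It is not the app's actual code: the function names, the grayscale input, the `tolerance` and `max_gap` parameters, and the rule that the largest blob is the background are all assumptions made for illustration.

```python
from collections import deque


def find_blobs(image, tolerance=10):
    """Group 4-connected pixels whose neighboring values differ by at most `tolerance`."""
    height, width = len(image), len(image[0])
    labels = [[-1] * width for _ in range(height)]
    blobs = []
    for y in range(height):
        for x in range(width):
            if labels[y][x] != -1:
                continue  # pixel already belongs to a blob
            blob_id = len(blobs)
            labels[y][x] = blob_id
            pixels = []
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and labels[ny][nx] == -1
                            and abs(image[ny][nx] - image[cy][cx]) <= tolerance):
                        labels[ny][nx] = blob_id
                        queue.append((ny, nx))
            blobs.append(pixels)
    return blobs


def bounding_box(pixels):
    """Return (left, top, right, bottom) for a list of (y, x) pixels."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(xs), min(ys), max(xs), max(ys)


def candidate_letter_boxes(image, tolerance=10):
    """Assumption for this sketch: the largest blob is background, the rest may be letters."""
    blobs = find_blobs(image, tolerance)
    background = max(blobs, key=len)
    return [bounding_box(blob) for blob in blobs if blob is not background]


def group_into_lines(boxes, max_gap=2):
    """Chain boxes left to right when they overlap vertically and the gap between them is small."""
    lines = []
    for box in sorted(boxes):
        left, top, right, bottom = box
        for line in lines:
            _, prev_top, prev_right, prev_bottom = line[-1]
            gap = left - prev_right - 1  # empty pixel columns between the two boxes
            if gap <= max_gap and top <= prev_bottom and bottom >= prev_top:
                line.append(box)
                break
        else:
            lines.append([box])
    return lines
```

Calling `group_into_lines(candidate_letter_boxes(image))` on a small grayscale grid returns lists of boxes, one list per candidate line of text. A real detector would have to cope with color, perspective, and far noisier input than this sketch does.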

But interestingly, if we train just on very “clean”-looking letters, we risk not understanding what real-life letters look like. Letters out in the real world are marred by reflections, dirt, smudges, and all kinds of weirdness. So we built our letter generator to create all kinds of fake “dirt” to convincingly mimic the noisiness of the real world—fake reflections, fake smudges, fake weirdness all around.
Why not just train on real-life photos of letters? Well, it’s tough to find enough examples in all the languages we need, and it’s harder to maintain the fine control over what examples we use when we’re aiming to train a really efficient, compact neural network. So it’s more effective to simulate the dirt.
Some of the “dirty” letters we use for training. Dirt, highlights, and rotation, but not too much because we don’t want to confuse our neural net.
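As a rough illustration of what a "dirty letter" generator could look like, here is a small Python sketch. Everything in it is assumed for the example: the tiny 5x5 glyph, the speck and highlight models, and parameter names such as `max_rotation`. The real generator is certainly more elaborate.

```python
import math
import random

# A tiny hand-drawn "T"; a real generator would render actual fonts.
GLYPH_T = [
    "#####",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
]


def render_glyph(rows, ink=0.0, paper=1.0):
    """Turn the text bitmap into a grid of floats in [0, 1]."""
    return [[ink if ch == "#" else paper for ch in row] for row in rows]


def rotate(image, degrees, fill=1.0):
    """Nearest-neighbor rotation about the image center."""
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2, (width - 1) / 2
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Inverse mapping: which source pixel ends up at (y, x)?
            sx = cos_t * (x - cx) + sin_t * (y - cy) + cx
            sy = -sin_t * (x - cx) + cos_t * (y - cy) + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < width and 0 <= iy < height:
                out[y][x] = image[iy][ix]
    return out


def add_dirt(image, rng, speck_prob=0.05, highlight_strength=0.3):
    """Add random specks plus one soft highlight, then clamp to [0, 1]."""
    height, width = len(image), len(image[0])
    hy, hx = rng.randrange(height), rng.randrange(width)  # highlight center
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            value = image[y][x]
            if rng.random() < speck_prob:
                value = rng.random()  # a speck of fake dirt
            value += highlight_strength / (1.0 + math.hypot(y - hy, x - hx))  # fake reflection
            row.append(min(1.0, max(0.0, value)))
        out.append(row)
    return out


def make_dirty_sample(rows, rng, max_rotation=10.0):
    """Rotate by a small random angle, then add dirt. Returns (image, angle)."""
    angle = rng.uniform(-max_rotation, max_rotation)
    return add_dirt(rotate(render_glyph(rows), angle), rng), angle
```

Keeping `max_rotation` small reflects the "but not too much" point in the caption: the limit is a knob you tune, not a fixed value from the original system.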
Finally, we render the translation on top of the original words in the same style as the original. We can do this because we’ve already found and read the letters in the image, so we know exactly where they are. We can look at the colors surrounding the letters and use that to erase the original letters. And then we can draw the translation on top using the original foreground color.
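The erase-and-redraw step can be sketched in a few lines of Python. This is a simplified grayscale version under stated assumptions: the function name, the use of boolean masks for the old and new letters, and the choice of the median as the color estimate are all illustrative, not a description of the shipped renderer.

```python
from statistics import median


def erase_and_redraw(image, box, old_mask, new_mask):
    """Erase the letters inside `box` and draw `new_mask` in the original ink color.

    `box` is (left, top, right, bottom), inclusive. Both masks are grids with the
    same size as the box; truthy cells mark letter pixels. `old_mask` must mark at
    least one pixel, and the box must not cover the whole image.
    """
    left, top, right, bottom = box
    height, width = len(image), len(image[0])

    # Foreground color: taken from the pixels of the original letters.
    foreground = median(
        image[top + r][left + c]
        for r, row in enumerate(old_mask)
        for c, on in enumerate(row)
        if on
    )

    # Background color: taken from the one-pixel ring just outside the box.
    ring = [
        image[y][x]
        for y in range(max(0, top - 1), min(height, bottom + 2))
        for x in range(max(0, left - 1), min(width, right + 2))
        if not (top <= y <= bottom and left <= x <= right)
    ]
    background = median(ring)

    out = [row[:] for row in image]
    for y in range(top, bottom + 1):  # erase the original letters
        for x in range(left, right + 1):
            out[y][x] = background
    for r, row in enumerate(new_mask):  # draw the translated text
        for c, on in enumerate(row):
            if on:
                out[top + r][left + c] = foreground
    return out
```

A flat fill like this only works on uniform backgrounds. Matching textured or gradient backgrounds would need something smarter, which this sketch does not attempt.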
Crunching it down for mobile
Now, if we could do this visual translation in our data centers, it wouldn’t be too hard. But a lot of our users, especially those getting online for the very first time, have slow or intermittent network connections and smartphones starved for computing power. These low-end phones can be about 50 times slower than a good laptop—and a good laptop is already much slower than the data centers that typically run our image recognition systems. So how do we get visual translation on these phones, with no connection to the cloud, translating in real-time as the camera moves around?
We needed to develop a very small neural net, and put severe limits on how much we tried to teach it: in essence, we put an upper bound on the density of information it handles. The challenge here was in creating the most effective training data. Since we’re generating our own training data, we put a lot of effort into including just the right data and nothing more. For instance, we want to be able to recognize a letter with a small amount of rotation, but not too much. If we overdo the rotation, the neural network will use too much of its information density on unimportant things.

So we put effort into making tools that would give us a fast iteration time and good visualizations. Within a few minutes, we can change the algorithms for generating training data, generate it, retrain, and visualize. From there we can look at what kind of letters are failing and why. At one point, we were warping our training data too much, and ‘$’ started to be recognized as ‘S’. We were able to quickly identify that and adjust the warping parameters to fix the problem. It was like trying to paint a picture of letters that you’d see in real life with all their imperfections painted just perfectly.
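One simple way to see which letters are failing is to tally the wrong predictions by (expected, predicted) pair. The Python snippet below is a minimal sketch of that idea; the function name and the input format are assumptions, and it stands in for the richer visualization tools described above.

```python
from collections import Counter


def top_confusions(pairs, limit=3):
    """Return the most frequent (expected, predicted) mistakes with their counts.

    `pairs` is an iterable of (expected_char, predicted_char) tuples taken
    from an evaluation run; correct predictions are ignored.
    """
    errors = Counter(
        (expected, predicted)
        for expected, predicted in pairs
        if expected != predicted
    )
    return errors.most_common(limit)
```

If a pair like `('$', 'S')` climbs to the top of this list after a change to the data generator, that is a hint the warping has gone too far.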
To achieve real-time performance, we also heavily optimized and hand-tuned the math operations. That meant using the mobile processor’s SIMD instructions and tuning things like matrix multiplies to fit processing into all levels of cache memory.
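The cache-fitting idea is usually implemented by tiling (blocking) the matrix multiply so that each small tile is reused while it is still in cache. The Python sketch below shows only the loop structure; it is an assumption about the general technique, not the app's hand-tuned code, and Python itself will not show the speedup or use SIMD. The function names and the `block` parameter are made up for the example.

```python
def matmul_naive(a, b):
    """Reference triple-loop product of an n x k and a k x m matrix."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_blocked(a, b, block=2):
    """Same product, computed tile by tile.

    In a compiled implementation, `block` would be chosen so the tiles of
    a, b, and c being worked on fit in a given cache level.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for p0 in range(0, k, block):
            for j0 in range(0, m, block):
                # Multiply one tile of a by one tile of b, accumulating into c.
                for i in range(i0, min(i0 + block, n)):
                    row_c = c[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = a[i][p]
                        row_b = b[p]
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c
```

Both functions return the same result; the blocked version just visits the data in an order that is friendlier to caches when written in a language like C with vector instructions.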
In the end, we were able to get our networks to give us significantly better results while running about as fast as our old system, which is great for translating what you see around you on the fly. Sometimes new technology can seem very abstract, and it’s not always obvious what the applications for things like convolutional neural nets could be. We think breaking down language barriers is one great use.