Language is chock-full of ambiguity, and it can turn up in surprising places. Many words are hard to tell apart without context: most Americans pronounce “ladder” and “latter” identically, for instance. Keyboard input on mobile devices has a similar problem, especially on IME keyboards. For example, the input patterns for “Yankees” and “takes” look very similar:
*Figure: input patterns for “Yankees” and “takes” on a mobile keyboard. Photo credit: Kurt Partridge.*
But in this context -- the previous two words, “New York” -- “Yankees” is much more likely.
Language models are one key way computers use context. They power predictive keyboards, and also speech recognition, machine translation, spelling correction, query suggestion, and more. Often these models are specialized: word order in queries can be very different from word order in web pages. Either way, an accurate language model with wide coverage drives the quality of all these applications.
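To make that concrete, here is a minimal sketch of a count-based bigram language model in Python. The toy corpus, add-alpha smoothing constant, and vocabulary size are invented for illustration; real systems use much higher-order n-grams or neural models trained on corpora like the one described below.

```python
from collections import defaultdict

def train_bigram_model(sentences):
    """Count adjacent word pairs to estimate P(word | previous word)."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1
    return counts

def prob(counts, prev, word, alpha=0.1, vocab_size=10000):
    """Add-alpha smoothed estimate of P(word | prev)."""
    total = sum(counts[prev].values())
    return (counts[prev][word] + alpha) / (total + alpha * vocab_size)

# Toy corpus: after "york", "yankees" is far more likely than "takes".
corpus = ["new york yankees win", "new york yankees lose", "he takes the train"]
model = train_bigram_model(corpus)
print(prob(model, "york", "yankees"))  # relatively high: supported by two bigram counts
print(prob(model, "york", "takes"))    # low: only the smoothing floor
```

Even this crude model captures the “New York Yankees” effect: conditioning on the previous word makes “Yankees” roughly twenty times more likely than “takes” on the toy corpus.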
One thing that can be tricky when evaluating the quality of such complex systems is error attribution, because the components interact. Good engineering practice is therefore to evaluate each module separately, including the language model. We believe the field could benefit from a large, standard corpus with published benchmark results, making it easy to compare approaches and experiment with new modeling techniques.
To that end, we are releasing scripts that convert a set of public data into a language modeling benchmark of over a billion words, with standardized training and test splits, described in an arXiv paper. Along with the scripts, we’re releasing the processed data in one convenient location, together with the training and test splits. This will make it much easier for the research community to reproduce results quickly, and we hope it will speed up progress on these tasks.
The benchmark scripts and data are freely available, and can be found here: http://www.statmt.org/lm-benchmark/
The field needs a new and better standard benchmark. Currently, researchers report results on corpora of their own choosing, and those results are very hard to reproduce because there is no standard preprocessing. We hope this release will solve both problems and become the standard benchmark for language modeling experiments. As more researchers adopt it, comparisons will become easier and more accurate, and progress will be faster.
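Models trained on the benchmark are typically compared by perplexity on the held-out test set, where lower is better. As a rough sketch (not the benchmark’s official scoring script), assuming your model exposes a `log_prob(prev, word)` function that returns natural-log probabilities, perplexity can be computed like this:

```python
import math

def perplexity(log_prob, test_sentences):
    """Perplexity = exp(-average per-token log-likelihood)."""
    total_log_prob, num_tokens = 0.0, 0
    for sentence in test_sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            total_log_prob += log_prob(prev, word)  # model's log P(word | prev)
            num_tokens += 1
    return math.exp(-total_log_prob / num_tokens)

# Sanity check: a uniform model over a 10,000-word vocabulary
# scores a perplexity of exactly 10,000 on any text.
uniform = lambda prev, word: math.log(1.0 / 10000)
print(perplexity(uniform, ["new york yankees win"]))  # ~10000.0
```

Because perplexity depends heavily on tokenization and vocabulary handling, a shared, preprocessed test set is exactly what makes numbers from different papers comparable.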
For all the researchers out there: try out the benchmark, run your experiments, and let us know how it goes -- or publish, and we’ll enjoy finding your results at conferences and in journals.