Tech Briefs

System could detect early signs of neurological disease or mental illness.

Eye-tracking technology, which determines where in a visual scene people are directing their gaze, has been widely used in certain areas of medical and scientific research, but cost issues have kept it from finding consumer applications. Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory, together with the University of Georgia, hope to change that with software that can turn any smartphone into an eye-tracking device. In addition to making existing applications of eye-tracking technology more accessible, the system could enable new computer interfaces, or help detect signs of incipient neurological disease or mental illness.

The eye-tracking system uses an ordinary cellphone camera. (Credit: MIT)

The research team included Aditya Khosla, an MIT graduate student in electrical engineering and computer science; Kyle Krafka of the University of Georgia; and MIT professors of electrical engineering and computer science Wojciech Matusik and Antonio Torralba. They set out to make an eye tracker that works on a single mobile device using just the front-facing camera. The eye tracker was built using machine learning, a technique in which computers learn to perform tasks by looking for patterns in large sets of training examples.

Khosla and his colleagues’ advantage over previous research was the amount of data they had to work with. Currently, Khosla says, their training set includes examples of gaze patterns from 1,500 mobile-device users. Previously, the largest data sets used to train experimental eye-tracking systems had topped out at about 50 users. To assemble data sets, “most other groups tend to call people into the lab,” Khosla says. “It’s really hard to scale that up. Calling 50 people in itself is already a fairly tedious process. But we realized we could do this through crowdsourcing.”

An initial round of experiments, using training data drawn from 800 mobile-device users, got the system’s margin of error down to 1.5 centimeters, a twofold improvement over previous experimental systems. Since the completion of their research paper, however, the researchers have acquired data on another 700 people, and the additional training data has reduced the margin of error to about a centimeter.

To get a sense of how larger training sets might improve performance, the researchers trained and retrained their system using different-sized subsets of their data. Those experiments suggest that about 10,000 training examples should be enough to lower the margin of error to a half-centimeter, which Khosla estimates will be good enough to make the system commercially viable.
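As a rough illustration of that kind of extrapolation (not the researchers’ actual method, which retrains the system on real subsets of the data), one can fit a power-law learning curve to the two error figures reported above and project it forward. The sketch below uses only the article’s two data points (800 users at 1.5 cm; 1,500 users at about 1 cm), so the projected number is a back-of-the-envelope estimate, not the researchers’ half-centimeter figure.

```python
import math

# Two (training-set size, error-in-cm) pairs taken from the article.
points = [(800, 1.5), (1500, 1.0)]

# Fit error = a * n**b by solving on a log-log scale; with exactly
# two points the fit is exact.
(n1, e1), (n2, e2) = points
b = math.log(e2 / e1) / math.log(n2 / n1)   # slope (expected negative)
a = e1 / n1 ** b

# Extrapolate the curve to 10,000 training examples.
predicted_error_cm = a * 10000 ** b
```

With these numbers the fitted exponent is negative, and the projected error at 10,000 examples falls well under a centimeter, broadly consistent with the researchers’ own estimate.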

To collect their training examples, the researchers developed a simple application for devices that use Apple’s iOS operating system. The application flashes a small dot somewhere on the device’s screen, attracting the user’s attention. It then briefly replaces the dot with either an “R” or an “L,” instructing the user to tap either the right or left side of the screen. Correctly executing the tap ensures that the user has actually shifted his or her gaze to the intended location. During this process, the device camera continuously captures images of the user’s face.
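The validation logic of such a collection app can be sketched in a few lines. This is a hypothetical reconstruction in Python rather than the team’s iOS code: a trial pairs a random dot position with an “R” or “L” cue, and a recorded tap counts only if it lands on the side the letter named.

```python
import random

def make_trial(screen_w, screen_h, rng):
    """One calibration trial: a random dot position, plus the letter
    ("R" or "L") that briefly replaces the dot on screen."""
    dot = (rng.uniform(0, screen_w), rng.uniform(0, screen_h))
    letter = rng.choice("RL")
    return dot, letter

def tap_is_valid(letter, tap_x, screen_w):
    """Keep the trial only if the user tapped the side the letter named,
    which confirms they had actually shifted their gaze to the dot."""
    tapped_right = tap_x >= screen_w / 2
    return tapped_right == (letter == "R")
```

Face images captured by the front camera during a valid trial would then be labeled with the dot’s on-screen coordinates, yielding one training example per trial.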

The researchers’ machine-learning system was a neural network, a software abstraction that can be thought of as a huge network of very simple information processors arranged into discrete layers. Training modifies the settings of the individual processors so that when a data item (in this case, a still image of a mobile-device user) is fed to the bottom layer and processed by each subsequent layer in turn, the output of the top layer is the solution to a computational problem; for this research, that solution was an estimate of the direction of the user’s gaze.
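The layered structure described above can be sketched in a few lines of Python. This is a toy illustration, not the researchers’ actual network (which was trained on face images): each “processor” computes a weighted sum of its inputs and applies a simple nonlinearity, and the final layer’s two outputs stand in for a normalized (x, y) gaze estimate.

```python
import math
import random

def dense(inputs, weights, biases):
    """One layer: each unit takes a weighted sum of its inputs plus a
    bias, passed through a nonlinearity (tanh, so outputs stay in (-1, 1))."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(features, layers):
    """Feed a data item through the layers in order; the top layer's
    output is the network's answer."""
    activations = features
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations

# A small untrained network: 4 input features -> 8 -> 8 -> 2 outputs.
rng = random.Random(0)
sizes = [4, 8, 8, 2]
layers = [([[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
           [0.0] * n_out)
          for n_in, n_out in zip(sizes, sizes[1:])]

gaze_estimate = forward([0.2, -0.5, 0.1, 0.9], layers)
```

Training would then adjust the weights and biases so that the top layer’s output matches the known dot positions from the collected data.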

Neural networks are typically very large, however, so the MIT and Georgia researchers used a technique called “dark knowledge” to shrink their network. Dark knowledge involves taking the outputs of a fully trained network, which are generally approximate solutions, and using them, together with the true solutions, to train a much smaller network. The technique reduced the size of the researchers’ network by roughly 80 percent, enabling it to run much more efficiently on a smartphone. With the reduced network, the eye tracker can operate at about 15 frames per second, which is fast enough to record even brief glances.
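The core of the dark-knowledge idea is a training loss that blends the large network’s approximate predictions with the true answers. The sketch below is a minimal, hypothetical version for a gaze-regression task like this one, using squared error; the mixing weight `alpha` is an assumed parameter, not a value from the research.

```python
def distillation_loss(student_out, teacher_out, true_gaze, alpha=0.5):
    """Blend two squared errors: one against the big (teacher) network's
    approximate prediction, one against the ground-truth gaze point.
    The small (student) network is trained to minimize this blend."""
    def sq_err(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return (alpha * sq_err(student_out, teacher_out)
            + (1 - alpha) * sq_err(student_out, true_gaze))
```

A student prediction that matches both the teacher and the ground truth gives zero loss; the further it strays from either, the larger the penalty, so the small network inherits the large network’s behavior while staying anchored to the real labels.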

For more information, visit http://news.mit.edu.