Deep Learning
When Ray Kurzweil met with Google CEO Larry Page in July, he wasn’t seeking employment. As an accomplished inventor and a leading voice in machine intelligence, Kurzweil wanted to discuss his upcoming book, How to Create a Mind. Page, who had already read an early draft, listened as Kurzweil explained his vision for developing a truly intelligent computer, one that could understand language, draw inferences, and make decisions on its own.
It quickly became clear that achieving this goal would require access to vast amounts of data and computing power on a scale that only a company like Google could provide. Page acknowledged this, telling Kurzweil that while he could potentially grant some access, it would be difficult to do so for an independent company. Instead, Page proposed that Kurzweil join Google. Having always run his own companies, Kurzweil had never taken a job elsewhere, but the opportunity was too significant to pass up. In January, he officially joined Google as a director of engineering. He described the role as the fulfillment of five decades of work focused on artificial intelligence.
Kurzweil was drawn not only to Google's vast computing power but also to the company’s advancements in deep learning, a branch of AI that has made remarkable progress. Deep-learning software is designed to replicate the activity of neurons in the neocortex, the part of the brain responsible for higher-level thinking. The software is capable of recognizing patterns in digital data, including sounds and images, allowing it to learn in a way that closely resembles human cognition.
The concept of using software to replicate the neocortex’s network of neurons in an artificial neural system has existed for decades. While this has led to both breakthroughs and setbacks, recent advances in mathematical algorithms and more powerful computing have enabled researchers to model deeper layers of virtual neurons than ever before.
With these advancements, significant progress has been made in speech and image recognition. In June, a Google deep-learning system analyzed 10 million images from YouTube videos and achieved nearly twice the accuracy of previous efforts in identifying objects like cats. Google also applied this technology to improve speech recognition in its latest Android mobile software, reducing errors.
In October, Microsoft’s chief research officer, Rick Rashid, demonstrated speech software in China that transcribed his spoken words into English text with a 7 percent error rate. The system then translated the text into Chinese and generated audio in Mandarin using a synthesized version of his voice. That same month, a research team consisting of three graduate students and two professors won a Merck-sponsored competition to identify molecules with potential for drug development. The team used deep learning to pinpoint molecules most likely to interact effectively with their targets.
Google has become a major hub for deep learning and artificial intelligence expertise. In March, the company acquired a startup co-founded by Geoffrey Hinton, a computer science professor at the University of Toronto and a member of the team that won the Merck competition. Hinton, who will divide his time between the university and Google, aims to apply ideas from deep learning to practical challenges, including image recognition, search, and natural language processing.
The progress in AI has given researchers new optimism about the potential for intelligent machines to become more than just a concept in science fiction. Machine intelligence is already making a significant impact on industries such as communications, computing, medicine, manufacturing, and transportation. IBM’s Watson, known for winning Jeopardy!, incorporates deep-learning techniques and is now being developed to assist doctors in making medical decisions. Microsoft has also integrated deep learning into its Windows Phone and Bing voice search.
Expanding deep learning beyond speech and image recognition will require further advancements in software, concepts, and processing power. Fully autonomous machines capable of independent thought are still far from reality and may remain a distant goal for years or even decades. However, Peter Lee, head of Microsoft Research USA, notes that deep learning has revived interest in tackling some of artificial intelligence’s biggest challenges.
Building a Brain
Various approaches have been used to tackle these challenges. One method involved programming computers with detailed information and predefined rules about the world. This required developers to manually create software that recognized features such as edges in images or specific sounds. The process was time-consuming, and the resulting systems struggled with ambiguous data. They were limited to highly controlled tasks, such as automated phone menus that required users to speak predefined commands.
Neural networks, introduced in the 1950s shortly after artificial intelligence research began, showed promise by attempting to mimic brain function in a simplified way. These systems use a network of virtual neurons and assign random numerical values, or weights, to the connections between them. These weights determine how each neuron responds, with a mathematical output between 0 and 1, to a digitized feature. For example, in an image, a neuron might detect an edge or a specific shade of blue, while in speech recognition it might identify a particular energy level at a given frequency within a phoneme, the fundamental unit of spoken language.
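The arithmetic inside a single virtual neuron is surprisingly simple. The toy sketch below is an illustration only, not code from any of the systems described here: it builds one neuron with randomly initialized connection weights and squashes the weighted sum of its inputs to a response between 0 and 1 using a logistic function, one common choice.

```python
import math
import random

def make_neuron(n_inputs):
    """One random weight for each of the neuron's incoming connections."""
    return [random.uniform(-1.0, 1.0) for _ in range(n_inputs)]

def respond(weights, features):
    """Weighted sum of the inputs, squashed by a logistic function so the
    neuron's response always falls between 0 and 1."""
    total = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-total))

# A neuron with three incoming connections reacting to a digitized feature vector.
neuron = make_neuron(3)
print(respond(neuron, [0.2, 0.9, 0.4]))
```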
To train a neural network to recognize an object or a phoneme, programmers exposed it to digitized images or sound waves containing those elements. When the network failed to accurately identify a pattern, an algorithm adjusted the weights assigned to connections between its virtual neurons. The goal was to enable the network to consistently recognize patterns in speech or images, such as the phoneme “d” or the image of a dog. This process mirrors how a child learns to recognize a dog by observing characteristics like head shape, behavior, and sounds, associating them with what others call a dog.
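In modern systems that adjustment is made by gradient-based algorithms such as backpropagation. As a minimal sketch of the idea for a single neuron (toy code, not any research group's actual implementation), the following continues the neuron above, repeating its respond function so it runs on its own: whenever the response misses the label, every connection weight is nudged in the direction that shrinks the error.

```python
import math
import random

def respond(weights, features):
    total = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-total))

def train(weights, examples, rate=0.5, epochs=200):
    """Nudge each connection weight to shrink the gap between the neuron's
    response and the desired label (1 = pattern present, 0 = absent)."""
    for _ in range(epochs):
        for features, label in examples:
            out = respond(weights, features)
            grad = (label - out) * out * (1.0 - out)  # slope of the logistic response
            for i, x in enumerate(features):
                weights[i] += rate * grad * x         # adjust each connection
    return weights

# Toy data: the neuron should learn to fire on the second feature only.
examples = [([0.0, 1.0], 1), ([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.1, 0.9], 1)]
weights = train([random.uniform(-0.5, 0.5) for _ in range(2)], examples)
print(respond(weights, [0.1, 0.9]))  # high: the pattern is present
print(respond(weights, [0.9, 0.1]))  # low: it is not
```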
Early neural networks, however, could simulate only a small number of neurons at a time, limiting their ability to recognize complex patterns. As a result, progress stalled throughout the 1970s.
Interest in neural networks was revived in the mid-1980s when Geoffrey Hinton and others introduced “deep” models that utilized multiple layers of virtual neurons more effectively. However, this approach still required extensive human input, as programmers had to manually label data before feeding it into the network. Additionally, recognizing complex speech and images demanded far more computational power than was available at the time.
In the past decade, Geoffrey Hinton and other researchers achieved key conceptual breakthroughs in deep learning. In 2006, Hinton developed a more efficient method for training individual layers of neurons. The first layer identifies basic features, such as an edge in an image or the smallest unit of speech sound. It accomplishes this by detecting patterns in digitized pixels or sound waves that appear more frequently than expected by chance. Once this layer accurately identifies these basic features, the information is passed to the next layer, which learns to recognize more complex patterns, such as corners or speech sound combinations. This process continues through multiple layers until the system reliably identifies phonemes or objects.
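This recipe is often described as greedy layer-wise pretraining. The sketch below is a loose Python/NumPy illustration that assumes a simple autoencoder as the per-layer learner (Hinton's 2006 work actually used restricted Boltzmann machines): each layer learns weights that let it reconstruct its input, then hands its feature activations to the next layer as training data.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_layer(data, n_hidden, rate=0.5, epochs=100):
    """Train one layer as a tiny autoencoder: learn weights whose hidden
    activations suffice to reconstruct the input, i.e. features that
    capture recurring structure in the data."""
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.1, (n_visible, n_hidden))   # encoder weights
    V = rng.normal(0.0, 0.1, (n_hidden, n_visible))   # decoder weights
    for _ in range(epochs):
        hidden = sigmoid(data @ W)        # detect features in the input
        recon = sigmoid(hidden @ V)       # attempt to rebuild the input
        err = recon - data
        d_recon = err * recon * (1 - recon)
        d_hidden = (d_recon @ V.T) * hidden * (1 - hidden)
        V -= rate * (hidden.T @ d_recon) / len(data)
        W -= rate * (data.T @ d_hidden) / len(data)
    return W

# Greedy layer-wise training: each layer learns from the one below it.
data = rng.random((200, 64))              # stand-in for digitized pixels
layers = []
for n_hidden in (32, 16, 8):              # progressively more abstract layers
    W = train_layer(data, n_hidden)
    layers.append(W)
    data = sigmoid(data @ W)              # the next layer trains on these features
```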
One notable example involved cats. In June, Google demonstrated one of the largest neural networks to date, containing over a billion connections. A team led by Stanford computer science professor Andrew Ng and Google Fellow Jeff Dean trained the system using images from 10 million randomly selected YouTube videos. One simulated neuron focused on recognizing cats, while others identified human faces, yellow flowers, and other objects. Remarkably, the system categorized these items without any prior human input or labeling, highlighting the potential of deep learning.
The most surprising aspect for AI researchers was the significant improvement in image recognition. The system correctly classified objects and themes from YouTube images 16 percent of the time. While that number may seem low, it represented a 70 percent improvement over earlier methods. Given that the system had to choose from 22,000 possible categories, including distinctions between visually similar species, such as different types of skate fish, its accuracy was impressive. When the system was asked to classify images into 1,000 broader categories, its accuracy rate increased to over 50 percent.
Big Data
Training the multiple layers of virtual neurons in the experiment required 16,000 computer processors – the kind of large-scale computing infrastructure that Google has built for its search engine and other services. According to Dileep George, cofounder of the machine-learning startup Vicarious, at least 80 percent of recent progress in AI is due to increased computing power.
However, Google's advances in deep learning are not just about the size of its data centers. The company has also developed an efficient way to distribute computing tasks across multiple machines, significantly speeding up processing times. This system, which Jeff Dean helped develop during his 14 years at Google, allows deep-learning neural networks to be trained more quickly, enabling larger models and greater data input.
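Google has described that infrastructure only at a high level, and the sketch below is not its implementation; it illustrates the general data-parallel pattern in Python, with a linear model standing in for a deep network: shard a batch across worker processes, compute a gradient on each shard, and average the results into a single weight update.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def local_gradient(args):
    """Squared-error gradient for a linear model on one worker's data shard."""
    weights, X, y = args
    return X.T @ (X @ weights - y) / len(y)

def distributed_step(pool, weights, shards, rate=0.5):
    """One data-parallel update: each worker computes a gradient on its
    shard; the results are averaged into a single weight update."""
    grads = list(pool.map(local_gradient, [(weights, Xs, ys) for Xs, ys in shards]))
    return weights - rate * np.mean(grads, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((1000, 5))
    y = X @ np.arange(5.0)                       # the weights to recover
    shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))
    w = np.zeros(5)
    with ProcessPoolExecutor(max_workers=4) as pool:
        for _ in range(200):
            w = distributed_step(pool, w, shards)
    print(np.round(w, 2))                        # roughly [0, 1, 2, 3, 4]
```

Google's own system goes well beyond this synchronous toy, splitting single models across machines and applying updates asynchronously, but the core idea of parallelizing training over many processors is the same.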
Deep learning has already led to improvements in voice search on smartphones. Before last year, Google's Android speech recognition system often misinterpreted words. In preparation for an Android update in July, Dean and his team replaced part of the system with a deep-learning-based approach. By using multiple layers of neurons, the system became better at recognizing variations in speech sounds, even in noisy environments like subway stations. As a result, it more accurately identifies spoken words and returns more relevant results. The number of errors dropped by up to 25 percent almost immediately, leading many reviewers to consider Android's voice search more effective than Apple's Siri voice assistant.
Despite the progress in deep learning, not everyone believes it will lead artificial intelligence to reach human-like intelligence. Some critics argue that deep learning and AI, in general, focus too much on raw computational power while overlooking key aspects of brain biology.
One critic is Jeff Hawkins, founder of Palm Computing, whose current company, Numenta, is developing a machine-learning system that is biologically inspired but does not use deep learning. Numenta's system can predict energy consumption patterns and detect when machinery, such as a wind turbine, is likely to fail. Hawkins, who wrote On Intelligence in 2004, a book about how the brain works and how it could inform machine intelligence, believes deep learning fails to account for the concept of time. Human brains, he argues, process streams of sensory data and learn by recognizing sequences of patterns unfolding over time. When you watch a video of a cat doing something amusing, it is the motion that carries the meaning, not any collection of still frames like those used in Google's experiment. In Hawkins's view, Google is betting that large amounts of data can compensate for that limitation.
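Numenta's actual algorithms are far more elaborate, but the sequence idea can be made concrete with a toy sketch: rather than scoring independent frames, the learner below records which pattern tends to follow which and uses that temporal structure to predict what comes next.

```python
from collections import Counter, defaultdict

class SequenceMemory:
    """Toy sequence learner: counts which pattern tends to follow which,
    then predicts the next pattern from the current one."""
    def __init__(self):
        self.transitions = defaultdict(Counter)

    def observe(self, sequence):
        for current, nxt in zip(sequence, sequence[1:]):
            self.transitions[current][nxt] += 1

    def predict(self, current):
        followers = self.transitions.get(current)
        return followers.most_common(1)[0][0] if followers else None

memory = SequenceMemory()
# A stream of sensory "patterns" unfolding in time, not independent frames.
memory.observe(["crouch", "wiggle", "pounce", "land", "crouch", "wiggle", "pounce"])
print(memory.predict("wiggle"))  # -> 'pounce'
```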
Even if vast amounts of data alone cannot solve all AI challenges, the computational resources available to companies like Google play a crucial role in advancing deep learning. Supporters of the technology emphasize that the brain remains far more complex than current neural networks. Geoffrey Hinton notes that significant computing power is essential to making deep-learning models function effectively.
What’s Next
Google remains discreet about its future plans, but the potential applications of deep learning are compelling. Improved image search could benefit platforms like YouTube, and according to Jeff Dean, phoneme data from English could be used to accelerate speech recognition training for other languages. Enhanced image recognition could also significantly improve self-driving car technology. Additionally, better and faster interpretation of user intent could transform search and the advertising that funds it, potentially predicting what users want before they even search for it.
Ray Kurzweil, who has long been fascinated by intelligent machines, finds this particularly exciting. At just 17, he developed software that allowed a computer to compose original music in classical styles, which he demonstrated on the TV show I’ve Got a Secret in 1965. Over the years, he has introduced several groundbreaking innovations, including a reading machine that converts text to speech, software capable of digitizing printed text in any font, music synthesizers replicating orchestral sounds, and a speech recognition system with an extensive vocabulary.
Looking ahead, he imagines a "cybernetic assistant" that could listen to conversations, scan emails, and track user activities, with permission, to provide useful information before it is even requested. While this is not his immediate focus at Google, it aligns with the vision of company cofounder Sergey Brin, who once expressed interest in developing a sentient computer similar to HAL from 2001: A Space Odyssey, but without the dangerous aspects.
For now, Kurzweil’s primary objective is to enhance computers’ ability to understand and communicate in natural language. His goal is to improve search capabilities and enable more effective question-answering systems. He aims to develop a more advanced and adaptable version of IBM’s Watson, which gained recognition for its ability to interpret and respond to complex Jeopardy! queries, such as identifying “a long, tiresome speech delivered by a frothy pie topping” as “a meringue harangue.”
Kurzweil’s work is not limited to deep learning, though he acknowledges that his approach to speech recognition is influenced by similar theories on how the brain processes information. His goal is to create a system that understands the actual meaning of words, phrases, and sentences, including the ambiguities that typically confuse computers. He envisions a graphical way to represent the semantic meaning of language.
Achieving this will require a more advanced method of mapping sentence structure. Google is already applying similar techniques to improve grammar in translation services. Beyond syntax, true natural language understanding will also demand that computers grasp common-sense reasoning. To support this effort, Kurzweil plans to utilize the Knowledge Graph, Google’s extensive database containing approximately 700 million topics, locations, and individuals, along with billions of relationships between them. Introduced the previous year, the Knowledge Graph was designed to provide users with direct answers rather than just a list of search results.
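Google has not published the Knowledge Graph's internal format, but databases like it are commonly modeled as subject-relation-object triples. The hypothetical miniature below shows how such a store can return a direct answer rather than a list of search results; the entities and relation names are invented for illustration.

```python
# A hypothetical miniature of a knowledge graph: entities linked by named
# relationships, queried for a direct answer.
triples = [
    ("Leonardo da Vinci", "painted", "Mona Lisa"),
    ("Mona Lisa", "housed_in", "Louvre"),
    ("Louvre", "located_in", "Paris"),
]

def answer(subject, relation):
    """Return the object(s) connected to `subject` by `relation`."""
    return [o for s, r, o in triples if s == subject and r == relation]

print(answer("Mona Lisa", "housed_in"))  # -> ['Louvre']
```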
Kurzweil also intends to integrate deep-learning algorithms to help computers navigate the complexities and ambiguities of language. This is an ambitious challenge. He notes that understanding natural language is an ongoing effort, much like search technology, and will never be entirely complete.
While Kurzweil’s vision is still years away from being realized, deep learning is already opening new possibilities beyond speech and image recognition. One promising area is drug discovery. The unexpected success of Hinton’s team in the Merck competition demonstrated that deep learning can be valuable in fields where it was not previously considered useful.
Other potential applications are also emerging. Peter Lee of Microsoft highlights early research on using deep learning for machine vision, which could enhance technologies like industrial inspection and robotic navigation. He also sees possibilities in personal sensors that could help predict medical issues and city-wide sensor networks that could forecast traffic congestion.
No single approach will solve every challenge in a field as complex as artificial intelligence. However, deep learning is currently at the forefront of AI research. Jeff Dean describes it as a powerful model for understanding the world.