Alphabet Inc Is Set To Monetize AlphaGo Technology

  • Alphabet's Artificial Intelligence (AI) company DeepMind has announced WaveNet, an AI-based speech generation technology.
  • WaveNet uses neural network technology related to that used in AlphaGo, DeepMind's breakthrough Go-playing program.
  • WaveNet efficiently generates realistic human voices, reducing the gap to human-level performance by over 50 percent.

A few months ago, Amigobulls covered the spectacular story of Alphabet's (NASDAQ:GOOGL) AlphaGo, the first Artificial Intelligence (AI) program to beat the world's top-ranked human Go player. Before AlphaGo, beating a top-ranked Go player was thought to be a remote goal for AI, perhaps ten years in the future. The AlphaGo breakthrough has been rightfully hailed as an important milestone for AI.

But one doesn't make much money playing Go. Top Go players do make very good money with prizes and sponsors in places like Japan and Korea where the game is very popular, but of course, Alphabet is after much bigger money than that. In other words, the company needs to convert its research results into commercial products.


AlphaGo was developed by Google DeepMind, a British AI company founded in 2010 as DeepMind Technologies and acquired by Google in 2014. For AlphaGo, DeepMind created deep neural networks that learn how to play games in a similar fashion to humans and appear to mimic key cognitive aspects of the human brain. But advanced deep neural networks have many applications besides games, and some applications have clear commercial value.

A few days ago, DeepMind announced WaveNet, a deep generative model of raw audio waveforms that is able to generate speech that mimics any human voice and sounds more natural than the best existing Text-to-Speech (TTS) systems, reducing the gap with human performance by over 50 percent.

"Allowing people to converse with machines is a long-standing dream of human-computer interaction," reads the DeepMind announcement, which notes that the ability of computers to understand natural speech has been revolutionized in the last few years by the application of deep neural networks. However, generating speech with computers is still based on old techniques where short speech fragments are recorded from a single speaker and recombined. "This makes it difficult to modify the voice (for example switching to a different speaker, or altering the emphasis or emotion of their speech) without recording a whole new database," emphasizes the announcement.

Voice recognition and generation technology powers all sorts of computer systems that interact with users by voice, from customer service switchboard systems to personal assistants on smartphones, such as Apple's (NASDAQ:AAPL) Siri, Microsoft's (NASDAQ:MSFT) Cortana, and Alphabet's own Google Now. The holy grail of voice synthesis is generating computer voices that sound and feel exactly like human voices. Samantha, the science-fictional AI assistant in the film "Her," voiced by Scarlett Johansson, sounds totally human and communicates deep emotional content. That is still a distant goal, but the DeepMind announcement represents an important step in that direction.

Make no mistake, there's a lot of money in computer assistants that sound like people. Consumers love them, not only for their practical utility but also because many people miss emotionally satisfying interactions with other people. That people look to computers as friends is perhaps a sad symptom of existential malaise in today's society, but it's also a fact that consumer-facing businesses can't ignore.

For example, Microsoft has been testing its AI-powered chatbot technology in China with XiaoIce, a program that people can add as a friend on Chinese social networks. XiaoIce is now a huge hit in China: millions of Chinese people chat with her every day, and some consider her a dear friend. XiaoIce is significantly more sophisticated than current-generation personal assistants and is able to conduct human-like conversations with simulated emotional content.

XiaoIce is a text chatbot, though, and doesn't have a voice. A voice is exactly what DeepMind's WaveNet could deliver to "computer friends," as well as to personal assistants and business systems. CNBC notes that TTS synthesis is a technology that companies from Apple to Microsoft are interested in, as it could be critical in making digital personal assistants such as Siri or Cortana smarter and more human-like.

The DeepMind announcement provides technical details of how WaveNet works. Basically, the system is a neural network that learns the characteristics of many different voices, male and female, and uses them to model the raw waveform of the desired output audio signal, one sample at a time. DeepMind notes that training WaveNet on many speakers made it better at modeling a single speaker than training on that speaker alone, suggesting a form of transfer learning. More technical details are given in the research paper "WaveNet: A Generative Model For Raw Audio."
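
To make the "one sample at a time" idea concrete, here is a minimal Python sketch of sample-by-sample generation. It is not DeepMind's code: the function toy_model is a hypothetical stand-in for the trained neural network, and the 16 amplitude levels are an arbitrary choice made for illustration. The only point is the loop, in which each new audio sample is drawn from a probability distribution that depends on the samples generated so far.

```python
import math
import random

LEVELS = 16  # assumed number of quantized amplitude levels (illustrative only)


def toy_model(history):
    """Hypothetical stand-in for a trained network.

    Returns a probability for each amplitude level, favoring levels
    close to the most recent sample so the waveform changes smoothly.
    """
    last = history[-1] if history else LEVELS // 2
    weights = [math.exp(-abs(level - last)) for level in range(LEVELS)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(num_samples, seed=0):
    """Build a waveform one sample at a time, each conditioned on the past."""
    rng = random.Random(seed)
    waveform = []
    for _ in range(num_samples):
        probs = toy_model(waveform)  # distribution over the next sample
        next_sample = rng.choices(range(LEVELS), weights=probs, k=1)[0]
        waveform.append(next_sample)  # the new sample becomes part of the history
    return waveform


if __name__ == "__main__":
    print(generate(20))
```

In a real system the toy function would be replaced by the trained network, and the output would be a stream of audio samples rather than a short list of integers.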

"For both Chinese and English, Google's current TTS systems are considered among the best worldwide, so improving on both with a single model is a major achievement," emphasizes the DeepMind announcement. "WaveNets reduce the gap between the state of the art and human-level performance by over 50% for both US English and Mandarin Chinese."
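
One way to read a "gap reduction" figure like this is as the share of the distance between the previous best system and human speech that the new system has closed. The short Python sketch below shows that arithmetic, assuming quality is measured on a single numeric scale. The scores in the example are hypothetical numbers chosen for illustration, not figures from the DeepMind announcement.

```python
def gap_reduction(previous_best, new_system, human):
    """Fraction of the gap to human-level quality closed by the new system."""
    gap_before = human - previous_best
    if gap_before <= 0:
        raise ValueError("previous_best must score below human")
    return (new_system - previous_best) / gap_before


if __name__ == "__main__":
    # Hypothetical scores for illustration only
    reduction = gap_reduction(previous_best=3.8, new_system=4.2, human=4.5)
    print(f"Gap reduced by {reduction:.0%}")
```

With these made-up scores, the new system closes a bit more than half of the remaining gap, which is the kind of result the "over 50%" claim describes.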


The path to commercial exploitation of WaveNet technology is clear: First, Alphabet can use its TTS technology to give an edge to its own voice interfaces over the competition. Second, the technology can be licensed to phone companies, car makers, call centers, computer game makers, and all enterprises that need voice-based interfaces. Though this is but a drop in the ocean of Alphabet's activities, it's an important one, which is good news for Alphabet investors.

Giulio Prisco on Amigobulls:
Author's Disclosures & Disclaimers:
  • I do not hold any positions in the stocks mentioned in this post and don't intend to initiate a position in the next 72 hours.
  • I am not an investment advisor, and my opinion should not be treated as investment advice.
  • I am not being compensated for this post (except possibly by Amigobulls).
  • I do not have any business relationship with the companies mentioned in this post.
Amigobulls Disclosures & Disclaimers:

This post has been submitted by an independent external contributor. This author may or may not hold any positions in the stocks discussed. Neither Amigobulls, nor any members of its staff hold positions in any of the stocks discussed in this post. Amigobulls has not verified the author’s positions in the stocks discussed, and does not provide any guarantees in this regard. The author may be paid by Amigobulls for this contribution, under the paid contributors program. However, Amigobulls does not guarantee the authenticity or accuracy of the information provided by the author in this post.

The author may not be a qualified investment advisor. The opinions stated in the post should not be treated as investment advice. Buying and selling of securities carries the risk of monetary losses. Readers/Viewers are advised to carry out their own due diligence and consult their investment advisors before making any investment decisions.

Amigobulls does not have any business relationship with any of the companies covered in this post. This post represents the views of the author/contributor and may not reflect the views of Amigobulls.


Comments on this article and GOOGL stock

There's an even greater achievement by WaveNet that the article has ignored, which is state-of-the-art music generation.

Those piano passages aren't regurgitated from training data, nor are they cheesy "MIDI to soundbank" conversions; they are literally what the model dreamed up from scratch when told to sound like a piano. The mere fact that it sounds like a piano at all is already a stunning achievement, let alone that it seems to have grasped some basic notions of what makes music musical.

In the future, this technology has the potential to mimic human players at an unprecedented level and perhaps even be the first AI to compose actually good-sounding music without human intervention.