Licence for better text comprehension

Max Planck Innovation awards licence for machine text comprehension technology

Based on research by the Max Planck Institute for Informatics, Ambiverse has developed a software application that can understand and analyze large volumes of text. The technology interprets homonyms correctly, giving companies access to more relevant information on the Internet and improving access to information in their own data inventories.

In our digitalized world, data increasingly represent the basis for corporate success. Many companies amass great volumes of data and can access information globally on the Internet at any time. However, the benefit of these data is strictly limited, because machines cannot categorize the available information correctly, or can do so only inadequately.

The cause is usually homonyms, or terms with variable meaning, as well as names, places, companies or events that cannot be unequivocally deciphered. People can understand the meaning of the statement 'after injury, no Neuer in sight' based on the context. An informed reader realizes that the subject is the German national goalkeeper Manuel Neuer, whose surname also means “new one” in German, and whose injury causes problems for his club and the national football team. This interpretation represents a much greater challenge for machines. However, this knowledge is essential for the deeper understanding and exploitation of digital content.

Ambiverse, a company founded in Saarbrücken by scientists from the Max Planck Institute for Informatics, among others, has developed an intelligent software solution that automatically and accurately recognizes and resolves homonyms and phrases with different meanings in texts. It is built on the YAGO knowledge base developed at the Institute. This semantic database, which contains more than ten million entities (names, organisations, towns, etc.) and is based in part on Wikipedia, resembles a dictionary for machines. The software compares words in the text with specific entities in YAGO, thereby allowing computers to interpret words accurately and unambiguously.

This is made possible because, in addition to entities (for example: Manuel Neuer), the knowledge base also stores and combines categories (here: person, footballer) and facts (injury). This combination leads to a categorization accuracy greater than 80 percent and, according to IBM, represents a reference method. Ambiverse provides a program interface for integrating company-specific databases to facilitate optimization of internal corporate data.
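The principle described above can be illustrated with a minimal sketch. It is not Ambiverse's software and does not use YAGO's actual data format: the toy knowledge base, the Entity class and the disambiguate function are all hypothetical names invented for this illustration, and the scoring (a simple count of context words that match an entity's categories or facts) is a deliberately simplified stand-in for the real method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A hypothetical knowledge-base entry: a name plus categories and facts."""
    name: str
    categories: frozenset
    facts: frozenset


# Toy knowledge base (illustrative only): ambiguous surface form -> candidates.
KNOWLEDGE_BASE = {
    "neuer": [
        Entity(
            name="Manuel Neuer",
            categories=frozenset({"person", "footballer", "goalkeeper"}),
            facts=frozenset({"injury", "club", "team"}),
        ),
        Entity(
            name="neuer (German for 'new one')",
            categories=frozenset({"word"}),
            facts=frozenset({"new"}),
        ),
    ],
}


def disambiguate(mention, text):
    """Return the candidate whose categories and facts best match the context.

    The score is the number of words in the text that also appear among the
    candidate's categories or facts. Returns None for unknown mentions.
    """
    candidates = KNOWLEDGE_BASE.get(mention.lower(), [])
    if not candidates:
        return None
    # Normalize the context: lower-case and strip surrounding punctuation.
    context = {word.lower().strip(".,'\"") for word in text.split()}

    def score(entity):
        return len(context & (entity.categories | entity.facts))

    # On a tie, max() keeps the first candidate listed in the knowledge base.
    return max(candidates, key=score)


result = disambiguate("Neuer", "after injury, no Neuer in sight")
print(result.name)
```

In this sketch the word "injury" in the sentence matches a fact stored for the footballer, so that candidate is preferred over the plain German word. A production system would draw on far richer context and a knowledge base of millions of entities.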

“Our cloud-based technology makes it possible to create highly specific search and analysis tools, tailored to the individual company, for searching news archives, corporate documents and product reviews, not only in German but also in English, Spanish and Chinese. Companies can thus gain usable insights even from large, unstructured texts. By continually expanding our knowledge base and our technology, we aim to ensure that texts can be understood by machines without restriction, enabling the highest complexity queries to be answered,” says Johannes Hoffart, director and co-founder of Ambiverse.

“We are pleased that Ambiverse is further developing an innovative technology from the Max Planck Institute for Informatics for commercial use in a pioneering environment, thereby making it available to businesses. Ambiverse employees have gained their work experience in companies such as Google, SAP and Microsoft, and bring with them the expertise necessary to deliver market success for this technology,” says Bernd Ctortecka, Patent and Licence Manager at Max Planck Innovation, the technology transfer organization of the Max Planck Society.

