Science and Technology Observatory (OST)

Measuring technological innovation: optimising the benefits of semantic patent analysis

The Science and Technology Observatory (OST) department is launching a project to develop new indicators to measure technological innovation through semantic patent analysis. Dominique Guellec, the OST's scientific advisor and project leader, explains the objectives and issues at stake.

Last September, you joined the OST to lead the semantic patent analysis project, which aims, among other things, to produce indicators reflecting the novelty and impact of inventions. What are the current limitations of measuring technological innovation, and what can semantic analysis contribute?

Technological innovation is mainly measured by patent-based indicators, usually consisting of counts of the patents themselves, possibly weighted by their citations in other patents, or by other metadata.

This approach has its merits, but it can be improved. Although patents do reflect innovation, they are governed by their own legal and strategic logic, which is a source of statistical noise and bias. New natural language processing (NLP) techniques make it possible to analyse the texts of patents, which include a description of the invention. From these texts, indicators can be developed that no longer reflect the patents as such but relate directly to the inventions they describe, making them more accurate from a technological standpoint. NLP is a branch of artificial intelligence; it is used in many fields, such as question interpretation (on our mobile phones) and text classification, and is a very active research topic.
 

What are the goals of the project?

The project will develop and test indicators that reflect the dynamics of innovation by applying NLP to patents. Different indicators could be compiled, each reflecting a characteristic of the inventions, such as their novelty or impact. Aggregating these indicators at the level of an enterprise, research organisation or country would reflect the innovation dynamics of the entity concerned.
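The aggregation step described above could be sketched as follows. This is purely illustrative, assuming patent-level scores already exist; the field names (`assignee`, `novelty`) and the choice of a simple mean are hypothetical, not the project's actual method.

```python
# Illustrative sketch: rolling patent-level novelty scores up to an entity level.
# The record structure and the use of an unweighted mean are assumptions.
from collections import defaultdict
from statistics import mean

def aggregate_by_entity(patents):
    """Average patent-level novelty scores per assignee
    (an enterprise, research organisation or country)."""
    by_entity = defaultdict(list)
    for p in patents:
        by_entity[p["assignee"]].append(p["novelty"])
    return {entity: mean(scores) for entity, scores in by_entity.items()}
```

In practice one might weight patents (e.g. by family size or citations) rather than taking a plain mean, but the principle of aggregating invention-level indicators to the entity level is the same.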
 

What data and methods will be used?

Patent data is public. Various databases are accessible and clean enough to enable statistical processing (such as the OST database). The full patent texts will be added for this project, which is possible because patent offices make them available. The project is expected to focus on one or more of the major patent offices: the European Patent Office (EPO), the World Intellectual Property Organization (WIPO) or the United States Patent and Trademark Office (USPTO).

The first stage of the method will consist of vectorising the patent texts, which will enable their comparison in the second stage. This vectorisation is intended to capture the important elements of the text, i.e. the main aspects of the invention; it also reduces the dimensionality of the documents and the associated noise. Different NLP methods could be used for this, such as word embeddings or very recent techniques based on artificial neural networks. These techniques are constantly improving, and current efforts aim to better capture context and syntax within documents. The second stage will consist of calculating novelty indicators (semantic distance between a patent and earlier patents) and impact indicators (distance between a patent and later patents). Different types of distance will be calculated and then validated, including by human experts.
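The two stages above can be sketched in miniature. The project's actual pipeline is not described in detail here, so this is a minimal illustration using TF-IDF vectors and cosine distance as stand-ins for the richer embeddings mentioned; the function names are hypothetical.

```python
# Minimal sketch of the two-stage method, assuming TF-IDF as the vectoriser
# (the project may use word embeddings or neural models instead).
import math
from collections import Counter

def tfidf_vectorise(docs):
    """Stage 1: turn each patent text into a sparse TF-IDF vector (term -> weight)."""
    n = len(docs)
    tokenised = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine_distance(u, v):
    """Semantic distance = 1 - cosine similarity of two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)

def novelty(vec, earlier):
    """Stage 2: novelty = distance to the closest earlier patent
    (impact would mirror this, measured against later patents)."""
    return min(cosine_distance(vec, e) for e in earlier)
```

A patent whose text overlaps heavily with prior patents gets a novelty score near 0, while one sharing little vocabulary with them scores near 1; real implementations would refine both the vectorisation and the choice of distance, as the interview notes.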
 

For which applications and outputs?

Novelty and impact are fundamental characteristics of any invention, and having reliable indicators would be useful for all analyses of innovation, for evaluation, analytical or policy purposes. These indicators would therefore be of interest to researchers, businesses, public institutions and public policy makers.

The methods derived from this project could also be applied to other data such as scientific publications or descriptions of inventions published on the Internet.

This project will enable the production of new quantitative methods, open-access databases, innovation indicators and analytical reports using these indicators.
 

What resources are required?

The implementation of this project, estimated to take two years, requires:

  1. adequate databases (the data exist but need to be assembled);
  2. massive computing power;
  3. advanced qualifications in NLP, data science, and programming.

The OST already has extensive expertise in data science, which will be at the heart of this project; this could be supplemented by a partnership with a research team specialising in NLP (e.g. at the CNRS).
 

Biographical notes

Dominique Guellec is a scientific advisor at the OST. He contributes to the OST's activities, particularly in the fields of patent statistics and the evaluation of public research and innovation policies. He leads a project that uses semantic patent analysis techniques to produce indicators reflecting, among other things, the novelty and impact of inventions.

Until August 2019, Dominique Guellec was Head of the Science and Technology Policy Division at the Organisation for Economic Co-operation and Development (OECD). In this capacity, he led, for example, the 2014 study of France's research and innovation system. He was previously Head of Science and Technology Statistics at the OECD, overseeing the revision of the Frascati Manual in 2001 and the revision of the Patent Statistics Manual in 2009. He was also Chief Economist at the European Patent Office, where he set up the PATSTAT database. He has published or co-published numerous academic articles and several books on innovation and growth, in both French and English (including Économie de l'innovation, Éditions La Découverte, 2018, and The Economics of the European Patent System, Oxford University Press, 2007).

Dominique Guellec is an INSEE administrator and a graduate of ENSAE.