
Jörg Tiedemann

16.1.1972, Wernigerode, Germany

PhD in Computational Linguistics 2003, Uppsala University

Professor of Language Technology 2015-, University of Helsinki
Visiting Professor 2009–2014, Uppsala University
Postdoctoral researcher 2004–2009, University of Groningen

Publications, projects and other scientific activities
Research interests:
Machine translation, multilingual text processing, question answering and information extraction

Photo: Linda Tammisto
Written by Jörg Tiedemann (Tomas Sjöblom, ed.)

OPUS – Parallel and free

In 2003, after an intensive summer school in the beautiful surroundings of Rosendal in Norway, my fellow student Lars Nygaard and I were sitting in a café chatting about our future. There the idea of an open collection of human translations for the benefit of everyone was born, and we went home with a name already made up for our resource: OPUS – the Open Parallel corpUS.

We started on a small scale with a few translated resources from open-source projects, but the collection was positively received from the very beginning. Over ten years later, it has grown enormously and now covers over 250 languages and language variants, with billions of words from different domains. The OPUS website now attracts over 10,000 unique visitors every month, and the number is growing.

The logo of OPUS, the Open Parallel corpUS.

In the beginning, OPUS was unusual in offering data sets without any further restrictions. At that time, resources were still locked behind closed doors and in the drawers of researchers who were afraid of sharing them for all kinds of reasons. Since then, the benefits of open data have become obvious, and I am proud that OPUS can be part of this development. It is with great pleasure that I see researchers from various disciplines starting to look at the available resources.

The main purpose of the collection is to support language technology, and data-driven machine translation in particular. But its potential in general linguistics, translation studies and other fields in the humanities (or computer science) is becoming more and more visible. So why not study language variation based on translated movie subtitles? Or look at the use of rhetorical figures across languages based on speeches in the European Parliament?

Have a look and decide for yourself what OPUS has to offer. There might be something in there for you as well.

An example of a query made in OPUS, with the word “human”.

 
