Corpus Italiano

Partnership of Paisà

The project is a joint effort of:

University of Bologna - Sergio Scalise with colleagues Claudia Borghetti and Francesca Masini (in charge of the research unit during 2012/2013)
CNR Pisa - Vito Pirrelli with colleagues Alessandro Lenci and Felice Dell'Orletta
European Academy of Bozen/Bolzano - Andrea Abel with colleagues Chris Culy, Henrik Dittmann, and Verena Lyding
University of Trento - Marco Baroni with colleagues Marco Brunello, Sara Castagnoli, and Egon Stemle

Project Lead

Responsibilities are divided among the partners as follows:

[corpus creation]: The corpus is collected by the University of Trento. Copyright-free text materials are bootstrapped from the web. The harvested texts are automatically cleaned by stripping HTML tags and other formatting and navigation data (for more information see construction steps).
[corpus annotation]: The linguistic annotation of the corpus combines manual and automatic annotation procedures. Manually annotated data is used to refine the computational linguistic methods and tools used for corpus annotation (for more information see construction steps). The manual annotation of corpus texts and the evaluation of analysis tools are carried out by researchers of the University of Bologna, the University of Trento, and CNR Pisa. Tools are developed, adjusted, and applied by CNR Pisa.
[corpus interface]: The corpus is made available to the public via a free online interface. The multifaceted user interface for language learners and researchers is created by the European Academy of Bozen/Bolzano.