Filters
The Filter interface allows full-text corpus documents to be retrieved according to a set of criteria specified by the user. The documents can be inspected one by one, or downloaded as zipped .txt files.
The Filter interface also allows for the creation of named subcorpora composed of filtered texts. Once created, subcorpora show up in the corpus drop-down menu and can be queried like the full PAISÀ corpus.
Filter criteria
- a keyword, which has to be entered into the "Keyword" search box; only documents containing the keyword are retrieved
- the number of tokens within a text - i.e. the number of running words of a text document
- the number of tokens of non-basic vocabulary within a text - words that are not contained in the basic vocabulary according to these lists
- the number of sentences within a text
- the type-token ratio of a text, for details see here
- the Gulpease index of a text, for details see here
- the top-level domain - i.e., the final domain tag, e.g., ".it", ".org" or ".com" of the URL the text is taken from
- the core URL - i.e. the name of a web page from which more than 500 texts were taken to build the PAISÀ corpus
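To make the numeric criteria above more concrete, the following Python sketch computes them for a single text. It is an illustration only, not the code used by the PAISÀ interface: the tokenizer and sentence splitter are deliberately naive, the function names `text_metrics` and `top_level_domain` are made up for this example, and `basic_vocabulary` is a hypothetical stand-in for the basic vocabulary lists mentioned above. The Gulpease index is computed with its standard formula, 89 + (300 × sentences − 10 × letters) / words.

```python
import re
from urllib.parse import urlparse


def text_metrics(text, basic_vocabulary=frozenset()):
    """Compute rough filter metrics for one text (naive tokenization)."""
    # Assumption: a token is a run of letters; PAISÀ's own tokenizer may differ.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    # Assumption: sentences end at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    n_tokens = len(tokens)
    n_sentences = len(sentences)
    n_letters = sum(len(t) for t in tokens)

    # Type-token ratio: distinct word forms divided by running words.
    ttr = len(set(tokens)) / n_tokens if n_tokens else 0.0
    # Standard Gulpease formula; undefined for an empty text.
    gulpease = (
        89 + (300 * n_sentences - 10 * n_letters) / n_tokens
        if n_tokens
        else None
    )
    # basic_vocabulary is a hypothetical set of lowercase basic words.
    non_basic = sum(1 for t in tokens if t not in basic_vocabulary)

    return {
        "tokens": n_tokens,
        "non_basic_tokens": non_basic,
        "sentences": n_sentences,
        "type_token_ratio": ttr,
        "gulpease": gulpease,
    }


def top_level_domain(url):
    """Return the final domain tag of a URL, e.g. '.it', or None."""
    host = urlparse(url).hostname
    if not host or "." not in host:
        return None
    return "." + host.rsplit(".", 1)[-1]
```

For example, `text_metrics("Il gatto dorme. Il cane corre!")` counts 6 tokens in 2 sentences, and `top_level_domain("http://www.example.it/page.html")` yields ".it". Values reported by the interface may differ, since they depend on the tokenization actually used to build the corpus.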
Filtered results are provided in three different forms:
- lists of text documents
- named subcorpus
- word cloud
Lists of text documents
The list of texts satisfying the filtering criteria can be paged through by clicking on arrow icons (see screenshot below); single texts can be opened in a separate tab by clicking on the file name or icon.
Named subcorpora
A named subcorpus containing all corpus texts that satisfy the filtering criteria can be stored by entering a name for the subcorpus in the appropriate field (see screenshot below) before clicking "submit". The name of the subcorpus must start with a capital letter and may consist of letters, numbers and underscores.
User-defined subcorpora show up in the corpus dropdown menu and can be used for subsequent querying. The subcorpus called "Last" always stores the results of the most recent query or filtering carried out by the user.
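The naming rule for subcorpora can be checked before submitting a name. The sketch below is one possible reading of that rule, not the interface's own validation: it assumes that only unaccented ASCII letters are accepted, which the rule itself does not state, and the function name `is_valid_subcorpus_name` is invented for this example.

```python
import re

# Assumption: "letters" means ASCII letters A-Z and a-z only.
SUBCORPUS_NAME = re.compile(r"[A-Z][A-Za-z0-9_]*")


def is_valid_subcorpus_name(name):
    """True if name starts with a capital letter and contains only
    letters, digits and underscores."""
    return SUBCORPUS_NAME.fullmatch(name) is not None
```

Under this reading, "Ferie_2012" is accepted, while "ferie" (no initial capital) and "Ferie-2012" (contains a hyphen) are rejected.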
Word Cloud
A word cloud is built based on the word frequencies of 80 of the documents that satisfy the filtering criteria. Words are displayed in alphabetical order and are scaled according to their frequencies.
The screenshot below shows a Word Cloud for documents filtered by the keyword "ferie".
The word cloud is implemented using the Google Visualization API.
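The logic behind such a word cloud (count word frequencies, order the words alphabetically, scale each word by its frequency) can be sketched in a few lines of Python. This is only an approximation of the idea and does not use or reproduce the Google Visualization API; the function name `word_cloud_entries`, the `max_words` cut-off, the font-size range and the linear scaling are all assumptions made for this example.

```python
import re
from collections import Counter


def word_cloud_entries(documents, max_words=50, min_size=10, max_size=48):
    """Return (word, frequency, font_size) tuples in alphabetical order."""
    counts = Counter()
    for doc in documents:
        # Naive tokenization: runs of letters, lowercased.
        counts.update(re.findall(r"[^\W\d_]+", doc.lower()))

    top = counts.most_common(max_words)
    if not top:
        return []

    highest = top[0][1]
    lowest = top[-1][1]
    span = highest - lowest

    entries = []
    # sorted() orders by code point, so accented letters sort after 'z'.
    for word, freq in sorted(top):
        # Linear scaling between min_size and max_size (an assumption).
        scale = (freq - lowest) / span if span else 1.0
        size = round(min_size + scale * (max_size - min_size))
        entries.append((word, freq, size))
    return entries
```

Calling `word_cloud_entries(["ferie ferie mare", "mare ferie sole"])` lists "ferie", "mare" and "sole" in that order, with "ferie" given the largest size because it is the most frequent word.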
Need more help? See here for an overview of our help pages.