FIZ Karlsruhe offers internships (Praktika) with a duration of one month or more. Possible topics include:
- Exploration of Elasticsearch’s significant terms aggregation using a basic gold standard. Elasticsearch is a search and analytics engine built on Lucene. Its new version includes the experimental significant terms aggregation, which is designed to extract keywords related to a query.
- Evaluation of several part-of-speech (PoS) taggers on patent texts, with suggestions for improvement where indicated. PoS taggers are a commodity in natural language processing, but they were designed primarily for general-language texts or scientific articles. Their performance on patent texts may be inadequate.
- Analogous investigations can be carried out with chunkers (shallow parsers).
- Structural segmentation of patent texts, i.e., the identification of paragraphs, headers, figures, tables, etc.
For more information please contact: