Text- and Data-Mining

Information Mining and Semantic Technologies

Rapidly growing volumes of scientific literature and technical information such as patents, together with the increasing complexity of searching for and finding relevant knowledge, pose major challenges for scientists and information professionals. It is therefore necessary to research, develop and apply new methods of text and data mining (TDM) and semantic analysis in order to discover, extract and fuse relevant information for a variety of search and analysis tasks and for subsequent decision-making processes. Focusing on patent search and analysis, we seek dedicated solutions that apply machine learning, natural language processing and semantic processing to querying and analyzing large amounts of patent data (Big Data Analytics) and to the semantic indexing of patent full texts, e.g., with the help of knowledge from ontologies and linked open data (LOD).

Patent Mining and Semantic Enrichment

Previous work has shown that, for the efficient analysis of patent documents, their unstructured full-text sections must be further structured and semantically enriched in order to obtain high-quality results and to create added value for subsequent tasks such as prior art search, drug discovery or technology scouting. One of our main focuses is therefore a deeper structuring and semantic indexing of the claims and the detailed description of a patent text by means of machine learning approaches. This makes it possible to create more accurate search queries, e.g., in the context of a chemical patent search, and to extract domain-specific entities and terminology automatically to an increasing extent.

Furthermore, the automatic semantic indexing of text documents requires a formal specification and machine-understandable representation of essential parts of the document content, e.g., in RDF/OWL based on Semantic Web standards and the linked data principles. Besides aiding the search and analysis of patent information sources, such a representation enables interoperability with existing external knowledge bases, e.g., from the LOD cloud, allowing specialized knowledge bases (e.g., biochemical information from ChEMBL) and semantic services to be integrated and exploited in complex analysis workflows.
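As an illustration of such a machine-understandable representation, the following minimal Python sketch writes one extracted entity annotation as RDF triples in N-Triples syntax. The example.org namespaces, the property name mentions, the class name ChemicalCompound, the patent identifier and the external IRI are all hypothetical placeholders chosen for this sketch, not the vocabulary of any actual system; only the rdf:type, rdfs:label and rdfs:seeAlso IRIs are standard.

```python
# Hypothetical namespaces for illustration only.
EX = "http://example.org/patent/"
VOCAB = "http://example.org/vocab#"

# Standard RDF / RDFS property IRIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def annotation_triples(patent_id: str, mention: str,
                       entity_class: str, external_iri: str) -> list[str]:
    """Describe one entity mention in a patent as four N-Triples lines."""
    patent = iri(EX + patent_id)
    slug = mention.lower().replace(" ", "_")
    entity = iri(f"{EX}{patent_id}/entity/{slug}")
    return [
        f"{patent} {iri(VOCAB + 'mentions')} {entity} .",
        f"{entity} {iri(RDF_TYPE)} {iri(VOCAB + entity_class)} .",
        f"{entity} {iri(RDFS_LABEL)} {literal(mention)} .",
        # Link to an external knowledge base entry (placeholder IRI).
        f"{entity} {iri(RDFS_SEE_ALSO)} {iri(external_iri)} .",
    ]


if __name__ == "__main__":
    for line in annotation_triples(
        "EP0000000",                       # made-up patent identifier
        "acetylsalicylic acid",
        "ChemicalCompound",
        "http://example.org/kb/compound/123",
    ):
        print(line)
```

In practice an RDF library and an agreed ontology would replace the hand-built strings; the sketch only shows how an annotation becomes a set of linkable statements.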

Semantic Search and Knowledge Graphs

In addition to established information retrieval methods for patent searches based on Boolean logic, enhanced and more complex search capabilities need to be developed using an entity- or knowledge-based approach. Our aim is to investigate and employ different forms of semantic search using knowledge graphs. In addition to the use of domain-specific ontologies, relevant entities and their relations can increasingly be extracted and linked automatically by means of entity recognition and disambiguation, applying, e.g., deep learning or other machine learning techniques.
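The difference between a keyword-based Boolean search and an entity-based search can be sketched with a toy example. The documents, entity identifiers, labels and the broader-class relation below are invented for illustration, and the simple label matching is only a stand-in for real entity recognition and disambiguation.

```python
# Toy document collection (invented text).
DOCS = {
    "doc1": "A tablet comprising aspirin and a binder.",
    "doc2": "Composition containing acetylsalicylic acid.",
    "doc3": "A coating comprising ibuprofen.",
}

# Toy knowledge graph: entity labels (synonyms) and a broader-class relation.
LABELS = {
    "E1": {"aspirin", "acetylsalicylic acid"},
    "E2": {"ibuprofen"},
}
BROADER = {"E1": "C1", "E2": "C1"}
CLASS_LABELS = {"C1": {"nsaid"}}


def boolean_search(term: str) -> list[str]:
    """Plain keyword match: finds only documents containing the literal term."""
    term = term.lower()
    return sorted(d for d, text in DOCS.items() if term in text.lower())


def link_entities(text: str) -> set[str]:
    """Stand-in for entity recognition: match known labels in the text."""
    lowered = text.lower()
    return {e for e, labels in LABELS.items()
            if any(label in lowered for label in labels)}


def entity_search(term: str) -> list[str]:
    """Resolve the term to entities (directly or via a class), then match
    documents by the entities they mention rather than by their wording."""
    term = term.lower()
    targets = {e for e, labels in LABELS.items() if term in labels}
    classes = {c for c, labels in CLASS_LABELS.items() if term in labels}
    targets |= {e for e, c in BROADER.items() if c in classes}
    return sorted(d for d, text in DOCS.items()
                  if link_entities(text) & targets)


if __name__ == "__main__":
    print(boolean_search("aspirin"))  # keyword match only
    print(entity_search("aspirin"))   # also finds the synonym
    print(entity_search("NSAID"))     # finds members of the class
```

Here the keyword search for "aspirin" misses the document that uses the synonym, while the entity-based search retrieves both, and a class-level query reaches documents that never mention the class name.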

Big Data Analytics

Rapidly growing volumes of scientific and technical information require innovative and scalable methods for querying, analyzing and visualizing relevant information efficiently. For this reason, we research and develop specific solutions for the analysis of large domain-specific patent collections in order to answer complex questions, e.g., in the life science domain. Together with new TDM methods, our goal is to derive complex knowledge, e.g., technology trends, from large amounts of data.


Dr. Hidir Aras
Head TDM

Phone: +49 7247 808 306
