Growing data volumes and complex user information needs, for example in search processes, pose major challenges to scientists and information professionals. New ideas and methods, for example from Text and Data Mining (TDM), are therefore needed to analyze and retrieve knowledge for specific questions in patent search and analysis. We investigate methods for retrieving and analyzing large volumes of data (Big Data Analytics) and for the semantic indexing of patent full texts, e.g., with the help of knowledge from ontologies and Linked Open Data (LOD).
Patent Mining and Semantic Enrichment
For the efficient analysis of patent documents, their unstructured full-text sections must be further structured and semantically enriched in order to obtain high-quality results. This is one focus of our work. Another is the semantic indexing of the claims and the detailed description of a patent text by means of machine learning approaches. Such approaches help not only to formulate more precise search queries but also to improve and extend methods for automatically extracting domain-specific entities and terminology.
Furthermore, the automatic semantic indexing of text documents also requires a semantic representation of essential parts of the document content, e.g., according to RDF/OWL standards. This applies both to the representation of metadata and to the representation of relevant, automatically extracted entities and relations of a domain. The semantic indexing of information also enables interoperability with existing external sources, e.g., from the LOD cloud, and the integration and exploitation of specialized knowledge bases (e.g., biochemical information from ChEMBL) and semantic services in complex analysis workflows.
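To make the idea of an RDF representation more concrete, the following minimal sketch serializes one patent's metadata and its extracted entities as N-Triples lines using only the Python standard library. The namespace `http://example.org/patent/`, the `Patent` class, the `mentions` property, and the example identifiers are assumptions made for illustration; they are not taken from our actual data model, and a production pipeline would more likely rely on an RDF library and an established vocabulary.

```python
# Hypothetical namespace and property names, chosen only for this illustration.
EX = "http://example.org/patent/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_TITLE = "http://purl.org/dc/terms/title"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def patent_to_ntriples(patent_id, title, entities):
    """Return N-Triples lines for one patent: metadata plus extracted entities.

    `entities` holds (surface form, external IRI) pairs, e.g. produced by an
    upstream entity-recognition step that is not shown here.
    """
    subject = iri(EX + patent_id)
    lines = [
        f"{subject} {iri(RDF_TYPE)} {iri(EX + 'Patent')} .",
        f"{subject} {iri(DCT_TITLE)} {literal(title)} .",
    ]
    for _surface, target in entities:
        # Link the patent to the external resource the mention was resolved to.
        lines.append(f"{subject} {iri(EX + 'mentions')} {iri(target)} .")
    return lines


if __name__ == "__main__":
    example = patent_to_ntriples(
        "EP0000001",  # invented identifier
        "Tablet containing aspirin",
        [("aspirin", "http://example.org/compound/aspirin")],
    )
    print("\n".join(example))
```

Because each extracted entity is linked by IRI rather than by its surface string, the same triples can later be joined with external sources that use those IRIs, which is the interoperability benefit described above.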
Semantic Search and Knowledge Graphs
In addition to established methods of information retrieval in patent searches based on Boolean logic, enhanced and more complex search capabilities need to be developed using an entity-based or knowledge-based approach. Our aim is to investigate and employ different forms of semantic search using knowledge graphs. In addition to the use of domain-specific ontologies, relevant entities and their relations can increasingly be extracted and linked automatically by means of entity recognition and disambiguation using, e.g., Deep Learning (DL) or other machine learning techniques.
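The difference between a Boolean keyword search and an entity-based search can be sketched with a toy inverted index. The documents, the entity identifier `ex:Aspirin`, and its label set below are invented for illustration, and the label lookup stands in for a real knowledge graph; the sketch makes no claim about how our systems are implemented.

```python
from collections import defaultdict

# Toy corpus; document IDs and texts are invented for illustration.
DOCS = {
    "D1": "composition comprising acetylsalicylic acid for pain relief",
    "D2": "tablet containing aspirin and a binder",
    "D3": "binder for ceramic tiles",
}

# Toy knowledge graph fragment: one entity ID with its known surface forms.
ENTITY_LABELS = {
    "ex:Aspirin": {"aspirin", "acetylsalicylic acid"},
}


def build_index(docs):
    """Map each lower-cased token to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def boolean_and(index, terms):
    """Classic Boolean AND: documents that contain every query term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def entity_search(index, entity_id):
    """Entity-based search: OR over all labels of the entity.

    A multi-word label is matched with AND over its tokens, which is a
    simplification; it does not check that the tokens are adjacent.
    """
    hits = set()
    for label in ENTITY_LABELS.get(entity_id, set()):
        hits |= boolean_and(index, label.split())
    return hits


if __name__ == "__main__":
    index = build_index(DOCS)
    print(sorted(boolean_and(index, ["aspirin"])))
    print(sorted(entity_search(index, "ex:Aspirin")))
```

In this toy setting the keyword query for "aspirin" finds only D2, while the entity query also retrieves D1, which names the same substance under a different label. Recognizing and disambiguating such mentions automatically is where the machine learning techniques mentioned above come in.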
Big Data Analytics
Rapidly and dynamically growing volumes of scientific and technical information require innovative and scalable methods for querying, analyzing and visualizing relevant information efficiently. For this reason, we research and develop specific solutions for the analysis of large domain-specific patent collections in order to answer complex questions, e.g., in the life-science domain. Using new TDM methods, we aim to derive complex knowledge, such as technology trends, from large amounts of data.
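As a very small illustration of what a technology-trend signal could look like, the sketch below computes, per year, the share of documents that mention a given term. The function name `yearly_term_share`, the record layout, and the sample data are assumptions for this example, and plain substring matching stands in for the semantic indexing described earlier.

```python
from collections import Counter


def yearly_term_share(records, term):
    """Return {year: share of that year's documents mentioning `term`}.

    `records` is an iterable of (year, text) pairs and is consumed in a
    single pass, so it may also be a generator over a large collection.
    """
    totals = Counter()
    hits = Counter()
    needle = term.lower()
    for year, text in records:
        totals[year] += 1
        if needle in text.lower():  # naive substring match, for illustration
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


if __name__ == "__main__":
    sample = [  # invented records
        (2016, "battery electrode"),
        (2016, "gear box"),
        (2017, "solid-state battery"),
        (2017, "battery pack"),
    ]
    print(yearly_term_share(sample, "battery"))
```

Only per-year counters are kept in memory, so the same counting scheme could in principle be distributed by year or by document partition. Whether that is adequate for a given collection depends on its size and on the analysis question.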
Dr. Hidir Aras
Project Leader TDM
IT Development and Applied Research
Sofean, M.; Aras, H.
Technological Areas Detection and Clustering for Large-scale of Patent Texts. In Proc. of the International Association for Development of the Information Society (IADIS) of the 3rd International Conference on Big Data Analytics, Data Mining and Computational Intelligence (BigDaCI 2018).