AI Advances in the GENTIO Project
The FFG-funded GENTIO project has ended, with MODUL Technology contributing advances in AI-based NLP and NER/NEL approaches, further development of its Semantic Knowledge Base, and new classification and summarization algorithms. State-of-the-art AI has improved our Natural Language Processing (NLP), Named Entity Recognition (NER) and Named Entity Linking (NEL) components. Good NER combined with relation extraction can feed our Semantic Knowledge Base with new entities and relations that are identified automatically in news articles. To explore new ways of doing this, we started fine-tuning Large Language Models (LLMs), with promising early results that were the subject of a conference presentation. We also compared Transformer models for the task of text summarization.
We made significant progress in the task of news article classification according to the IPTC News categories. Here, we decided to fine-tune on top of BERTopic, a neural model for topic classification. The initial topic classification is fed through a Transformer model to align the topics to the IPTC News categories. We showed higher accuracy scores on annotated news corpora than when using a language model such as RoBERTa to classify articles directly; this work was also published this year at a conference. GENTIO has enabled MODUL Technology to advance its deployment of state-of-the-art AI (e.g. Transformer models, language models) for the core text understanding tasks that underlie ‘Web intelligence’, i.e. extracting trends and insights from Web data.
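The two-stage classification idea described above (topics are discovered first, then aligned to IPTC News categories) can be illustrated with a minimal sketch. This is not the project's implementation: the alignment step here uses simple keyword overlap as a stand-in for the fine-tuned Transformer model, the stage-one topics are made-up keyword lists of the kind a topic model might produce, and the seed words, function names and article ids are all hypothetical. Only three IPTC top-level categories are included.

```python
from collections import Counter

# Hypothetical seed vocabulary for a small subset of IPTC top-level categories.
# The real system aligns topics with a fine-tuned Transformer, not seed words.
IPTC_SEEDS = {
    "sport": {"match", "league", "goal", "team", "coach"},
    "politics": {"election", "parliament", "minister", "vote", "party"},
    "economy, business and finance": {"market", "shares", "inflation", "bank", "profit"},
}


def align_topic(topic_keywords):
    """Return the category whose seed words overlap most with the topic, or None."""
    words = set(topic_keywords)
    scores = {cat: len(seeds & words) for cat, seeds in IPTC_SEEDS.items()}
    best = max(scores, key=scores.get)
    # A topic with no overlap at all is left unaligned.
    return best if scores[best] > 0 else None


def classify_articles(article_topics, topic_keywords):
    """Map each article id to a category via the topic assigned in stage one."""
    topic_to_category = {tid: align_topic(words) for tid, words in topic_keywords.items()}
    return {aid: topic_to_category.get(tid) for aid, tid in article_topics.items()}


# Made-up stage-one output: topic id -> keywords, and article id -> topic id.
topic_keywords = {
    0: ["goal", "team", "league", "season"],
    1: ["vote", "party", "election", "campaign"],
    2: ["bank", "inflation", "rates", "market"],
}
article_topics = {"a1": 0, "a2": 2, "a3": 1, "a4": 0}

labels = classify_articles(article_topics, topic_keywords)
category_counts = Counter(labels.values())
```

The point of the design is that alignment happens once per topic rather than once per article, so every article inherits the category of its topic. Replacing `align_topic` with a learned model would leave the rest of the flow unchanged.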