AI-based Methods for Using Text as Data in the Social Sciences

Register now for this in-person workshop in Leipzig to get to know state-of-the-art AI applications for using text as data in social science research!

Moderated by Prof. Dr. Gerhard Heyer (University of Leipzig), the workshop will introduce you to various examples. It is aimed both at researchers who are just starting with text as data and at those looking for advanced practical applications.

Date: Dec 12, 2023
Time: 1 pm – 5:15 pm
Venue: Seminargebäude Uni Leipzig


1) iLCM – an interactive text mining environment for social and economic scientists

by Dr. Christian Kahmann (University of Leipzig)


iLCM is an integrated research environment for the analysis of structured and unstructured data in a ‘Software as a Service’ architecture (SaaS), and has been designed to address the needs of researchers with little experience in working with text mining tools as well as experienced researchers with substantial knowledge of the R language. It supports the quantitative evaluation of large amounts of qualitative data using text mining methods, including organising data into subcorpora, annotating and classifying data with active learning, and representing data and topics over time. We introduce the software and present a real application use case.

2) Working with English News Corpora in the Leipzig Corpora Collection

by Dr. Thomas Eckart, Erik Körner and Felix Helfer (Sächsische Akademie der Wissenschaften)


The Leipzig Corpora Collection (LCC) contains up-to-date, time-stamped crawled news data for more than 900 corpora in more than 250 languages. We shall present the available English corpora using the (No)Sketch Engine search engine, including metadata such as publication date or subject area, and demonstrate by way of example how the corpora can be used for a social science application in a complex search environment enhanced by linguistic pre-processing.

3) Active Learning with (L)LMs: State of the Art and Practical Challenges

by Christopher Schröder (Center for Scalable Data Analytics and Artificial Intelligence, ScaDS.AI, Dresden/Leipzig)


Following a brief introduction to Active Learning, we shall demonstrate how the Active Learning library Small-Text can be applied to a “Words of the Day” corpus of the Leipzig Corpora Collection. In addition, we shall discuss how LLMs can be used for social science research and how they can be optimised with respect to performance and memory requirements.

About the Host

Gerhard Heyer is a professor of Natural Language Processing at the Institute of Computer Science at the University of Leipzig. His research focuses primarily on research data infrastructures, automatic semantic processing of text, and applications of text mining, including in the Digital Humanities.