Processing BERD

BERD@NFDI supports a community that works with many different resource types. Because these resources often do not stem from a single entity, they may follow different data management practices and may have been created with different data collection approaches. Researchers therefore have to deal with a wide variety of data and with varying levels of data quality. If the data are already available in structured form, established checks and normalization procedures can be used to improve the quality of the (meta-)data. For unstructured data, new methods of classification, normalization, and quality assessment have been applied, but there is no commonly accepted standard. When dealing with historical data sources, the printed sources must first be digitized using text recognition methods. In all cases, data protection requirements may make further processing necessary (e.g., anonymization to protect personal information). BERD@NFDI will support the research community in selecting suitable methods for processing BERD and in documenting them and making them accessible. We will develop standards and guidelines for the processing and documentation of unstructured data, evaluate new anonymization methods, and provide tools to manage the conversion of historical data sources.

Measures:

  • Quality assurance and normalization
  • Anonymization
  • Processing digitized documents