A Complete Approach to the Conversion of Typewritten Historical Documents for Digital Archives

A. Antonacopoulos, D. Karatzas

In the book: Document Analysis Systems VI: Proceedings of the 6th International Association for Pattern Recognition (IAPR) Workshop on Document Analysis Systems (DAS2004), S. Marinai and A. Dengel (Eds.), Springer Lecture Notes in Computer Science, LNCS 3163, 2004, pp. 90-101


This paper presents a complete system that historians and archivists can use to digitize whole collections of documents containing personal information. The system integrates tools and processes that facilitate scanning, image indexing, definition of document structure (physical and logical), document image analysis, recognition, proofreading/correction and semantic tagging. It is described in the context of different types of typewritten documents relating to prisoners in World War II concentration camps, and is the result of a multinational collaboration under the MEMORIAL project, funded (1.5M) by the European Union (www.memorial-project.info). Results on a representative selection of documents show a significant improvement not only in OCR accuracy but also in the overall time and cost of converting these documents for digital archives. This work is supported by the European Union grant IST-2001-33441.

