Dimosthenis Karatzas


Refereed Papers

A Framework for the Assessment of Text Extraction Algorithms on Complex Colour Images

A. Clavelli, D. Karatzas and J. Llados

Proceedings of the 9th IAPR Workshop on Document Analysis Systems, ACM Press, pp. 19-28, Boston, MA, USA, 2010


The availability of open, ground-truthed datasets and clear performance metrics is a crucial factor in the development of an application domain. The domain of colour text image analysis (real scenes, Web and spam images, scanned colour documents) has traditionally suffered from a lack of a comprehensive performance evaluation framework. Such a framework is extremely difficult to specify, and corresponding pixel-level accurate information tedious to define. In this paper we discuss the challenges and technical issues associated with developing such a framework. Then, we describe a complete framework for the evaluation of text extraction methods at multiple levels, provide a detailed ground-truth specification and present a case study on how this framework can be used in a real-life situation.

