Text Extraction from Web Images Based on Human Perception and Fuzzy Inference

A. Antonacopoulos, D. Karatzas

Proceedings of the First International Workshop on Web Document Analysis (WDA2001), Seattle, USA, September 2001, PRImA Press, pp. 35-38


There is a significant need to extract and recognise the semantically important text contained in images on Web pages. This paper proposes a new approach to text extraction from this special class of images. The method attempts to emulate, more closely than previous approaches, the way humans perceive colour differences in order to differentiate between text and background regions. Pixels of similar colour (as humans see it) are merged into components, and a fuzzy inference mechanism (using connectivity and colour distance features) is devised to group components into larger character-like regions.
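
The two stages summarised above can be illustrated with a minimal sketch. It is not the authors' implementation: plain Euclidean distance on colour triples stands in for a perceptually motivated colour difference, and every function name, threshold and membership shape below is a hypothetical choice made only for illustration. The first function merges neighbouring pixels of similar colour into components; the last one combines a colour distance feature and a connectivity feature with a single fuzzy rule.

```python
from collections import deque


def colour_distance(a, b):
    """Euclidean distance between two colour triples.

    Assumption: a stand-in for a perceptual colour difference measure.
    """
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def label_components(image, threshold):
    """Flood-fill 4-connected pixels whose colours differ by at most `threshold`.

    `image` is a list of rows of colour triples. Returns (labels, count).
    """
    height, width = len(image), len(image[0])
    labels = [[-1] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if labels[y][x] != -1:
                continue  # pixel already belongs to a component
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue
                    if labels[ny][nx] != -1:
                        continue
                    if colour_distance(image[cy][cx], image[ny][nx]) <= threshold:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return labels, count


def falling_membership(value, full, zero):
    """Membership of 1 at or below `full`, 0 at or above `zero`, linear between."""
    if value <= full:
        return 1.0
    if value >= zero:
        return 0.0
    return (zero - value) / (zero - full)


def rising_membership(value, zero, full):
    """Membership of 0 at or below `zero`, 1 at or above `full`, linear between."""
    if value <= zero:
        return 0.0
    if value >= full:
        return 1.0
    return (value - zero) / (full - zero)


def merge_propensity(colour_dist, connectivity):
    """One fuzzy rule: IF colours are similar AND components are well connected
    THEN merge. AND is taken as the minimum of the two memberships.

    The breakpoints (10, 60, 0, 0.5) are hypothetical values.
    """
    similar = falling_membership(colour_dist, 10.0, 60.0)
    connected = rising_membership(connectivity, 0.0, 0.5)
    return min(similar, connected)


# Tiny example: two slightly different reds next to a white background.
RED, RED2, WHITE = (200, 0, 0), (205, 5, 0), (255, 255, 255)
image = [[RED, RED2, WHITE],
         [RED, WHITE, WHITE]]
labels, count = label_components(image, threshold=20.0)
```

In this toy image the two reds fall into one component and the white pixels into another. A pair of components would then be merged only when `merge_propensity` exceeds a cut-off chosen by the user, which is again an assumption of this sketch rather than something stated in the paper.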

