Skip to yearly menu bar Skip to main content


Poster

MEH: A Multi-Style Dataset and Toolkit for Advancing Egyptian Hieroglyph Recognition

Maksim Golyadkin · Rubanova Alexandrovna · Aleksandr Utkov · Dmitry Nikolotov · Ilya Makarov


Abstract:

The recognition of ancient Egyptian hieroglyphs presents significant challenges due to the vast stylistic variations and the scarcity of labeled data. While deep learning has shown promising results, existing approaches often rely on single-source or synthetic datasets, limiting their generalization ability. To advance research in hieroglyph recognition, we introduce the Multisource Egyptian Hieroglyphs (MEH) dataset, the first multi-style dataset for hieroglyph classification. MEH comprises 10 distinct groups, each representing a unique writing style, with labels derived from professionally verified text digitizations. Using this dataset, we explore three key aspects of hieroglyph recognition: (1) analyzing how different writing styles affect model generalization, (2) evaluating synthetic data generation for expanding hieroglyph class coverage, and (3) assessing classification performance of existing models. To support future large-scale dataset creation, we propose a style-aware synthetic data generation method and introduce a hieroglyph labeling tool to simplify annotation and accelerate text digitization.

Live content is unavailable. Log in and register to view live content