A Modular OCR Solution for Logographic Scripts: From Labeling to Recognition and User Interface Design
- 👤 Speaker: Peichao Qin - Faculty of Asian and Middle Eastern Studies
- 📅 Date & Time: Thursday 15 May 2025, 13:00 - 14:00
- 📍 Venue: JJ Thomson Seminar Room, Maxwell Centre, and on Zoom
Abstract
In recent years, optical character recognition (OCR) has become increasingly efficient at recognizing text in real-world images, particularly for phonetic writing systems such as Latin-based or other modern alphabetic scripts, where the number of character categories is manageable or labeled training data is sufficient. However, early logographic systems whose characters derive from pictographic origins, such as Chinese oracle bone inscriptions, Egyptian hieroglyphs, Mesopotamian cuneiform, and Mayan glyphs, usually have thousands of characters and even more graphic variants. As a result, OCR systems for these scripts often suffer from data inefficiency and class imbalance, which presents challenges for models such as ResNet and other CNN-based networks. To make matters worse, historians and palaeographers frequently disagree on character decipherment and classification, further complicating data labeling and dataset compilation. This talk will use Chinese oracle bone script as a case study to demonstrate how to address these challenges efficiently through four stages of work: 1) font creation for ancient characters via image vectorization; 2) text encoding and labeling using external relational tables; 3) ResNet-based model training using synthetic data augmentation; 4) product deployment using modern web architectures such as React and Vue.js. You can find part of this work at: https://oracular.azurewebsites.net/.
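As a rough illustration of stage 2, the sketch below shows one way external relational tables could decouple image labels from contested decipherments. It is not the speaker's implementation: the table names (`glyph_variant`, `reading`, `sample`), the scheme names, and all IDs and paths are hypothetical placeholders. The idea is that each image is tagged only with a stable glyph-variant ID, and the mapping from variant to character class is stored separately per classification scheme, so training labels can be regenerated when scholars disagree.

```python
import sqlite3


def build_label_db():
    """Create an in-memory database with hypothetical example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE glyph_variant (
            variant_id TEXT PRIMARY KEY,
            note       TEXT
        );
        -- One row per (variant, scheme): different scholars may assign
        -- the same graphic variant to different characters.
        CREATE TABLE reading (
            variant_id   TEXT REFERENCES glyph_variant(variant_id),
            scheme       TEXT,
            character_id TEXT,
            PRIMARY KEY (variant_id, scheme)
        );
        -- Images are labeled with the variant only, never the reading.
        CREATE TABLE sample (
            image_path TEXT PRIMARY KEY,
            variant_id TEXT REFERENCES glyph_variant(variant_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO glyph_variant VALUES (?, ?)",
        [("V001", "placeholder"), ("V002", "placeholder"), ("V003", "placeholder")],
    )
    conn.executemany(
        "INSERT INTO reading VALUES (?, ?, ?)",
        [
            ("V001", "scheme_a", "C01"),
            ("V002", "scheme_a", "C01"),  # scheme_a merges V001 and V002
            ("V003", "scheme_a", "C02"),
            ("V001", "scheme_b", "C01"),
            ("V002", "scheme_b", "C09"),  # scheme_b treats V002 as distinct
            ("V003", "scheme_b", "C02"),
        ],
    )
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?)",
        [
            ("img/0001.png", "V001"),
            ("img/0002.png", "V002"),
            ("img/0003.png", "V003"),
            ("img/0004.png", "V001"),
        ],
    )
    conn.commit()
    return conn


def labels_for_scheme(conn, scheme):
    """Return {image_path: character_id} under one classification scheme."""
    rows = conn.execute(
        """
        SELECT s.image_path, r.character_id
        FROM sample AS s
        JOIN reading AS r ON r.variant_id = s.variant_id
        WHERE r.scheme = ?
        ORDER BY s.image_path
        """,
        (scheme,),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = build_label_db()
    for scheme in ("scheme_a", "scheme_b"):
        print(scheme, labels_for_scheme(conn, scheme))
```

Under this layout, a change in scholarly opinion would only require editing rows in the `reading` table; the image annotations themselves stay untouched.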
Series
This talk is part of the RSE Seminars series.