University of Cambridge > Talks.cam > RSE Seminars > A Modular OCR Solution for Logographic Scripts: From Labeling to Recognition and User Interface Design
A Modular OCR Solution for Logographic Scripts: From Labeling to Recognition and User Interface Design
If you have a question about this talk, please contact Jack Atkinson.

In recent years, optical character recognition (OCR) has become increasingly efficient at recognizing real-world image data, particularly for phonetic writing systems such as Latin-based or other modern alphabetic scripts, where the number of categories is manageable or sufficient labeled training data is available. However, early logographic systems whose characters derive from pictographic origins, such as Chinese oracle bone inscriptions, Egyptian hieroglyphs, Mesopotamian cuneiform, and Mayan glyphs, usually have thousands of characters and even more graphic variants. As a result, OCR systems for these scripts often suffer from data inefficiency and class imbalance, which pose challenges for models such as ResNet and other CNN-based networks. To make matters worse, historians and palaeographers frequently disagree about character decipherment and classification, further complicating data labeling and dataset compilation.

This talk will use Chinese oracle bone script as a case study to demonstrate how to address these challenges efficiently, primarily through four stages of work: 1) font creation for ancient characters via image vectorization; 2) text encoding and labeling using external relational tables; 3) ResNet-based model training using synthetic data augmentation; and 4) product deployment using modern web architectures such as React and Vue.js. Part of this work is available at https://oracular.azurewebsites.net/.

This talk is part of the RSE Seminars series.