A Workflow for Extracting Sanskrit Text from Transcripts and Translating it to English


Meher Pranav Kurapati, Harshith Reddy Soda, Hemanth Chowdary Vallabhaneni, Murali Mohan Vutukuru, Lohith Reddy Burramukku, Satish Thatavarthi

Abstract

Introduction: The digitization of historical and rare writings has gained prominence in recent years, primarily as a means of protecting cultural heritage and making it accessible. However, extracting Sanskrit text from scanned images poses significant challenges because of the intricate and varied nature of the Sanskrit script.


Objectives: This research aims to develop a methodology for extracting Sanskrit text from scanned images using a hybrid approach that integrates Convolutional Neural Networks (CNN) and Optical Character Recognition (OCR) techniques. The study seeks to address the complexities of the Sanskrit script by combining deep learning with classical text recognition methods.


Methods: Scanned images are first pre-processed to enhance text visibility. A CNN model is then trained to detect and localize text regions. Subsequently, an OCR algorithm fine-tuned specifically for Sanskrit characters is applied to the localized regions to reproduce the text. This hybrid CNN-OCR model surpasses traditional OCR methods by handling the variations in font style, size, and script complexity that are characteristic of Sanskrit.
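
The three stages described above (pre-processing, text localization, recognition) can be illustrated with a minimal sketch. This is not the authors' implementation, and every name in it is hypothetical. It assumes Otsu binarization as the pre-processing step, substitutes a simple horizontal projection profile for the trained CNN localizer, and leaves the Sanskrit-tuned OCR engine as a caller-supplied `recognize` function. The abstract does not specify any of these details.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def otsu_threshold(image: Image) -> int:
    """Return the gray level that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in image:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue  # no pixels at or below this level yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is already in the dark class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image: Image) -> Image:
    """Pre-processing stand-in: mark dark pixels as ink (1), the rest as 0."""
    t = otsu_threshold(image)
    return [[1 if px <= t else 0 for px in row] for row in image]


def localize_lines(ink: Image) -> List[Tuple[int, int]]:
    """Localization stand-in (NOT a CNN): find row bands that contain ink.

    Returns (top, bottom) pairs with an exclusive bottom index.
    """
    bands: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(ink):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            bands.append((start, y))
            start = None
    if start is not None:
        bands.append((start, len(ink)))
    return bands


def extract_text(image: Image, recognize: Callable[[Image], str]) -> str:
    """Run the pipeline; `recognize` is a hypothetical per-region OCR hook."""
    ink = binarize(image)
    return "\n".join(recognize(ink[top:bottom]) for top, bottom in localize_lines(ink))
```

Keeping the recognizer as a plain callable means the localization step can later be swapped for a trained detector without changing how regions are handed to the OCR stage.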


Results: Experimental results demonstrate the efficacy of the proposed approach across a broad spectrum of Sanskrit materials, including ancient manuscripts and printed texts. The CNN-OCR hybrid model automates the text extraction process and significantly improves accuracy in identifying Sanskrit characters compared to conventional methods. This advancement contributes to the digital preservation and enhanced accessibility of Sanskrit literature and heritage.
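
The abstract does not state which accuracy metric was used. One common way to compare OCR output against a ground-truth transcription is the character error rate (CER), sketched below as an assumption rather than as the authors' evaluation procedure. The function names are hypothetical, and the distance is computed over Unicode code points after NFC normalization, so a Devanagari conjunct counts as several characters.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance divided by the length of the reference text."""
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

A lower CER indicates closer agreement with the reference, and 0.0 means the recognized text matches it exactly.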


Conclusions: The combination of CNN-based text identification and Sanskrit-tailored OCR offers a promising avenue for advancements in historical text digitization. By bridging modern technology with classical language studies, this study facilitates the preservation and dissemination of Sanskrit heritage, thus enriching cultural knowledge and understanding.
