Journal Article Details
University of Sindh Journal of Information and Communication Technology, Volume: 3
Issues & Challenges in Urdu OCR
Keywords: Urdu Recognition Challenges; Urdu Text; Urdu OCR; Optical Character Recognition; Character Dots
DOI  :  
Source: DOAJ
【 Abstract 】

Optical character recognition (OCR) is a technique used to convert printed and handwritten text into an editable text format. A great deal of work has been done with this technology on identifying the characters of different languages written in a variety of scripts. Among these, Latin scripts with isolated (non-cursive) characters, such as English, are easy to recognize, and significant advances have been made in their recognition, whereas Arabic and related cursive languages such as Urdu, which have more complicated and intermingled scripts, have received much less attention. This paper describes the various scripts of the Urdu language in detail and discusses the issues and challenges of Urdu OCR that arise from its cursive nature. These include cursiveness, a larger number of character dots, a large set of characters to recognize, more characters per base-shape group, the placement of dots, ambiguity between characters and ligatures that differ only slightly, context-sensitive shapes, ligatures, noise, skew, and fonts. The paper aims to give a better understanding of the problems that can arise in Urdu character recognition.

【 License 】

Unknown   
