IEEE Access
Generation of Synthetic Data for Handwritten Word Alteration Detection
Soumen Bag [1], Prabhat Dansena [1], Rajarshi Pal [2]
[1] Department of Computer Science and Engineering, Indian Institute of Technology (ISM) Dhanbad, Dhanbad, India; [2] Institute for Development and Research in Banking Technology, Hyderabad, India
Keywords: Convolution neural network; document forensics; handwritten; ink analysis; synthetic data
DOI: 10.1109/ACCESS.2021.3059342
Source: DOAJ
【 Abstract 】
Fraudsters often alter handwritten content in a document for illicit purposes, which can cause financial loss and mental distress to an individual or an organization. Ink analysis is therefore necessary to identify such alterations. A convolutional neural network (CNN) can be used to detect them, as CNNs have been highly successful in computer vision across a variety of classification tasks. However, a CNN requires a large amount of labeled training data, so a large dataset is needed for experiments on handwritten word alteration detection. Collecting, digitizing, and cropping a large number of altered and unaltered handwritten words is tedious and time-consuming. To overcome this, the paper presents an approach for generating synthetic word data for handwritten word alteration detection experiments. The scheme is designed so that the synthetically generated words closely resemble the original ones. To achieve this, a handwritten character dataset is prepared using 10 blue and 10 black pens, and these characters are used to create a synthetic word alteration dataset. The approach needs relatively few handwritten character images to create a very large word alteration dataset. Deep learning models are then trained on the synthetically generated dataset for word alteration detection.
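The abstract does not spell out the generation procedure, so the following is only a minimal sketch of the general idea: word images are assembled from per-pen character images, and an "altered" word mixes characters from two pens. Everything here is an assumption made for illustration, including the function names (`make_toy_glyphs`, `compose_word`, `make_sample`), the fixed 32x24 tile size, the flat-colour stand-ins for scanned characters, and the choice to simulate an alteration by switching the pen of a single character.

```python
import random

import numpy as np


def make_toy_glyphs(alphabet, pens, height=32, width=24):
    """Hypothetical stand-in for scanned character images.

    Each (character, pen) pair maps to a flat RGB tile whose value encodes
    the pen, so the sketch runs without any real handwriting data.
    """
    return {
        (ch, pen): np.full((height, width, 3), pen + 1, dtype=np.uint8)
        for ch in alphabet
        for pen in pens
    }


def compose_word(word, glyphs, pen_ids):
    """Concatenate one character tile per letter, left to right."""
    tiles = [glyphs[(ch, pen)] for ch, pen in zip(word, pen_ids)]
    return np.concatenate(tiles, axis=1)


def make_sample(word, glyphs, pens, altered, rng):
    """Return (image, label, pen_ids) for one synthetic word.

    Unaltered words (label 0) use a single pen throughout. Altered words
    (label 1) have one character written with a different pen; this is an
    assumed simplification, not the paper's stated procedure.
    """
    base_pen = rng.choice(pens)
    pen_ids = [base_pen] * len(word)
    if altered:
        other_pen = rng.choice([p for p in pens if p != base_pen])
        pen_ids[rng.randrange(len(word))] = other_pen
    return compose_word(word, glyphs, pen_ids), int(altered), pen_ids


if __name__ == "__main__":
    rng = random.Random(0)
    pens = list(range(20))  # e.g. 10 blue and 10 black pens, indexed 0..19
    glyphs = make_toy_glyphs("abc", pens)
    image, label, pen_ids = make_sample("cab", glyphs, pens, True, rng)
    print(image.shape, label, pen_ids)
```

Because characters and pens combine freely, a small set of character images can yield a much larger set of word images, which is the property the abstract highlights. A real pipeline would replace the flat tiles with cropped character scans and would likely need to handle spacing and baseline alignment as well.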
【 License 】
Unknown