Digital Resurrection: Restoring Fragile Documents with OCR


Year: 2024 | Volume: 14 | Issue: 02 | Pages: 29-35


By

Aniket Rawat, Shivam Kudal, Akshay Pawar, Chirag Fulfagar, Akshay Pawar, Shalaka Deore

Student, Department of Computer Engineering, MES Wadia College of Engineering, S.P. Pune University, Pune, Maharashtra, India (all authors)

Abstract

A typical Optical Character Recognition (OCR) system involves several stages: preprocessing, segmentation, feature extraction, and classification. Preprocessing, a particularly challenging aspect of Document Analysis and Recognition (DAR), converts scanned or photographed images containing machine-printed or handwritten text (numbers, letters, and symbols) into a form the system can process. Segmentation is equally crucial: it breaks a document image down into lines, words, and characters, and the accuracy of the OCR system depends heavily on the segmentation algorithm used. To handle significant degradation such as cuts, blobs, merges, and vandalism, the proposed method uses Google Cloud Vision to capture contextual relationships within the document. It also combines document restoration and super-resolution in a single process, producing high-quality results directly from degraded documents. Extensive testing on document sources such as magazines and books showed significant improvements in image quality. The approach is robust and adaptable and performs particularly well on severely degraded documents such as books, making it well suited to digital libraries and similar repositories that aim to preserve and enhance their collections.


Keywords: OCR, Google Cloud Vision, DAR, Feature Extraction, Digital Libraries
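The segmentation stage named in the abstract (lines, then words, then characters) can be illustrated with a minimal sketch. The abstract does not say which segmentation algorithm the authors used, so the projection-profile method below is an assumption chosen because it is a common baseline, not the article's method. The names `find_runs`, `segment_lines`, `segment_columns`, and the toy `PAGE` array are hypothetical and exist only for this illustration. The input is assumed to be an already binarized image in which 1 marks ink and 0 marks background.

```python
from typing import List, Tuple

Image = List[List[int]]   # binarized page: 1 = ink, 0 = background (assumed input format)
Span = Tuple[int, int]    # half-open interval [start, end)


def find_runs(profile: List[int], min_gap: int = 1) -> List[Span]:
    """Return spans of consecutive non-zero profile entries.

    Spans separated by fewer than `min_gap` empty entries are merged,
    so a larger `min_gap` groups characters into words.
    """
    runs: List[Span] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # the run ends at the first empty entry
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the end of the profile

    merged: List[Span] = []
    for begin, end in runs:
        if merged and begin - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], end)   # gap too small: extend previous span
        else:
            merged.append((begin, end))
    return merged


def segment_lines(image: Image) -> List[Span]:
    """Split a page into text lines using the horizontal projection profile."""
    row_profile = [sum(row) for row in image]
    return find_runs(row_profile)


def segment_columns(image: Image, line: Span, min_gap: int = 1) -> List[Span]:
    """Split one text line into column spans using the vertical projection profile."""
    top, bottom = line
    width = len(image[0])
    col_profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return find_runs(col_profile, min_gap)


# Toy page (hypothetical data): two text lines, the first holding two words.
PAGE: Image = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    for line in segment_lines(PAGE):
        words = segment_columns(PAGE, line, min_gap=3)       # wide gaps separate words
        characters = segment_columns(PAGE, line, min_gap=1)  # any gap separates characters
        print(f"line rows {line}: words {words}, characters {characters}")
```

A sketch like this only works when characters are separated by clean background gaps. The cuts, blobs, and merges that the abstract lists as typical degradation break that assumption, which is one reason OCR accuracy depends so heavily on the segmentation step.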

[This article belongs to Trends in Opto-electro & Optical Communication (toeoc)]

How to cite this article: Aniket Rawat, Shivam Kudal, Akshay Pawar, Chirag Fulfagar, Akshay Pawar, Shalaka Deore. Digital Resurrection: Restoring Fragile Documents with OCR. Trends in Opto-electro & Optical Communication. August 17, 2024; 14(02):29-35.


How to cite this URL: Aniket Rawat, Shivam Kudal, Akshay Pawar, Chirag Fulfagar, Akshay Pawar, Shalaka Deore. Digital Resurrection: Restoring Fragile Documents with OCR. Trends in Opto-electro & Optical Communication. August 17, 2024; 14(02):29-35. Available from: https://journals.stmjournals.com/toeoc/article=August 17, 2024/view=0




Subscription | Review Article


Volume 14
Issue 02
Received July 24, 2024
Accepted July 31, 2024
Published August 17, 2024
