Automated transcription of historical encrypted manuscripts

Eugen Antal, Pavol Marák

Abstract


This paper deals with historical encrypted manuscripts and introduces
an automated method for detection and transcription of ciphertext symbols
for the subsequent cryptanalysis. Our database contains documents used in the
past by aristocratic families living in the territory of Slovakia. They are encrypted
using a nomenclator, which is a specific type of substitution cipher. In our case,
the nomenclator uses digits as ciphertext symbols. We have proposed a method for
detection, classification and transcription of handwritten digits from the original
documents. Our method is based on Mask R-CNN which is a deep convolutional
neural network for instance segmentation. Mask R-CNN was trained on a manually
collected database of digit annotations. We employ a specific strategy where the
input image is first divided into small blocks. The image blocks are then passed
to Mask R-CNN to obtain detections. This way, we avoid problems related to the
detection of a large number of small, dense objects in a high-resolution image.
Experiments have shown promising detection performance for all digit types with
minimal false detections.
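
The block-wise strategy described above can be illustrated with a minimal sketch. It is not the authors' implementation: the block size, the optional overlap, and the names `iter_blocks`, `to_page_coords`, `detect_on_page` and `detect_block` are all assumptions made for illustration, and `detect_block` is a hypothetical stand-in for running a trained Mask R-CNN on one image block.

```python
from typing import Callable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1) in pixels
Detection = Tuple[str, Box]       # (digit label, bounding box)


def _starts(length: int, block: int, step: int) -> List[int]:
    """Block start offsets along one axis so that the whole axis is covered."""
    starts, pos = [], 0
    while True:
        starts.append(pos)
        if pos + block >= length:
            return starts
        pos += step


def iter_blocks(width: int, height: int, block: int, overlap: int = 0) -> Iterator[Box]:
    """Yield block rectangles covering a width x height page image.

    `block` and `overlap` are illustrative parameters; the abstract does not
    state the block size or whether blocks overlap.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image must have positive size")
    if block <= 0 or not 0 <= overlap < block:
        raise ValueError("need block > 0 and 0 <= overlap < block")
    step = block - overlap
    for y0 in _starts(height, block, step):
        for x0 in _starts(width, block, step):
            # Blocks on the right and bottom edges are clipped to the image.
            yield (x0, y0, min(x0 + block, width), min(y0 + block, height))


def to_page_coords(box: Box, block_rect: Box) -> Box:
    """Shift a box from block-local coordinates to full-page coordinates."""
    ox, oy = block_rect[0], block_rect[1]
    return (box[0] + ox, box[1] + oy, box[2] + ox, box[3] + oy)


def detect_on_page(
    width: int,
    height: int,
    detect_block: Callable[[Box], List[Detection]],
    block: int = 512,
    overlap: int = 0,
) -> List[Detection]:
    """Run a per-block detector and collect detections in page coordinates.

    `detect_block` is hypothetical: it receives the block rectangle and
    returns (label, box) pairs with boxes relative to that block.
    """
    results: List[Detection] = []
    for rect in iter_blocks(width, height, block, overlap):
        for label, local_box in detect_block(rect):
            results.append((label, to_page_coords(local_box, rect)))
    return results
```

With a non-zero overlap, a digit lying on a block boundary may be reported twice; this sketch does not merge such duplicates, so a deduplication step (for example, non-maximum suppression over the page-level boxes) would be needed before transcription.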



DOI: https://doi.org/10.2478/tmmp-2022-0019