Amandus Krantz

Cognitive Robotics

Cluster-Based Sample Selection for Document Image Binarization


Journal article


Amandus Krantz, Florian Westphal
2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), 2019

Cite

APA
Krantz, A., & Westphal, F. (2019). Cluster-Based Sample Selection for Document Image Binarization. 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW).


Chicago/Turabian
Krantz, Amandus, and Florian Westphal. “Cluster-Based Sample Selection for Document Image Binarization.” 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW) (2019).


MLA
Krantz, Amandus, and Florian Westphal. “Cluster-Based Sample Selection for Document Image Binarization.” 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), 2019.


BibTeX

@inproceedings{amandus2019a,
  title = {Cluster-Based Sample Selection for Document Image Binarization},
  year = {2019},
  booktitle = {2019 International Conference on Document Analysis and Recognition Workshops (ICDARW)},
  author = {Krantz, Amandus and Westphal, Florian}
}

Abstract

The best-performing approach to document image binarization is currently to train artificial neural networks on pre-labelled ground truth data. As such, it shares a central issue with other, more conventional classification problems: it requires a large amount of training data. Unlike those problems, however, document image binarization requires the binarized ground truth to be either crafted manually or estimated, which can be error-prone and time-consuming. This is where sample selection, the act of selecting training samples based on some method or metric, might help. Reducing the size of the training dataset in a way that does not impact binarization performance also reduces the time spent creating the ground truth. This paper proposes a cluster-based sample selection method that uses image similarity metrics and the relative neighbourhood graph to reduce the underlying redundancy of the dataset. The method, implemented with affinity propagation and the structural similarity index, reduces the training dataset by 49.57% on average while reducing binarization performance by only 0.55%.
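
To make the idea of similarity-driven sample selection concrete, the following is a minimal sketch and not the paper's implementation. It assumes a single-window (global) form of the structural similarity index rather than the usual sliding-window version, and it swaps in a simple greedy "leader" clustering in place of affinity propagation and the relative neighbourhood graph, so that the example stays dependency-free. The function names `global_ssim` and `select_samples`, the threshold of 0.8, and the toy 4x4 patches are all illustrative assumptions.

```python
from statistics import fmean


def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM of two equal-length, flattened grayscale patches.

    Simplification (assumption): statistics are taken over the whole patch
    instead of over local sliding windows.
    """
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )


def select_samples(patches, threshold=0.8):
    """Greedy leader clustering (stand-in for affinity propagation).

    A patch joins the first exemplar it resembles closely enough;
    otherwise it becomes a new exemplar. Only exemplars would need
    ground truth and be used for training.
    """
    exemplars = []   # indices of patches kept for training
    assignment = []  # assignment[i] = exemplar index representing patch i
    for i, patch in enumerate(patches):
        for e in exemplars:
            if global_ssim(patch, patches[e]) >= threshold:
                assignment.append(e)
                break
        else:
            exemplars.append(i)
            assignment.append(i)
    return exemplars, assignment


# Toy data: four flattened 4x4 patches, two structures at two contrasts.
vertical_edge = [0, 0, 255, 255] * 4      # dark left, bright right
horizontal_edge = [0] * 8 + [255] * 8     # dark top, bright bottom
faded_vertical = [10, 10, 245, 245] * 4
faded_horizontal = [10] * 8 + [245] * 8
patches = [vertical_edge, horizontal_edge, faded_vertical, faded_horizontal]

exemplars, assignment = select_samples(patches)
reduction = 1 - len(exemplars) / len(patches)
print(exemplars, assignment, reduction)  # [0, 1] [0, 1, 0, 1] 0.5
```

In this toy case the two faded patches are structurally redundant with the two high-contrast ones, so only one patch per structure is kept. Unlike affinity propagation, the greedy variant depends on the order of the patches and on a hand-picked threshold.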

