Are Multidimensional Recurrent Layers Really Necessary for Handwritten Text Recognition?
J. Puigcerver
Current state-of-the-art approaches to offline Handwritten Text Recognition extensively rely on Multidimensional Long Short-Term Memory networks. However, these architectures carry a considerable computational cost, and we observe that they extract features visually similar to those of convolutional layers, which are computationally cheaper. This suggests that the two-dimensional long-term dependencies, which multidimensional recurrent layers can potentially model, may not be essential for good recognition accuracy, at least in the lower layers of the architecture. In this work, an alternative model is explored that relies only on convolutional and one-dimensional recurrent layers; it achieves results equivalent to or better than those of the current state-of-the-art architecture and runs significantly faster. In addition, we observe that using random distortions during training as synthetic data augmentation dramatically improves the accuracy of our model. Thus, are multidimensional recurrent layers really necessary for Handwritten Text Recognition? Probably not.
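To make the architectural alternative concrete, the following is a minimal sketch of a convolutional plus one-dimensional recurrent HTR model trained with CTC, written in (assumed) PyTorch. The layer counts, filter sizes, image height and character-set size are illustrative placeholders, not the exact configuration evaluated in the paper.

    # Minimal sketch of a CNN + 1D-BLSTM handwritten-text-recognition model
    # trained with CTC. All sizes are illustrative placeholders.
    import torch
    import torch.nn as nn

    class ConvBLSTM_HTR(nn.Module):
        def __init__(self, num_chars, img_height=128):
            super().__init__()
            # Convolutional feature extractor (no multidimensional recurrences).
            self.conv = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.LeakyReLU(),
                nn.MaxPool2d(2, 2),
                nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.LeakyReLU(),
                nn.MaxPool2d(2, 2),
                nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.LeakyReLU(),
            )
            feat_height = img_height // 4              # two 2x2 poolings
            # 1D bidirectional LSTMs over the horizontal (time) axis.
            self.rnn = nn.LSTM(64 * feat_height, 256, num_layers=3,
                               bidirectional=True, batch_first=True)
            self.fc = nn.Linear(2 * 256, num_chars + 1)  # +1 for the CTC blank

        def forward(self, x):                          # x: (N, 1, H, W)
            f = self.conv(x)                           # (N, C, H', W')
            f = f.permute(0, 3, 1, 2).flatten(2)       # (N, W', C*H'): a 1D sequence
            y, _ = self.rnn(f)
            return self.fc(y).log_softmax(-1)          # (N, W', num_chars+1)

    # Usage sketch: logits = model(images); CTCLoss expects (T, N, C), so pass
    # logits.permute(1, 0, 2) together with target and length tensors.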
ICDAR 2017 (Oral)
A.H. Toselli, J. Puigcerver, and E. Vidal
Two methods are presented to improve word confidence scores for line-level, query-by-string, lexicon-free Keyword Spotting (KWS) in handwritten text images. The first one approximates true relevance probabilities by means of computations carried out directly on character lattices obtained from the line images considered. The second method uses the same character lattices, but obtains relevance scores by first computing frame-level character-sequence scores, which resemble the word posteriorgrams used in previous approaches for lexicon-based KWS. The first method results from a formal probabilistic derivation, which allows us to better understand and further develop the underlying ideas. The second one is less formal but, according to the experiments presented in the paper, obtains almost identical results at a much lower computational cost. Moreover, in contrast with the first method, the second one allows accurate bounding boxes for the spotted words to be obtained directly.
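In hedged, schematic form (the notation below is assumed here rather than taken from the paper), the frame-level idea can be summarized as follows: for a text-line image represented by feature frames $x = x_1, \dots, x_T$ and a keyword $v$, a frame-level posterior that $v$ is written in a segment covering frame $t$ is accumulated from the character lattice, and the line-level relevance score is its maximum over frames:

\[
S(v, x) \;=\; \max_{1 \le t \le T} P(v, t \mid x),
\qquad
P(v, t \mid x) \;=\; \sum_{i \le t \le j} P\bigl(v \text{ written in } x_i^{\,j} \mid x\bigr),
\]

where the segment posteriors are obtained by summing the normalized scores of lattice sub-paths whose character sequence spells $v$ between frames $i$ and $j$. The exact derivation and normalization are given in the paper.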
ICFHR 2016 (Poster)
I. Pratikakis, K. Zagoris, B. Gatos, J. Puigcerver, A.H. Toselli, and E. Vidal
The H-KWS 2016, organized in the context of the ICFHR 2016 conference, aims at setting up an evaluation framework for benchmarking handwritten keyword spotting (KWS), examining both the Query-by-Example (QbE) and the Query-by-String (QbS) approaches. The two KWS approaches were hosted in two different tracks, which in turn were split into two distinct challenges, namely a segmentation-based and a segmentation-free one, to accommodate different perspectives adopted by researchers in the KWS field. In addition, the competition aims to evaluate the submitted training-based methods under different amounts of training data. Four participants submitted at least one solution to one of the challenges, according to the capabilities and/or restrictions of their systems. The data used in the competition consisted of historical German and English documents with their own characteristics and complexities. This paper presents the details of the competition, including the data, the evaluation metrics and the results of the best run of each participating method.
ICFHR 2016 (Competition)
J. Puigcerver, A.H. Toselli, and E. Vidal
Lexicon-based handwritten text keyword spotting (KWS) has proven to be a faster and more accurate alternative to lexicon-free methods. Nevertheless, since lexicon-based KWS relies on a predefined vocabulary, fixed in the training phase, it does not support queries involving out-of-vocabulary (OOV) keywords. In this paper, we outline previous work aimed at solving this problem and present a new approach based on smoothing the (null) scores of OOV keywords by means of the information provided by "similar" in-vocabulary words. The good results achieved with this approach are compared with previously published alternatives on different data sets.
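The specific smoothing function is defined in the paper; as a purely illustrative sketch, assuming a simple edit-distance-style similarity and a small neighbourhood of in-vocabulary words (both assumptions of this example, not the paper's exact choices), the idea could look like this in Python:

    # Illustrative sketch (not the paper's exact smoothing function): estimate the
    # score of an out-of-vocabulary (OOV) keyword from the scores the lexicon-based
    # system assigns to "similar" in-vocabulary (IV) words, weighted by similarity.
    from difflib import SequenceMatcher

    def smoothed_oov_score(oov_kw, iv_scores, top_k=5):
        """iv_scores: dict mapping each IV word to its KWS score for a given line."""
        sims = {w: SequenceMatcher(None, oov_kw, w).ratio() for w in iv_scores}
        nearest = sorted(sims, key=sims.get, reverse=True)[:top_k]
        total_sim = sum(sims[w] for w in nearest)
        if total_sim == 0.0:
            return 0.0
        return sum(sims[w] * iv_scores[w] for w in nearest) / total_sim

    # Example: the score of the OOV query "recieve" would be estimated mainly from
    # IV words such as "receive" or "relieve", if present in the lexicon.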
Neural Computing and Applications (NCAA), 2016
J. Puigcerver, A.H. Toselli, and E. Vidal
Traditionally, the HMM-Filler approach has been widely used in the fields of speech recognition and handwritten text recognition to tackle lexicon-free, query-by-string keyword spotting (KWS). It computes a score to determine whether a given keyword is written in a certain image region. It is conjectured that this score is related to the confidence of the system with respect to that question; however, it is still not clear what this relationship is. In this paper, the HMM-Filler score is derived from a probabilistic formulation of KWS, which gives a better understanding of its behavior and limits. Additionally, the same probabilistic framework is used to present a new algorithm to compute the KWS scores, which results in better average precision (AP) for a keyword spotting task on the widely used IAM database. We show that the new algorithm improves the HMM-Filler results by up to 10.4% relative (5.3% absolute) in AP on the considered task.
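For reference, the classical HMM-Filler score is commonly written as a length-normalized log-likelihood ratio between a keyword-constrained model and an unconstrained filler model (the notation below is assumed here; the paper's contribution is to relate a score of this kind to a proper relevance probability):

\[
S(v, x) \;=\; \frac{1}{T}\Bigl(\log P(x \mid \lambda_v) \;-\; \log P(x \mid \lambda_F)\Bigr),
\]

where $x$ is a sequence of $T$ feature frames extracted from the image region, $\lambda_v$ is an HMM forced to contain the character models of the keyword $v$ surrounded by filler states, and $\lambda_F$ is the unconstrained filler model.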
ICDAR 2015 (Oral)
A.H. Toselli, J. Puigcerver, and E. Vidal
The so-called filler or garbage Hidden Markov Models (HMM-Filler) are among the most widely used models for lexicon-free, query-by-string keyword spotting in the fields of speech recognition and, lately, handwritten text recognition. This approach has important drawbacks. First, the keyword-specific HMM Viterbi decoding process needed to obtain the confidence score of each spotted word involves a large computational cost. Second, in its traditional conception, the "filler" does not take into account any context information; and when it does, besides the greater computational cost involved, building the required keyword-specific language models can become quite intricate. This paper presents novel keyword spotting results obtained with a character-lattice-based KWS approach, with context information provided by high-order N-gram models. This approach has proved to be faster than the traditional HMM-Filler approach, since the required confidence scores are computed directly from character lattices produced during a single Viterbi decoding process using N-gram models. Experiments show that, compared with the HMM-Filler approach using a 2-gram model, the character-lattice-based method requires between one and two orders of magnitude less query computing time.
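Schematically (notation assumed here, not quoted from the paper), a character-lattice-based confidence score for a keyword $v$ in a line image $x$ can be obtained as the total score mass of the lattice paths that contain $v$, relative to the mass of all paths:

\[
S(v, x) \;\approx\; \frac{\displaystyle\sum_{\pi \in \mathcal{L}(x)\,:\, v \in \pi} p(\pi \mid x)}{\displaystyle\sum_{\pi \in \mathcal{L}(x)} p(\pi \mid x)},
\]

where $\mathcal{L}(x)$ is the set of character-sequence paths encoded in the lattice and $p(\pi \mid x)$ combines the optical and N-gram scores of path $\pi$. Sums of this form can be computed efficiently with forward-backward-style dynamic programming over the lattice, which is why a single decoding pass suffices.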
ICDAR 2015 (Poster)
E. Vidal, A.H. Toselli, and J. Puigcerver
Keyword Spotting (KWS) has been traditionally considered under two distinct frameworks: Query-by-Example (QbE) and Query-by-String (QbS). In both cases, the user of the system wishes to find occurrences of a particular keyword in a collection of document images. The difference is that in QbE the keyword is given as an exemplar image, while in QbS it is given as a text string. In several works, the QbS scenario has been approached using QbE techniques, but the converse has not yet been studied in depth, despite the fact that QbS systems typically achieve higher accuracy. In the present work, we present a very effective probabilistic approach to QbE KWS based on highly accurate QbS KWS techniques. To assess the effectiveness of this approach, we tackle the segmentation-free QbE task of the ICFHR-2014 Competition on Handwritten KWS. Our approach achieves a mean average precision (mAP) as high as 0.715, which improves the best mAP achieved in that competition (0.419, under the same experimental conditions) by more than 70%.
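In hedged form (the notation is assumed here, and the paper's exact approximations may differ), the natural probabilistic bridge from QbS to QbE is to marginalize a QbS relevance score over text hypotheses for the query exemplar image $q$:

\[
P(R = 1 \mid q, x) \;=\; \sum_{w} P(R = 1 \mid w, x)\, P(w \mid q),
\]

where $x$ is a document image region, $P(R = 1 \mid w, x)$ is the QbS relevance probability of the text string $w$, and $P(w \mid q)$ is the posterior probability that the exemplar image $q$ is a rendering of $w$. In practice the sum would be restricted to the most probable transcription hypotheses of $q$.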
ICDAR 2015 (Poster)
J. Puigcerver, A.H. Toselli, and E. Vidal
The principal goal of the Competition on Keyword Spotting for Handwritten Documents was to promote the different approaches used in the field of keyword spotting and to compare them fairly using uniform data and metrics. To accommodate the different perspectives adopted by researchers in this field, the competition was divided into two distinct tracks, namely a training-free and a training-based track, and each track entailed two optional assignments. Six participants submitted solutions to one or both assignments, depending on the capabilities and/or restrictions of their systems. The data used in the competition consisted of historical documents in English with different levels of complexity. This paper presents the details of the competition, including the data, the evaluation metrics and the results of the best participating methods.
ICDAR 2015 (Competition)
J. Puigcerver, A.H. Toselli, and E. Vidal
Lexicon-based handwritten text keyword spotting (KWS) has proven to be a very fast and accurate alternative to lexicon-free methods. Nevertheless, since lexicon-based KWS methods rely on a predefined vocabulary, fixed in the training phase, they perform poorly for any query keyword that was not included in it (i.e., out-of-vocabulary, OOV, keywords), which renders the KWS system useless for that particular type of query. In this paper, we present a new way of smoothing the scores of OOV keywords and compare it with previously published alternatives on different data sets.
IbPRIA 2015 (Oral)
J. Puigcerver, A.H. Toselli, and E. Vidal
We present a handwritten text Keyword Spotting (KWS) approach based on the combination of KWS methods using word graphs (WGs) and character lattices (CLs). It aims to solve the problem that WG-based models present for out-of-vocabulary (OOV) keywords: since no information about them is available in the lexicon or the language model, they are assigned null scores. As we show, OOV keywords may have a significant impact on the global performance of KWS systems. By using a CL approach, which does not suffer from this problem, to estimate the OOV scores, we take advantage of both models: the speed and accuracy that WGs provide for in-vocabulary keywords, and the flexibility of the CL approach. This combination significantly improves both average precision and mean average precision over either method alone.
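As a minimal illustration of the combination (the scoring functions below are assumed interfaces, not the paper's code), the dispatch between the two models could look like this:

    # Illustrative sketch: use the fast word-graph (WG) score for in-vocabulary
    # keywords and fall back to the character-lattice (CL) score for OOV ones.
    def combined_score(keyword, line_id, lexicon, wg_score, cl_score):
        if keyword in lexicon:
            return wg_score(keyword, line_id)   # fast and accurate for IV keywords
        return cl_score(keyword, line_id)       # flexible, handles OOV keywords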
ICFHR 2014 (Poster)
J. Puigcerver, A.H. Toselli, and E. Vidal
Thanks to the use of lexical and syntactic information, Word Graphs (WGs) have been shown to provide competitive precision-recall performance, along with fast lookup times, in comparison with other techniques used for Keyword Spotting (KWS) in handwritten text images. However, a problem of WG approaches is that they assign a null score to any keyword that was not part of the training data, i.e., Out-of-Vocabulary (OOV) keywords, whereas other techniques are able to estimate a reasonable score even for such keywords. We present a smoothing technique which estimates the score of an OOV keyword based on the scores of similar keywords. This makes WG-based KWS as flexible as other techniques, with the benefit of much faster lookup times.
ICPR 2014 (Oral)