Multimedia Tools and Applications (2019) 78:10889–10931
https://doi.org/10.1007/s11042-018-6577-1

Writer identification using machine learning approaches: a comprehensive review

Arshia Rehman 1 & Saeeda Naz 1 & Muhammad Imran Razzak 2

Received: 1 February 2018 / Revised: 16 August 2018 / Accepted: 20 August 2018 / Published online: 17 September 2018
© Springer Science+Business Media, LLC, part of Springer Nature 2018

* Saeeda Naz
1 Government Girls Postgraduate College, no.1, Abbottabad, KPK, Pakistan
2 University of Technology Sydney, Sydney, Australia

Abstract

Handwriting is one of the most common types of questioned writing and frequently attracts attention in litigation. Unlike physiological characteristics, handwriting is a behavioral characteristic: no two individuals with mature handwriting write exactly alike, and an individual cannot exactly reproduce another person's writing. Writing behavior and individual traits are examined for similarities between specimen and questioned documents, which makes handwriting an efficient and effective biometric. In this paper, we present a comprehensive review of writer identification methods and provide a taxonomy of datasets, feature extraction methods, and classification approaches (conventional and deep learning based) for writer identification. For the reader's convenience, we group the discussion by script into English, Arabic, Western and other languages, and, from the perspective of algorithms and methods, by the sequence of implementation steps. Finally, we highlight the challenges and open research issues in the field of writer identification and suggest future directions.

Keywords Writer identification . Multi-script . Features extraction . Deep learning

1 Introduction

Handwriting is an art that develops naturally and cannot be imitated. Thus, no two persons can produce exactly the same handwriting, and even an individual cannot exactly reproduce his own handwriting. The latter is called variation, that is, the natural deviation that occurs within an individual's writing. It is a very strong identifying characteristic of a person and plays a significant role for forensic document experts in proving someone's authenticity. Recently, the automatic analysis of handwritten documents has attracted significant attention from researchers, especially in the field of historical document analysis: because of the sheer number of handwritten specimens, it takes a forensic expert a long time to manually compare a questioned document with every specimen in the database to find an imposter. An automated system for writer identification could therefore be very useful, reducing the forensic expert's effort by identifying, with high confidence, the text written by a suspected writer within a large set of documents.

History reveals that each individual has his own writing style that differs from that of others [166]. From early childhood, we learn to write from a standard "copy book", as shown in Fig. 1. One's writing diverges according to the situation, geographical location, and traditional and historical background. With the passage of time, an individual's handwriting starts to depart from the learned copy-book style. The writing style is unique to every person: it is impossible, or at least rare, for two individuals to have the same writing style, and even an individual cannot reproduce exactly what he wrote before. This variation in handwriting patterns due to individual writing style is known as inter-class variation [4]. These characteristics serve to discriminate one individual's writing from another's and explain the great research interest in writer identification.
With the emergence of information security, handwriting serves as a biometric trait and is used to authenticate, validate, verify and identify an individual on the basis of behavioral characteristics. It is the cheapest trait to acquire for identification and is of significant importance for establishing the authenticity and authorship of questioned documents, identifying forgeries, detecting alterations, verifying legal documents, signatures and cheques, and analyzing indented writing. Different handwriting biometrics are used in a wide range of application areas, as depicted in Fig. 2. The purpose of writer identification is to determine the genuine writer from a list of registered candidates according to the similarity between their handwriting. Thus, people working in the humanities can use writer identification methods to determine the writer of a specific handwritten document. From the viewpoint of graphology, handwriting is an insightful means of personality profiling, highlighting character traits and tracking the feelings and emotions of a person. Handwriting is therefore also known as brain writing, because the manipulation of the writing tool is governed by commands that the brain sends through the nervous system to the arm, hand and fingers [15]. Thus, neurological brain patterns reflect personality traits [174].

Fig. 1 Copybook style of the United States. Available at: http://www.handwriting.org/united-states.html

Fig. 2 Handwriting biometrics types

Beyond writer identification and personality analysis, handwriting plays a vital role in different walks of life. One interesting relationship is with neuroscience, since neurological disorders in patients are reflected in their handwriting [58, 181]. An example is the detection and diagnosis of Parkinson's disease from handwriting [181], where the effects of the disorder are manifested in the patient's writing during the early stages.
The writing of the patient tends to get smaller and smaller, and in the end the letters might even be unreadable, a condition known as micrographia [57].

Moving towards the world of forensic analysis, handwriting is used to establish the authenticity and authorship of questioned documents, identify forgeries, detect alterations, verify legal documents, signatures and cheques, and analyze indented writing. The traditional manual procedure of the forensic examiner is tiresome, so computerized systems for handwriting analysis could serve as valuable tools in forensic document analysis. In summary, handwriting carries remarkable evidence about the writing style itself, the writer, and demographic information such as gender, age, nationality and handedness. With the advancement of information technology, computerized analysis of handwriting has been used extensively in various applications over the last three decades [55]. The problems of writer recognition and handwriting recognition are quite similar and closely related [137, 175].

The objective of this paper is to present an overview of techniques for writer identification. Section 2 provides the detailed background of writer identification and verification, and Section 3 describes the datasets used for writer identification. Section 4 then presents the literature review, organized by the steps of the writer identification pipeline, for handwritten and online systems. Sections 3 and 4 each begin with English, proceed to Arabic, another widely used language, and then cover other languages such as French, Dutch and Chinese. The overview of the paper is represented graphically in Fig. 3.

Fig. 3 Overview of the paper

2 Background

Before elaborating on writer identification, it is essential to know about writer recognition. Writer recognition is a branch of behavioral biometrics that authenticates an individual using handwriting. It includes writer identification and writer verification. Writer identification is the process of finding the genuine writer from a list of registered candidates based on the similarity between their handwriting, as illustrated in Fig. 4.a. Writer verification is the process of comparing a test handwriting sample with pre-stored samples of known source for authentication [32]; the procedure is represented in Fig. 4.b. Writer verification is a two-class (binary) classification problem that involves an accept/reject decision, whereas writer identification is a multi-class classification problem and is hence considered more challenging [75]. The literature shows that writer identification and verification are regarded as the core pillars of studying handwriting variation for biometric purposes [174]. Alongside these, the new term writer retrieval has also emerged: it denotes the process of finding all relevant documents of a specific writer [50]. Fig. 4.c depicts the framework of a writer retrieval system.

Fig. 4 Differences among writer recognition systems: identification, verification and retrieval

On the basis of how writing samples are acquired, writer identification is generally divided into two categories: online and offline. Online writer identification is also known as the dynamic method. In the online method, handwriting samples are captured through tablets, PDAs, magnetic pads, smartphones, touch screens, etc. Writing samples are stored as trajectories, represented as time series of two-dimensional coordinates. Dynamic features are calculated and used for identification: parameters such as writing speed, writing direction, pen-tip position, velocity, angles and pressure are extracted. Online handwriting contains sequential and spatial information.
These features yield a spatiotemporal parameter-space representation of handwriting. Offline writer identification, on the other hand, is also known as the static method. Images of writing samples are obtained by scanning paper documents with a scanner. This method is based on spatial attributes such as characters, words, lines and paragraphs [101]. Because sequential information is lacking and intra-class variation is large, offline writer recognition is considered the harder task.

On the basis of the content of the writing, offline writer identification is categorized into two methods: text-independent and text-dependent. Text-independent writer identification deals with images of arbitrary text and does not depend on fixed text content. Text-dependent methods, on the other hand, require an input image with fixed text content and compare the input with registered templates for identification [189]. Text-dependent identification is also known as script based or content based identification. In general, text-dependent methods operate at the character or word level, whereas text-independent methods work at the line or paragraph level.

Writer identification is increasingly essential these days with the exponential growth of technology, and no one can deny its applications in a range of areas, including forensic analysis, historical files and ancient manuscripts. One can develop an authentication system by combining writer identification and verification, which can be used to monitor and regulate access to confidential sites or data where massive amounts of documents, forms, notes and meeting minutes are constantly being processed and managed. Such a system is valuable because it additionally carries rich information about the identity of the writer. Furthermore, writer identification can also be used for historical document analysis [68], handwriting recognition system enhancement [144], and hand-held and mobile devices [147].
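The distinction drawn in this section between identification (a one-of-N decision) and verification (an accept/reject decision) can be made concrete with a minimal sketch. It assumes that each handwriting sample has already been reduced to a numeric feature vector; the writer names, the two-dimensional vectors, the Euclidean distance and the threshold value are all illustrative assumptions, not taken from any of the reviewed systems.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(query, enrolled):
    """Identification: return the enrolled writer whose reference is nearest."""
    return min(enrolled, key=lambda writer: euclidean(query, enrolled[writer]))


def verify(query, claimed_reference, threshold):
    """Verification: accept only if the query is close enough to the claimed writer."""
    return euclidean(query, claimed_reference) <= threshold


# Hypothetical enrolled writers with made-up reference feature vectors.
enrolled = {
    "writer_a": [0.1, 0.9],
    "writer_b": [0.8, 0.2],
    "writer_c": [0.5, 0.5],
}
query = [0.75, 0.25]

best_match = identify(query, enrolled)                        # multi-class decision
accepted = verify(query, enrolled["writer_b"], threshold=0.2)  # binary decision
```

Identification has to rank every enrolled writer, so its difficulty grows with the number of candidates, whereas verification only compares against one claimed identity. Real systems would replace the nearest-reference rule with a trained classifier.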
Summarizing the applications of writer identification, we can say that its recent development and performance make it a strong tool comparable to physiological identification modalities such as DNA and fingerprints [166]. A writer identification framework has several phases, but first it must be decided whether the approach is online or offline, and text-dependent or text-independent. The phases of an offline text-independent writer identification framework are data acquisition, pre-processing, feature extraction, and classification or identification. Figure 5 depicts the main phases of any writer identification system.

Fig. 5 Writer identification framework

3 Datasets

Datasets are the cornerstone of any research work. The availability of datasets is an essential prerequisite for development and evaluation in any research domain, and the same is true of handwriting and writer recognition. In the last few years, different databases for character recognition, word spotting and writer identification have been published in the literature. We describe a number of datasets, organized by language, in the coming sections.

3.1 English datasets

English is a standard and old language spoken by 1.5 billion people, about 20% of the world's population. It is reported that English is the first language of 360 million people. A lot of significant work has been published on handwritten English text for different problems such as OCR, writer identification and recognition, handwriting analysis, and Parkinson's disease prediction. The renowned English-script datasets for writer identification are discussed in the coming sections.
3.1.1 CEDAR

The Center of Excellence for Document Analysis and Recognition (CEDAR) [42, 166] developed a number of datasets at the University of Buffalo.¹ The first and larger database used for writer identification and handwriting verification was CEDAR-Letter, which contains gray-scale and binary images of text by 1000 writers.

3.1.2 IAM

IAM [121] is an extensively used handwritten dataset for writer recognition, developed by the University of Bern.² The dataset initially included 1066 forms produced by 400 different writers and was then extended to 1539 forms produced by 657 different writers. The dataset contains detailed information about the writer identity, the ground truth text, and the segmentation at the line, sentence and word levels. It contains 13,353 labeled text lines of variable content, with approximately 14 text lines per writer [79]. 60% of the text lines were used as the reference base and 40% for testing the performance. Abundant research in writer identification and verification has been performed using the IAM dataset [11, 26, 27, 32, 35, 74, 79, 92, 145, 146, 149, 150, 160–162].

¹ http://www.buffalo.edu/
² http://www.inf.unibe.ch/

3.2 Arabic datasets

Arabic is a Central Semitic language and the sixth most spoken language in the world, with 420 million speakers. Owing to the complexity of the Arabic script, researchers have paid great attention to it in image processing, pattern recognition and document analysis tasks. Different Arabic-script datasets used for handwriting recognition and writer recognition are described in the following sections.

3.2.1 KHATT

KHATT [18, 116] was developed by research groups from KFUPM, Saudi Arabia,³ TU Braunschweig, Germany, and TU Dortmund, Germany.⁴ It consists of Arabic handwritten documents from 1000 writers.
Each writer wrote 6 paragraphs; the dataset includes approximately 2000 random and fixed paragraphs as well as free paragraphs.

3.2.2 AHDB

The Arabic Handwritten Database (AHDB) [10] is widely used in handwritten text recognition and writer recognition of Arabic script. The dataset comprises the most popular written Arabic words and text from 105 writers. It contains approximately 10,000 words for Arabic cheque processing.

3.2.3 IFN/ENIT

Another dataset used for handwriting recognition as well as writer identification is the IFN/ENIT dataset [59, 134], developed by the Institute of Communications Technology (IFN)⁵ and the Ecole Nationale d'Ingénieurs de Tunis (ENIT).⁶ The dataset consists of 937 names of towns and cities filled in by 411 writers, yielding 26,459 images and more than 210,000 characters. Each writer filled in 5 pages, and each page had 12 city names. Each name is coded with ground truth information on style, sequence of character shapes, and baseline. This dataset has been used extensively by more than 100 research groups from more than 30 countries for Arabic handwritten text recognition and in several global competitions [60, 117–120]. It has also been employed for writer identification of Arabic text in [3, 38, 79].

3.2.4 Al Isra

Al Isra [104] was collected from 500 writers. It contains 37,000 words, 10,000 digits, 2500 signatures, and 500 sentences that are among the most popular in Arabic.

³ http://www.kfupm.edu.sa/default.aspx
⁴ https://www.tu-braunschw
⁵ https://www.ifn.ing.tu-bs.de/en/ifn/
⁶ http://www.enit.rnu.tn/

3.2.5 MADCAT

Multilingual Automatic Document Classification Analysis and Translation (MADCAT) [169] is a five-year program of the US Defense Advanced Research Projects Agency (DARPA) that released the MADCAT Phase 1 Training corpus, a publicly available Arabic dataset. This dataset comprises 9693 pages written by around 400 writers.
3.3 Western languages dataset

Some datasets in Western languages such as French and Dutch are used for handwriting recognition and writer identification. A few of them are described below.

3.3.1 CEDAR-CDROM2

CEDAR has three main datasets. One of them is the CEDAR-CDROM2⁷ database, which contains machine-printed Japanese character images.

3.3.2 Firemaker dataset

The Firemaker dataset [154] is a Dutch-script dataset used for writer identification. The handwriting of 252 Dutch students was acquired on 1008 scanned pages to build the dataset. Each student wrote 4 pages: the first page contains 5 paragraphs in normal handwriting, the second page consists of two paragraphs in uppercase text, the third page contains unnatural and forged handwriting, and the fourth page contains a description of a given cartoon written in the writer's own words. In general, therefore, the first and fourth pages are used for writer identification.

3.3.3 RIMES dataset

Another, relatively different, dataset in writer recognition is RIMES (Reconnaissance et Indexation de données Manuscrites et de fac similés) [76]. It is a French-script dataset that comprises handwritten letters in French, representing the mail sent by people to companies or administrations. The data were collected from more than 1300 writers who wrote 5 letters, making a total of 5600 letters on more than 12,000 annotated pages, together with secondary databases of characters, handwritten words (300,000 snippets) and logos.

3.4 Multilingual datasets

Multilingual means more than one language. In the domain of writer identification, several datasets contain more than one language, which provides reliability, effectiveness and consistency for testing the hypothesis that documents written in different scripts have the same author. We present the statistics of these datasets for writer identification below.

⁷ http://www.cedar.buffalo.edu/Databases/JOCR/.
3.4.1 ICDAR2013

The ICDAR2013 [82] dataset, used for a competition, was collected from 475 writers, each contributing 4 handwritten documents in English and Arabic script, similar to the QUWI dataset. The first and second pages contain Arabic handwritten text, while the third and fourth contain English samples.

3.4.2 CVL

CVL [105] consists of English and German handwritten text (1 text in German and 6 in English) from 311 different writers. The 7 documents of 27 writers are used for training, and 5 documents from each of the other 284 writers are used for the test set.

3.4.3 QUWI

QUWI [7] consists of handwritten text gathered from 1017 writers in both Arabic and English scripts. Each individual was asked to write 4 pages. The dataset contains 4068 digitized pages, with approximately 60,000 Arabic words for text-independent analysis and more than 100,000 Arabic words for text-dependent analysis, and the same statistics for the English script. The first page contains approximately 6 handwritten lines in Arabic. The second page contains an Arabic text of 3 paragraphs with an average of 11 lines. Similarly, the third page contains about 6 handwritten lines in English, and the fourth page contains approximately 14 English text lines. The first and third pages are to be used for text-independent writer identification tasks, whereas the second and fourth pages are to be used for text-dependent writer identification tasks. This bilingual dataset has been used in writer recognition research in [11, 28, 54, 55].

3.5 Other datasets

There are also datasets comprising digits, ZIP codes, musical sheets, etc. that can be used to identify the writer. A few of them are described below.

3.5.1 CEDAR-CDROM1 dataset

One of the CEDAR datasets is CEDAR-CDROM1,¹¹ which comprises alphabetic characters, digits, ZIP codes, and handwritten words.
3.5.2 MNIST

MNIST [124] is a numeric dataset that contains 60,000 number documents for training and 10,000 for testing, written by 250 writers.

3.5.3 CVC-MUSCIMA

CVC-MUSCIMA (Computer Vision Center-MUsic SCore IMAges) [67] is another appealing and different dataset. It consists of music scores and serves for the identification of musicians. It encompasses 1000 music sheets written by 50 different musicians, 20 pages each. The dataset contains 1000 original images and 11,000 distorted images.

Table 1 summarizes the databases for handwriting analysis and writer identification tasks for different languages. Having introduced the datasets, we now move on to the framework of writer identification, covering pre-processing, feature extraction and classification, in the following sections.

4 State of art

Published work on handwriting analysis, writer identification and verification dates back to [106]. The subject has attracted the interest of pattern recognition researchers, as shown by the number of theses devoted to this domain [29, 40, 160]. This section first reports the significant related surveys and then classifies the works according to the steps of the writer identification pipeline, for handwritten and online systems.

In 1989, a pioneering review [136] comprehensively addressed static and dynamic techniques for signature verification, handwriting recognition and writer identification. Very few articles had been published at that time, and the survey of these problems was very concise. Plamondon et al. [137] presented a comprehensive review of handwriting recognition based on online and offline methods in 2000. In 2007, Schomaker [152] explained the nature of handwriting at length, along with texture and allographic features, with summarized results.
Another review paper was presented in 2009 by Bin-Abdl and Hashim [29]. Sreeraj and Idicula [95] presented a review of writer identification in different languages such as English, Arabic, Chinese and Persian, along with a description of the features. One limitation of that paper is that renowned features such as SIFT, HOG, SURF and CNN based features were not discussed. In 2012, Awaida [20] presented the state of the art of writer identification and verification for Arabic script. That paper comprehensively explained the databases and their research groups, feature extraction techniques and different classification approaches, but only minimum distance classifiers and statistical classifiers were discussed. Another survey, by Ahmed and Sulong [6], covered the characteristics of Arabic writing, datasets, and the local and global features used in the writer identification literature.

The above surveys summarize the state of the art in writer identification up to 2014. We partially update this picture, following the stages of the writer identification pipeline. In the coming sections, we thoroughly explain the feature extraction techniques, including deep learning based features, and the classification approaches published since 2014.

4.1 Preprocessing

Preprocessing is the data cleaning stage, in which irrelevant information is removed from the data. In this phase, binarization, normalization and noise removal are applied to handwritten samples using image processing techniques. Furthermore, segmentation is performed at the letter, word, sentence or paragraph level, depending on the research problem.

Researchers have applied different preprocessing techniques to English-script datasets for writer identification. Said et al. [141, 142] generated uniform blocks of text in preprocessing using word de-skewing, text padding, and adjustment of line and word spacing.
Table 1 Writer identification datasets

| Database | Language | Writers | Description | Availability |
|---|---|---|---|---|
| CEDAR-Letter [41, 166] | English | 1000 | Concise database of 156 words | Public |
| IAM [122] | English | 400 | 1066 documents, 9285 text lines, 82,227 words | Public |
| MIAM [33, 122] | English | 657 | Extended version of the IAM dataset. 1539 document pages, 4881 lines, 43,751 word instances | Public |
| KHATT [117] | Arabic | 1000 | 1000 handwritten document forms, 2000 paragraphs (random and fixed), 9327 lines, free paragraphs | Public |
| IFN/ENIT [59, 135] | Arabic | 411 | 2200 documents, 26,459 names with 212,211 characters, 115,585 connected parts of Arabic words | Public |
| AHDB [10] | Arabic | 105 | 10,000 words for Arabic cheque processing | Public |
| Al Isra [105] | Arabic | 500 | 500 sentences, 37,000 words, 10,000 digits, 2500 signatures | Public |
| MADCAT Phase 1 Training corpus [172] | Arabic | 400 | 9693 handwritten pages | Public |
| AD/MADBase [61] | Arabic | 700 | 60,000 digits for training, 10,000 digits for testing | On request |
| WAHD [1] | Arabic | 302 | 353 manuscripts, 43,976 pages | Public |
| Alamri et al. [13] | Arabic | 328 | 46,800 digits, 13,439 numerical strings, 21,426 letters, 11,375 words, 1640 special symbols | Public |
| CEDAR-CDROM2 Japanese Character Image Database | Japanese | Developed from books, journals, magazines etc. | 400 binary images, 180,000 symbolic characters, 3300 characters | Paid ($1500) |
| Firemaker dataset [156] | Dutch | 252 | 1008 scanned pages | Public |
| RIMES [77] | French | 1300 | 12,000 pages, 5600 real mails/letters, 300,000 snippets and logos | Public |
| ICDAR2013 [82] | Arabic, English | 475 | 1900 documents | On request |
| CVL-Database | German, English | 311 | 311 documents, 7 handwritten texts (1 in German and 6 in English), 101,069 words | Public |
| QUWI [13] | Arabic, English | 1017 | 5085 documents, 4068 digitized pages, 60,000 Arabic words and 100,000 English words for text-independent analysis | On request |
| CEDAR-CDROM1 | English | Developed from city, state words and ZIP codes | 18,468 images, 5632 city words, 4938 state words, 9454 ZIP codes, 27,837 mixed alphabetic and numeric characters segmented from address blocks, 21,179 digits segmented from ZIP codes | Proprietary ($950 CD-1) |
| MNIST [125] | Numbers | 250 | 60,000 number documents for training, 10,000 number documents for testing | Public |
| CVC-MUSCIMA [67] | Musical notes | 50 | 1000 music sheets, 1000 original images, 11,000 distorted images | Public |

KHATT KFUPM Handwritten Arabic Text; CVL Computer Vision Laboratory; AHDB Arabic Handwritten Data Base; QUWI Qatar University Writer Identification dataset; IFN/ENIT Institute of Communications Technology/Ecole Nationale d'Ingénieurs de Tunis; ICDAR International Conference on Document Analysis and Recognition; CEDAR Center of Excellence for Document Analysis and Recognition; IAM Institut für Informatik und angewandte Mathematik

Bensefia et al. [26] used a segmentation approach. They first extracted connected components from the images to remove irrelevant details, and then characterized the words. Siddiqi and Vincent [161] applied global thresholding to binarize the images. Handwriting samples were divided into sub-images by positioning horizontal and vertical windows. Schlapbach and Bunke [149] performed normalization, vertical scaling and slant correction in the pre-processing stage. Pandey and Seeja [133] used the Otsu method to convert gray-scale images to binary images and to remove undesired text from the images. They thinned the images using the Zhang-Suen thinning algorithm.
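Several of the works above rely on global thresholding, and the Otsu method in particular, for binarization. As a minimal sketch of the idea (not the implementation used in any cited paper), the threshold can be chosen as the gray level that maximizes the between-class variance of the histogram. The function names, the 8-bit gray-level range and the convention that dark pixels are ink are assumptions made for illustration.

```python
def otsu_threshold(image, levels=256):
    """Return the gray level t that maximizes between-class variance.

    `image` is a list of rows of integer gray values in [0, levels - 1].
    Pixels <= t form one class (assumed ink), pixels > t the other (paper).
    """
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0  # pixel count and gray-value sum of the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(image, threshold):
    """Map each pixel to 1 (ink, dark) or 0 (background, light)."""
    return [[1 if value <= threshold else 0 for value in row] for row in image]


# Tiny synthetic example: dark strokes (10, 12) on light paper (200, 210).
page = [[10, 12, 200, 210] for _ in range(50)]
t = otsu_threshold(page)
ink = binarize(page, t)
```

Because the method uses only the global histogram, it works best when ink and paper form two well-separated intensity modes; unevenly lit or degraded pages may need locally adaptive thresholds instead.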
The literature shows that text in many other languages, such as Farsi and Bengali, has been preprocessed with various image processing techniques. Shahabi and Rahmati [156] extracted the horizontal projection profile and then applied a low-pass Gaussian filter for smoothing. The peak values of the smoothed profile gave the spacing between text lines. The vertical projection was then computed on the binarized images, which gave the spacing between characters and words. Padding was also applied to remove blank spaces. Sheikh and Khotanlou [158] applied morphological operations such as opening and closing to create binary images in the pre-processing phase. Adak et al. [4] applied a connected component labeling algorithm to label the pages. They removed noise and non-text components such as dots and commas. Furthermore, they employed a 2D Gaussian filter based technique for line and word segmentation. A method based on the water reservoir principle was used for character-level segmentation.

Many techniques have been applied to multilingual datasets. Fiel and Sablatnig [66] performed binarization using a global threshold, text line segmentation using local projection profiles, skew correction of the text lines, and sliding windows with a step size of 20 pixels in the pre-processing phase. Christlein et al. [49] improved performance through normalization using ZCA whitening with the KL-Kernel. In [51], script contours were extracted from binarized images using connected component analysis, and 32 × 32 image patches were extracted from random script contours. Wu et al. [188] performed word segmentation using a LoG filter and line segmentation using the Hough transform in pre-processing. Khan et al. [103] improved robustness to noise and blur distortions using Discrete Cosine Transform (DCT) coefficients. Ahmed et al. [5] performed binarization, detection of connected components and removal of punctuation marks in pre-processing. Chahi et al. [43] binarized the images using the Otsu algorithm and transformed them using the Kronecker delta function.
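The projection-profile segmentation described above for Shahabi and Rahmati [156] can be illustrated with a minimal sketch. It assumes a binary image stored as a list of rows with 1 for ink, smooths the horizontal profile with a small Gaussian kernel, and reports runs of rows whose smoothed ink count exceeds a fraction of the maximum. The function names, the value of `sigma` and the 25% cut-off are illustrative assumptions, not parameters reported in the paper.

```python
import math


def horizontal_profile(binary):
    """Number of ink pixels in each row of a binary image (1 = ink)."""
    return [sum(row) for row in binary]


def gaussian_smooth(values, sigma=1.0):
    """Low-pass filter a 1D profile with a normalized Gaussian kernel."""
    radius = max(1, int(3 * sigma))
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma))
              for k in range(-radius, radius + 1)]
    norm = sum(kernel)
    kernel = [w / norm for w in kernel]
    n = len(values)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)  # clamp at the borders
            acc += w * values[idx]
        smoothed.append(acc)
    return smoothed


def segment_lines(binary, sigma=1.0, min_frac=0.25):
    """Return (first_row, last_row) pairs for each detected text line."""
    profile = gaussian_smooth(horizontal_profile(binary), sigma)
    cutoff = min_frac * max(profile) if profile else 0.0
    lines, start = [], None
    for y, value in enumerate(profile):
        if value > cutoff and start is None:
            start = y                      # entering a text line
        elif value <= cutoff and start is not None:
            lines.append((start, y - 1))   # leaving a text line
            start = None
    if start is not None:
        lines.append((start, len(profile) - 1))
    return lines


# Synthetic page: two bands of ink separated by blank rows.
width, height = 10, 20
page = [[1] * width if (2 <= y <= 5 or 12 <= y <= 16) else [0] * width
        for y in range(height)]
lines = segment_lines(page)
```

Applying the same procedure to column sums within each detected line gives the vertical profile used for word and character spacing. Skewed or touching lines would need additional handling that this sketch does not attempt.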
Xing and Qiao [189, 190] used patch scanning strategies to feed input image patches of size 113 × 113. They employed data augmentation to improve the performance of their deep writer model. Yang et al. [193] significantly improved performance with data augmentation techniques and DropStroke.

4.2 Features extraction

Feature extraction is the process of converting an input image into a vector of numerical values [6]. Features are also called attributes, variables, dimensions or descriptors. They comprise far fewer numerical values than the original image, which reduces the overhead of feeding the input feature vector to machine learning models and also reduces the training time of the model. Good features help in recognizing and identifying the target object. In the domain of writer identification, researchers have used different types of features to increase the performance of the model.

Depending on the online or offline approach, there are three categories of features: statistical features, structural features, and model based (automatic) features. The following subsections briefly review the contributions of researchers to the extraction of the different types of features.

4.2.1 Statistical features

Statistical features are statistical and geometric measurements that capture the information relevant for discriminating among different classes. They are subdivided into global features and local features.

Global features describe the global traits of the entire image. They represent texture features, contour representations, and shape descriptors over the entire image. Some examples of global features are invariant moments such as Hu and Zernike moments, shape measures such as perimeter, area and compactness, and texture descriptors such as local binary patterns (LBP) and Histogram of Oriented Gradients (HOG).
The global features work well when there is a single object in an image and there is enough contrast between foreground and background. They are also prone to errors due to occlusion and clutter. The local features describe the key points in patches of the image. They compute robust and salient features from multiple interest points in a neighborhood. These features represent the salient shapes, texture, and key points in an image patch. Some examples are Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), Binary Robust Invariant Scalable Keypoints (BRISK), MSER, LBP, and FREAK. Combining local and global features increases the accuracy, but it also increases the computational time of the system. We now present the existing work on local and global statistical features employed for writer identification.

Global features that operate at paragraph level in offline handwritten samples are codebook generation [27, 153, 155], Gabor, directional features [33, 38, 39], GSCM [141, 142, 173], GGD, and contourlet GGD [85, 87]. In writer identification a text line database is sometimes used. In this case the text line is the input unit from which features are extracted for text-independent analysis. Some of the line features were grapheme-based features [126], regions enclosed by connected components, lower and upper profiles, and fractal features [93, 122]. Several features were extracted at word level from handwriting samples using the offline approach. Word-level features employed for writer identification were edge-based directional features [22, 39], morphological features [203], GSC, WMR, SC, and SCON [179, 197]. Some writer identification systems operate at character level. Such systems use individual characters to distinguish one writer from another. They either use a character dataset or segment the dataset into individual characters. In this scheme, character features were used.
Numerous character features for writer identification were presented in the literature. A few of them are height, area, and slant [110], HMM features [149], directional features [183], fuzzy directional features with fuzzy learning vector quantization (FLVQ) [23], and GSC features [167, 195].

Several types of statistical features were extracted from English datasets. One of the most widely used is based on Gabor filtering. A 2D Gabor filter is defined as the product of a plane wave and a Gaussian function:

$$ g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \sin\left(2\pi \frac{x'}{\lambda} + \psi\right) \tag{1} $$

where $\lambda$ represents the wavelength of the sinusoidal factor, $\theta$ represents the orientation of the normal to the parallel stripes of the Gabor function, $\psi$ is the phase offset, $\sigma$ is the standard deviation of the Gaussian envelope, and $\gamma$ is the spatial aspect ratio. The rotated coordinates are

$$ x' = x\cos\theta + y\sin\theta \tag{2} $$

$$ y' = -x\sin\theta + y\cos\theta \tag{3} $$

A two-dimensional (2D) Gabor filter can also be represented in the spatial domain by the following equation:

$$ h(x, y) = g(x, y)\, e^{-2\pi j (u_0 x + v_0 y)} \tag{4} $$

where $g(x, y)$ is the Gaussian function given by:

$$ g(x, y) = e^{-\frac{1}{2}\,(x^2 + y^2)/\sigma^2} \tag{5} $$

Said et al. [141, 142] performed multichannel Gabor filtering and computed gray scale co-occurrence matrices on 25 samples per writer. They concluded that Gabor filtering gave more promising performance than the gray scale co-occurrence matrix. The same methodology was employed on machine-printed documents for script [177] and font identification [201]. Siddiqi and Vincent [161] applied Gabor filters on the IAM dataset using 100 writing samples. Shahabi and Rahmati [156] developed a text-independent method for Farsi script. Gabor-energy and moment approaches were used for feature extraction from 40 samples. A total of 48 features were extracted using Gabor filtering and transformation. Four texture image blocks were used for training and two for testing the system. Ubul et al.
[180] proposed a feature selection technique for writer identification in Uyghur. They also combined this technique with Gabor features. Extended Gabor features are obtained by modulating a 2D circular sinusoid with a 2D Gaussian:

$$ xg(x, y; \theta, r_x, r_y) = \exp\left(-\frac{r_x x^2 + r_y y^2}{\sigma^2}\right) \sin\left(\frac{x^2 + y^2}{r_x + r_y}\,\theta\right) \tag{6} $$

Extended Gabor features were employed by Helli and Moghaddam [90–92], who applied the extended Gabor model to feature extraction from Persian script. They represent the strength of an image as the sum of all its pixel values and extract 2D extended Gabor features. Gabor wavelet features were calculated by He et al. in [88, 89].

One of the popular statistical features is the Contour Direction and Hinge Features proposed by Bulacu and Schomaker [37]. A contour is extracted by the following formula:

$$ \mathrm{Contour}_i = \{\, p_j \mid j \le M_i,\ p_i = p_{M_i} \,\} \tag{7} $$

Along the contour of the writing stroke, a histogram of the angle $\phi$ is generated and normalized into a probability distribution $p_f(\phi)$. The angle $\phi$ is measured from the horizontal direction as

$$ \phi = \tan^{-1}\left(\frac{y_{k+\epsilon} - y_k}{x_{k+\epsilon} - x_k}\right) \tag{8} $$

where $\epsilon$ represents the thickness of the stroke.

Siddiqi and Vincent [162] identified the writer of a handwritten document by extracting contours and writer-specific features using a local approach and codebook generation. They used Gabor filtering for feature extraction. Furthermore, in [164] they extracted histograms of the chain code and of the first- and second-order differential chain codes. Al-Maadeed et al. [12] extracted edge-based directional probability distribution features such as height, area, length, and three edge-direction distributions with different sizes. Adak et al. [4] extracted handcrafted features such as micro-macro features, contour direction and hinge features, and direction and curvature features.
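The contour-direction distribution built from the angle in Eq. (8) can be sketched in a few lines of Python. This is an illustrative sketch only: the contour is assumed to be an ordered list of (x, y) points, `eps` is treated as a step along that list, and angles are folded into [0, π) so that a stroke traced in either direction falls into the same bin. None of these choices is taken from [37].

```python
import math


def direction_histogram(contour, eps=1, bins=12):
    """Estimate p(phi) from an ordered list of (x, y) contour points."""
    counts = [0] * bins
    for k in range(len(contour) - eps):
        x0, y0 = contour[k]
        x1, y1 = contour[k + eps]
        # Angle to the horizontal, folded into [0, pi).
        phi = math.atan2(y1 - y0, x1 - x0) % math.pi
        index = min(int(phi / math.pi * bins), bins - 1)
        counts[index] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * bins
    # Normalize the histogram so that it sums to one.
    return [c / total for c in counts]
```

For example, a purely horizontal contour puts all of its probability mass into the first bin, while a vertical one concentrates it in the bin containing π/2.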
Schlapbach and Bunke [149] developed an HMM recognizer by extracting three global features (the number of black pixels in the window, the second-order moment, and the center of gravity) and six local features (the positions and orientations of the upper and lower pixels, the fraction of black pixels, and the transitions in the window), for a total of nine parameters. The resulting nine-dimensional feature vector was used for training with the Baum-Welch algorithm. Hassaine et al. [82] characterized English writing using a set of geometrical features such as directions, curvatures, tortuosity, chain code, and edge-based directional features to identify the writer.

One form of statistical features in the frequency domain is gradient features. Let $f(x, y)$ be the grayscale level at point $(x, y)$. The horizontal grayscale gradient $dx$ and the gradient direction are derived as:

$$ dx = f(x+1, y+1) + 2f(x+1, y) + f(x+1, y-1) - f(x-1, y-1) - 2f(x-1, y) - f(x-1, y+1) \tag{9} $$

$$ \text{direction} = \tan^{-1}\left(\frac{dy}{dx}\right) \tag{10} $$

where $dy$ is the corresponding vertical grayscale gradient.

Another feature approach for writer identification is gradient features in the frequency domain by Ram and Moghaddam [125]. They used direction information and interval ranges for Persian script in order to extract the most discriminative intervals as features. Chanda et al. [44] conducted experiments on Bengali script, quantizing the gradient into sixteen directions to extract gradient-based direction features. They also quantized four directions and computed chain-code-based direction features. The same methodology was employed by Awaida and Mahmoud [21] for feature extraction. Kumar and Kaur [109] computed directional features and then applied PCA for dimension reduction. They used the slant and skew of handwritten samples, pixel distributions such as horizontal profiles, curvature, and entropy, all calculated using image processing techniques. These features were selected using Fisher's Linear Discriminant Analysis. Srihari et al. [166] extracted micro and macro features with a large number of parameters.
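The gradient of Eq. (9) and the direction of Eq. (10) can be illustrated with the short Python sketch below. The vertical gradient `dy` is not spelled out in the text, so the code assumes the standard Sobel counterpart of Eq. (9); the quantization into sixteen direction codes mirrors the idea attributed to Chanda et al. [44], but the rounding rule and the function names are choices of this example.

```python
import math


def sobel_gradient(img, x, y):
    """Return (dx, dy) at an interior pixel; the image is indexed img[y][x]."""
    def f(i, j):
        return img[j][i]

    # Horizontal gradient, term by term as in Eq. (9).
    dx = (f(x + 1, y + 1) + 2 * f(x + 1, y) + f(x + 1, y - 1)
          - f(x - 1, y - 1) - 2 * f(x - 1, y) - f(x - 1, y + 1))
    # Vertical gradient: assumed Sobel counterpart, not given in the text.
    dy = (f(x - 1, y + 1) + 2 * f(x, y + 1) + f(x + 1, y + 1)
          - f(x - 1, y - 1) - 2 * f(x, y - 1) - f(x + 1, y - 1))
    return dx, dy


def direction_code(dx, dy, levels=16):
    """Quantize the gradient direction to the nearest of `levels` directions."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    step = 2 * math.pi / levels
    return int(round(angle / step)) % levels
```

A histogram of these direction codes over all stroke pixels then gives a simple gradient-based direction feature vector.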
The micro features extracted by Srihari et al. were Gradient, Structural and Concavity (GSC) features, while the macro features comprised height, slant, gray-level entropy and threshold, number of text pixels, number of slope components, paragraph aspect ratio and indentation, word length, and zone ratio. They reported that micro features increased the identification rate to 80%. Arazi [16, 17] computed gray scale histograms to extract specific features such as the first indented letter of a handwritten sample and external features such as margins. Zimmerman and Varady [202] calculated run-length coding to extract features. Another approach for extracting features from Chinese script is the edge-hinge features presented by Wen et al. [185]. They employed generalized GMF of edge structure coding (ESC) for the distribution of edge fragments on multiple scales.

Scale Invariant Feature Transform (SIFT) [114] is used to detect and describe local features in images. It works in four major steps. The first step is the detection of scale-space extrema. The scale space of an image is defined as the function

$$ L(x, y, \sigma) = G(x, y, \sigma) \ast I(x, y) \tag{11} $$

where $\ast$ denotes convolution and

$$ G(x, y, \sigma) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2 + y^2)/2\sigma^2} \tag{12} $$

The second step is key point localization, which interpolates using the quadratic Taylor expansion of the Difference-of-Gaussian scale-space function, given by

$$ D(\mathbf{x}) = D + \frac{\partial D^{T}}{\partial \mathbf{x}}\,\mathbf{x} + \frac{1}{2}\,\mathbf{x}^{T}\,\frac{\partial^2 D}{\partial \mathbf{x}^2}\,\mathbf{x}, \qquad \mathbf{x} = (x, y, \sigma)^{T} \tag{13} $$

The third step is orientation assignment, in which the gradient magnitude and orientation are computed as

$$ m(x, y) = \sqrt{\left(L(x+1, y) - L(x-1, y)\right)^2 + \left(L(x, y+1) - L(x, y-1)\right)^2} \tag{14} $$

$$ \theta(x, y) = \operatorname{atan2}\left(L(x, y+1) - L(x, y-1),\ L(x+1, y) - L(x-1, y)\right) \tag{15} $$

Finally, a feature vector is created for each key point.

Woodard et al. [186] extracted local features using quantized SIFT for writer recognition. Tang et al. [175] combined SIFT features and triangular features to develop a writer identification system with improved structural features. Hu et al.
[94] described the writing style by encoding SIFT with two coding strategies: locality-constrained linear coding and Fisher kernel coding. Fecker et al. [63] conducted experiments on historical Arabic writer identification by computing SIFT features.

Statistical features were also extracted from multilingual datasets. Wu et al. [188] extracted the SIFT descriptor, scale, and orientation features and generated a codebook using hierarchical Kohonen SOM clustering. Xiong et al. [190] conducted experiments on ICFHR2012-Latin and ICDAR2013 by extracting SIFT descriptors and contour directional features. They applied K-means clustering for codebook generation and extracted occurrence histograms of the SIFT descriptors. Fiel and Sablatnig [64] extracted local SIFT features from English and Dutch handwriting. In another work [65], they retrieved and identified writers using SIFT and bag-of-words features: they generated a codebook and calculated the occurrence histogram of the clusters. A major contribution by Christlein et al. [47] in 2014 was the encoding of features with GMM supervectors. They encoded RootSIFT features into supervectors. A Universal Background Model (UBM) was created by estimating a GMM from a set of SIFT descriptors. They normalized by taking the element-wise square root and then applying l2 normalization. The same technique of GMM supervector encoding was employed in [49]. In [50], they used VLAD encoding of SIFT descriptors to generate global descriptors. In another attempt [48], the same authors increased the recognition rate by extracting RootSIFT features from the boundary edges of the handwriting.
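The normalization mentioned above (element-wise square root followed by l2 normalization) can be sketched as follows. The signed square root used here is an assumption of this example, chosen so that negative vector entries keep their sign; the function name is hypothetical and the code is not taken from [47].

```python
import math


def sqrt_l2_normalize(vec):
    """Element-wise signed square root followed by l2 normalization."""
    # Signed square root: keeps the sign, compresses large magnitudes.
    rooted = [math.copysign(math.sqrt(abs(v)), v) for v in vec]
    norm = math.sqrt(sum(v * v for v in rooted))
    if norm == 0.0:
        return rooted  # all-zero input stays all-zero
    return [v / norm for v in rooted]
```

The square root dampens a few dominant components, and the final l2 step makes vectors comparable regardless of how many descriptors contributed to them.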
Christlein et al. created the feature vector from the GMM parameters $\lambda = \{\omega_k, \mu_k, \Sigma_k \mid k = 1, \ldots, K\}$, with

$$ p(x \mid \lambda) = \sum_{k=1}^{K} \omega_k\, g_k(x) \tag{16} $$

where $g_k$ is the Gaussian function given by:

$$ g_k(x) = g(x; \mu_k, \Sigma_k) = \frac{1}{\sqrt{(2\pi)^D\, |\Sigma_k|}}\; e^{-\frac{1}{2}(x - \mu_k)^{T} \Sigma_k^{-1} (x - \mu_k)} \tag{17} $$

One of the features inspired by SIFT is Oriented Basic Image Feature Columns (oBIF Columns), which has been used extensively in character recognition [131] and texture recognition [178]. Newell and Griffin [132] computed oBIF Columns and improved a texture-based scheme by encoding a writer's style. Another type of local feature for statistical analysis is SURF [24], which was used for writer identification by Sharma and Dhaka [157].

Garg et al. [69] worked on the Gurmukhi script of Punjabi. They extracted different statistical features such as transitions, zoning, horizontal and vertical peak values, diagonals, centroid, end points and intersections, and parabola and power curve fitting. They used the technique of Kumar et al. [108] to extract the features. Khan et al. [103] extracted features using the Discrete Cosine Transform (DCT) and a sliding window. The DCT generated thousands of features, which were randomly selected using three clustering techniques. They applied k-means, one-dimensional SOM (Self-Organizing Map), and two-dimensional SOM clustering to generate a codebook for each sample. They generated the final feature vector by producing a normalized histogram of co-occurrences. They applied bagging for re-sampling, because randomly selected features caused a misclassification problem.
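Several of the systems above describe a document by a codebook and a normalized occurrence histogram. The Python sketch below illustrates only the histogram step, assuming the codebook (for example k-means or SOM centres) has already been computed; the function names and the squared Euclidean assignment rule are assumptions of this example rather than details of any cited system.

```python
def nearest_codeword(vec, codebook):
    """Index of the codeword closest to vec in squared Euclidean distance."""
    def sq_dist(k):
        return sum((a - b) ** 2 for a, b in zip(vec, codebook[k]))

    return min(range(len(codebook)), key=sq_dist)


def occurrence_histogram(descriptors, codebook):
    """Normalized histogram of codeword occurrences for one document."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = len(descriptors)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]
```

Two documents can then be compared through their histograms with any of the distance measures discussed in Section 4.3.1.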
The DCT transforms a block of pixels $b$ of size $N_1 \times N_2$ into a matrix of real numbers as:

$$ B(u, v) = \frac{2}{N_1 N_2}\, C(u)\, C(v) \sum_{i=0}^{N_1 - 1} \sum_{j=0}^{N_2 - 1} \cos\left(\frac{u\pi}{N_1}(i + 0.5)\right) \cos\left(\frac{v\pi}{N_2}(j + 0.5)\right) b(i, j) \tag{18} $$

Venugopal and Sundaram [182] worked in the online domain, extracting a codebook with the well-known Vector of Local Aggregate Descriptor (VLAD) encoding scheme. Some features are specific to the domain of online writer identification, such as pen pressure, azimuth, velocity [77], velocity barycenter [177], altitude, direction of writing, correlation between lengths [45], continuous dynamic programming [62, 100], and pen input state (pen down or pen up; values are also recorded while the pen is up) [127]. The aforesaid features relate to the writing in the document. However, online writing is also associated with paragraphs. Paragraph features used by researchers were codebook generation [98, 128], stroke-based features, and point-based features [151]. Zhang et al. [198] used random hybrid stroke features such as the movement of the pen tip in the x and y directions and the pen-up or pen-down status for the online domain. Gargouri et al. [70] extracted dynamic features such as strokes, spaces between strokes and words, and points from the Arabic script ADAB database. Table 2 summarizes the statistical local and global features.

Table 2 Review of writer identification systems using statistical local and global features

Reference | Language | Features
--- | --- | ---
Garg et al. [69] | Punjabi (Gurmukhi) | Transition, zoning, horizontal peak and vertical peak value, diagonals, centroid, end points and intersection, curve fitting of parabola and power based
Khan et al. [103] | English, German, Arabic | Discrete Cosine Transform (DCT) based features
Christlein et al. [47, 50] | English, Arabic, German, Greek | RootSIFT encoded by UBM and GMM super vectors
Adak et al. [4] | Bengali | Contour Direction and Hinge Features, Direction and Curvature Features at Keypoints
Kumar and Kaur [108] | English | Directional features, slant, skew, pixel distribution like horizontal profiling, curvature, and entropy
Venugopal and Sundaram [182] | English | Vector of Local Aggregate descriptor (VLAD)
Zhang et al. [198] | English, Chinese | Movement of pen tip in x and y directions, status of pen in up and down position
Sharma and Dhaka [157] | English, German, French, Greek | SURF
Xiong et al. [190] | English, Greek | SIFT descriptor and contour directional feature
Kumar et al. [107] | Punjabi (Gurmukhi) | Transition, zoning, horizontal peak and vertical peak value, diagonals, centroid, end points and intersection, curve fitting of parabola and power based
Wu et al. [188] | English, French, German, Chinese, Greek | SIFT descriptors, scales and orientations
Tang et al. [178] | Chinese | SIFT features and triangular
Newell and Griffin [132] | English | oBIF Column
Hu et al. [94] | Chinese | SIFT, Bag of features
Fecker et al. [63] | Arabic | SIFT
Awaida and Mahmoud [22] | Arabic | Gradient based direction features, Chain code based direction features
Fiel and Sablatnig [65] | English, Greek, German, French | SIFT and bag of word features
Fiel and Sablatnig [64] | English, Dutch | SIFT
Wen et al. [188] | Chinese | Edge-Hinge features
Hassaine et al. [83] | English | Directions, curvatures, tortuosity, chain code and edge based directional features
Chanda et al. [44] | Bengali | Gradient based direction features, Chain code based direction features
He et al. [90] | Chinese | Gabor wavelet features
Woodard et al. [186] | Arabic | SIFT
Ram and Moghaddam [126] | Persian | Gradient features
Shahabi and Rahmati [156] | Farsi | Gabor-energy and moments features
Siddiqi and Vincent [165] | English | Histograms of the chain code, first and second order differential chain code
Helli and Moghaddam [90–92] | Persian | Extended Gabor features
Zhang et al. [198] | Chinese | 2D Gabor features with dimension of mesh fractal
Al-Maadeed et al. [12] | Arabic | Height, area, length, and three edge-direction distributions with different sizes
Siddiqi and Vincent [164] | English | Gabor features
He et al. [86, 89] | Chinese | Wavelet features with GGDM
Ubul et al. [180] | Uyghur | Gabor features
Al-Maadeed et al. [11] | Arabic | Fourier spectral features
Siddiqi and Vincent [163] | English | Gabor features
Bulacu and Schomaker [37] | English | Contour Direction and Hinge Features
Schlapbach and Bunke [151] | English | Black pixels in window, the second order moment and the center of gravity, upper and lower pixels positions and orientations, black pixels fraction and the transitions in window
Nejad and Rahmati [130] | Farsi | Moment based Gabor energy features
He et al. [87] | Chinese | 2D Gabor features, auto correlation function, 2D Gabor features & auto correlation function
Schomaker and Bulacu [154] | Dutch | Edge Hinge, Edge Direction features
Bulacu et al. [39] | English | Edge Direction, Edge Hinge, Auto correlation, entropy, Run-Length features
Shen et al. [159] | English | Gabor wavelet features
Zhu et al. [202] | Chinese | 2D Gabor features
Said et al. [141, 142] | English | Gabor filtering, Gray scale co-occurrence

DCT Discrete Cosine Transform; SIFT Scale Invariant Feature Transform; GMM Gaussian Mixture Model; UBM Universal Background Model; VLAD Vector of Local Aggregate Descriptor; oBIF oriented Basic Image Features; SURF Speeded-Up Robust Features; GGDM Generated Gaussian Density Model; GSC Gray Scale Cooccurrence

4.2.2 Structural features

The structural features represent the local structure and topology of characters or writing, such as edges, loops, dots and diacritics, vertical and horizontal lines, start and end points, direction of writing, thickness or thinness of strokes, and corners. Many structural features [110, 135, 171], such as graphemes, fragments, and strokes, have been extracted from handwritten samples.

A grapheme is a small segment of handwriting. At the grapheme codebook creation level, graphemes are extracted from samples of handwritten text. This step is performed by segmenting the handwritten text into lines and then segmenting the lines into small handwriting segments. Every segment may contain zero, one, or more than one grapheme. Every handwritten document $D_j$ is hence described by the set of graphemes $x_i$ given by the following relation:

$$ D_j = \{\, x_i,\ i \le \mathrm{Card}(D) \,\} \tag{19} $$

Bensefia et al. [26] segmented handwriting samples and clustered local features such as graphemes. A feature space was created to store query and database samples. They used a vector space model for the local features. In another effort [25], the same authors fragmented handwriting to find writer invariants and extracted graphemes. Pandey and Seeja [133] segmented handwritten documents into grapheme features that were represented using horizontal projection profiles. They generated the codebook using k-means clustering, producing a feature vector of size 1 × k. Abdi and Khemakhem [2] employed the beta-elliptic model for grapheme extraction. Miller et al. [123] created graphemes from segmented handwriting samples based on a topological and geometric class framework and then skeletonized them. Durou et al. [56] combined OBIs and grapheme features to produce a feature vector that was reduced by PCA and mapped using k eigenvectors. Kumar et al. [108] extracted graphemes represented by Fourier and wavelet descriptors. Each grapheme was encoded by sparse coding using vector quantization. Garz et al.
[71] computed the relationship between strokes, junctions, endings, and loops. They extracted scale-independent descriptors such as local angle, orientation, and orientation-local-angle distributions, as well as multiscale descriptors.

Local Binary Patterns (LBP) features are a unifying approach to the traditionally divergent statistical and structural models. The LBP maps each pixel to an integer code representing the relationship between the center pixel and its neighbors. It encapsulates the neighborhood geometry at every pixel by encoding binarized differences with the neighboring pixels as:

$$ LBP = \sum_{n=0}^{N-1} s(p_n, p_c) \cdot 2^n \tag{20} $$

where $p_c$ is the pixel being encoded, $p_n$ are $N$ symmetrically and uniformly sampled points on the circumference of a circular region around $p_c$, and $s(p_n, p_c)$ is a binarization function. A widely used binarization function $s(p_n, p_c)$ is defined as:

$$ s(p_n, p_c) = \begin{cases} 1, & p_n \ge p_c \\ 0, & \text{otherwise} \end{cases} \tag{21} $$

Bertolini et al. [27] computed Local Binary Pattern (LBP) and Local Phase Quantization (LPQ) descriptors. Hannad and Siddiqi [78] employed LBP as a texture descriptor, exploiting the highly discriminative features of handwritten fragments to enhance the overall performance of Arabic writer identification. In another contribution [80], handwritten text was divided into small fragments that were treated as textures, and the effectiveness of LBP, LPQ, and local ternary patterns (LTP) was evaluated. In a multi-script environment, Chahi et al. [43] calculated Block Wise Local Binary Count (BW-LBC) features by computing the histogram from connected components and then finding the co-occurrence distribution function.

Additional structural features were fragments, which are small parts or strokes rather than complex shape representations of writing. Ghiasi and Safabakhsh [74] used small connected components to extract fragments. Another structural feature is allographs, which were calculated by Bulacu and Schomaker in [34, 36, 153].
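The LBP code of Eqs. (20) and (21) can be made concrete with a small Python sketch. It assumes the simplest setting, N = 8 neighbours taken from the 3 × 3 window around each interior pixel, rather than interpolated points on a circle; the neighbour ordering and the function names are choices of this example.

```python
# Offsets (row, column) of the 8 neighbours, clockwise from the top-left.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """LBP code of the interior pixel img[r][c], following Eqs. (20)-(21)."""
    center = img[r][c]
    code = 0
    for n, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        # s(p_n, p_c) = 1 when the neighbour is at least as bright as the center.
        if img[r + dr][c + dc] >= center:
            code += 2 ** n
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

The histogram, usually normalized, is what serves as the texture descriptor of a handwriting block.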
Jain and Doermann [97] represented handwriting as segments using a k-adjacent technique. They also calculated contour gradient descriptors (CGD). Another method, which detects the junctions of stroke fragments, was presented by He et al. [84]. They applied a probability distribution and calculated junctions that served as features. In another attempt [83], the same authors extracted run-length features of local binary patterns and Cloud of Line Distribution (COLD) features. Siddiqi and Vincent [163, 165] divided words into small sub-images, each of which contains part of a stroke or fragment. They used these sub-images to represent the redundant patterns that are unique to a specific writer. Ahmed et al. [5] deployed optimum features. They detected contours from the connected components and extracted fragment codes from them. They also calculated ending strokes from the handwriting. Another way to identify the writer was to combine textural and allographic features, as presented by Bulacu and Schomaker [38] for Arabic script. A probability distribution function was generated from the extracted texture features, while 400 allographs were clustered to generate a codebook. Another approach by the same authors is a simpler version of this method in [39], where the best performance was obtained with edge-hinge features.

We summarize and compare the structural features deployed by different researchers in the literature in Table 3.

Table 3 Review of writer identification systems using structural features

Reference | Language | Features
--- | --- | ---
Pandey and Seeja [134] | English | Graphemes
Chahi et al. [43] | English, Arabic, German | Block Wise Local Binary Count (BW-LBC) features
Durou et al. [56] | English, Arabic | OBIs and Graphemes
He et al. [84] | English, Chinese, Dutch | LBP and COLD features
Miller et al. [124] | English | Graphemes
Ahmed et al. [5] | English, Arabic, Kurdish, German, Greek and French | Fragments
Bertolini et al. [27] | English, Arabic | LBP and LPQ descriptors
Hannad and Siddiqi [81] | Arabic | Fragments, LBP, LPQ and LTP
Garz et al. [71] | English | p(Is, Iθ), p(IBOS)
Hannad and Siddiqi [79] | Arabic | LBP
Abdi and Khemakhem [2] | Arabic | Graphemes
He et al. [85] | English, Chinese | Junctions of stroke fragments
Kumar et al. [109] | English | Graphemes
Ghiasi and Safabakhsh [75] | English | Contour fragment features
Tang et al. [179] | English | Contour pattern features, Stroke fragment features
Jain and Doermann [98] | English, Arabic | Segments, Contour Gradient Descriptors (CGD)
Siddiqi and Vincent [167, 165] | English | Strokes, Fragments
Bulacu and Schomaker [34, 36] | English | Allographs
Schlapbach et al. [150] | English | Stroke based text line features
Bensefia et al. [25, 27] | English | Graphemes

BW-LBC Block Wise Local Binary Count; OBIs Oriented Basic Images; COLD Cloud Of Line Distribution; LPQ Local Phase Quantization; LTP Local Ternary Patterns; CGD Contour Gradient Descriptors; LBP Local Binary Pattern

4.2.3 Model based automatic features

The model-based or automatic features are extracted automatically by specific models directly from the raw image data. Deep learning is based on learning data representations and has the ability to learn from data without being explicitly programmed, using statistical approaches and algorithms. This type of feature extraction needs an enormous number of image samples to train the model. Some examples are Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Extreme Learning Machine (ELM), and other deep learning model based features. Compared with hand-designed and structural features, automatic features learned by deep models usually show higher performance because more data-adaptive information can be exploited in the learned features. Automatic features are therefore effective in providing a better recognition rate.

Fig.
6 CNN based feature extraction

A Convolutional Neural Network consists of multiple layers such as input, convolution, ReLU, pooling, fully connected, and softmax layers. There are two ways to extract automatic features from a CNN, as depicted in Fig. 6. The first is to take the features from the convolutional layers. ConvNet features are more generic in the early layers and more specific to the original dataset in the later layers. The second approach is to cut off the last layer of the CNN, which essentially performs the labeling of the input data. The outputs of the neurons of the second-to-last fully connected layer are then used as the feature vector. This vector is used to measure the distance between two document images and thus to describe the similarity of the handwriting. CNNs have been employed in the field of text recognition by Wang et al. [184]. However, to the best of our knowledge, CNNs have rarely been used for writer identification so far. A reason might be that the training and test sets of most datasets are typically disjoint, making it impossible to train a CNN for classification. We here present the related work on writer identification using deep learning approaches.

Deep learning techniques were first introduced for writer recognition by Fiel and Sablatnig [66]. They employed an eight-layer CNN and extracted the features from the fully connected (penultimate) layer. These CNN-based activation features served as the feature vector. Xing and Qiao [189] introduced a CNN-based approach called multi-stream DeepWriter to learn features. They learned the automatic features from the last fully connected layer, FC7. Christlein et al. [49] calculated local descriptors using activation features from a CNN. Another effort by the same authors [50] was based on unsupervised learning: they computed CNN activation features from image patches of 32 × 32. In [51], CNN activation features were extracted from LeNet and ResNet CNN models.
They used CNN activation features as local features by encoding them with VLAD. Nasuno and Arai [129] employed an AlexNet CNN to extract activation features, trained on 90 Japanese words. Yang et al. [193] calculated automated features from a CNN, named path-signature features. A comparative summary is given in Table 4.

Table 4 Review of writer identification systems using model based or automatic features

Reference | Language | Features
--- | --- | ---
Christlein et al. [51] | English, Greek, German, Arabic | LeNet, ResNet
Xing and Qiao [189] | English | CNN based features
Christlein et al. [50] | German, Latin, and French | Deep Residual Network (ResNet) based learned features
Nasuno and Arai [129] | Japanese | AlexNet CNN based features
Christlein et al. [49] | English, Arabic, German, Greek | CNN activation features
Yang et al. [193] | Chinese | Path signature features learned from CNN model deployed in online domain called DeepWriterID
Fiel and Sablatnig [66] | English, Greek | CNN based features

ResNet Residual Network; CNN Convolutional Neural Network

4.3 Classification

After feature extraction, classification is performed to assign the target classes in pattern recognition problems. Different approaches have been used to identify, compare, and classify writers. The objective is to match the features of a query document image against the pre-stored features of a knowledge base, for the sake of authentication and identification of the writer, using a large number of instances of each writer's handwriting images in a training set. The approaches used in the writer identification literature are nearest neighbor, Hidden Markov Models (HMM), cosine similarity, Gaussian mixture models (GMM), Fourier transformation approaches, Euclidean distances, Bayesian classifiers, and neural network approaches.
For ease of understanding, we divide the classification literature into three categories: distance-based classification, classification based on conventional machine learning models, and classification based on deep learning models. Related work on each approach is presented in the coming sections.

4.3.1 Distance based classification

One simple and effective approach to identifying the writer is distance-based classification. This approach requires neither parameters nor model training. Because no model is involved, its complexity is minimal. A distance measure is applied between the query document image and the reference document images in the knowledge base. The most widely used distance measures in writer identification and recognition are the Euclidean distance, chi-square distance, Manhattan distance, and Hamming distance. The query document is matched with the reference documents using different measures. One of them is the Weighted Euclidean Distance (WED), defined as:

$$ WED(k) = \sum_{n=1}^{N} \frac{\left(f_n - f_n^{(k)}\right)^2}{\left(v_n^{(k)}\right)^2} \tag{22} $$

where $f_n$ is the nth feature of the input document, $f_n^{(k)}$ and $v_n^{(k)}$ are the sample mean and sample standard deviation of the nth feature of writer $k$, and $N$ is the length of the feature vector.

Said et al. [142] employed nearest centroid classification using WED. They stated that WED classified better than KNN and reported an accuracy of 96% in classifying 40 writers with 1000 test documents. In another work [141], the same authors tested 150 documents from 10 writers and obtained a highest accuracy of 96%. Al-Maadeed et al. [22] worked on Arabic script by constructing an offline handwritten database collected from 100 writers, using only 16 words for experimentation. They used 75% of the data for training and 25% for testing, with 32,000 Arabic text images. Weighted Euclidean distance was used for classification, with 90% accuracy in the Top-10.
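Nearest-centroid classification with the WED of Eq. (22) can be sketched as follows. The dictionary layout (writer name mapped to a pair of per-feature mean and standard deviation lists) and the function names are assumptions of this example; standard deviations are assumed to be non-zero.

```python
def wed(features, mean, std):
    """Weighted Euclidean Distance of Eq. (22) to one writer's statistics."""
    return sum((f - m) ** 2 / (s ** 2)
               for f, m, s in zip(features, mean, std))


def classify_by_wed(features, writers):
    """Return the writer whose (mean, std) profile minimizes the WED.

    `writers` maps a writer name to a (mean, std) pair of feature lists.
    """
    return min(writers, key=lambda name: wed(features, *writers[name]))
```

Dividing by the variance means that a feature on which a writer is very consistent counts more heavily than one on which the same writer varies a lot.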
Another distance measure is the chi-square distance $\chi^2$, defined as:

$$\chi^2 = \sum_{k=1}^{n} \frac{\left(f_{ki} - m_{kj}\right)^2}{f_{ki} + m_{kj}} \tag{23}$$

where j represents the writer, i is the input, $f_{ki}$ is the kth feature of the unknown input text i, and $m_{kj}$ is the mean value of the kth feature for writer j.

Shahabi and Rahmati [156] conducted an experiment on handwriting samples of 40 persons. They used WED and the chi-square distance and reported that the chi-square distance performed better: on 80 features with frequencies of 2.8 and 4.4, the identification rate was 97.5% for a hit list of size 5. Ahmed et al. [5] identified writers using the chi-square distance in experiments on four datasets, namely KURD, ICFHR, IAM, and GRDS. They achieved identification rates of 94.63% on KURD, 97.12% on ICFHR, 95.59% on IAM, and 100% on GRDS. Xiong et al. [191] evaluated performance on ICFHR2012-Latin and ICDAR2013. They computed similarity using a weighted chi-squared distance and reported Top-1 accuracies of 94.0% on ICFHR2012-Latin, 96.2% on ICDAR2013 for Greek, and 94.0% on ICDAR2013 for English. Fiel and Sablatnig [64] used the chi-square distance for comparison in writer retrieval and identification, with nearest neighbor for identifying the writer. They achieved an accuracy of 93.1% on IAM and 98.9% on TrigraphSlant for writer retrieval, and 98.9% on TrigraphSlant for writer identification.

The Manhattan distance is used to compare two handwritten images $I_1$ and $I_2$ with feature values $u = (u_1, u_2, \ldots, u_N)$ and $v = (v_1, v_2, \ldots, v_N)$. It is calculated as:

$$D_m(u, v) = \sum_{i=1}^{N} \left| u_i - v_i \right| \tag{24}$$

One of the best-known distance measures is the Euclidean distance, calculated as:

$$D_e(u, v) = \sqrt{\sum_{i=1}^{N} \left( u_i - v_i \right)^2} \tag{25}$$

Many researchers have applied distance based classification to multi-language datasets.
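The chi-square, Manhattan, and Euclidean distances of Eqs. (23) to (25) can be sketched in a few lines, together with a nearest-reference decision. This is a minimal illustration: the helper `nearest`, the writer names, and the toy feature vectors are hypothetical, and skipping chi-square terms whose denominator is zero is an assumption made here to keep the function defined for empty histogram bins.

```python
import math

def chi_square(f, m):
    """Chi-square distance of Eq. (23); terms with a zero denominator are skipped."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(f, m) if a + b != 0)

def manhattan(u, v):
    """Manhattan distance of Eq. (24)."""
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean(u, v):
    """Euclidean distance of Eq. (25)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest(query, references, distance):
    """Return the reference writer whose feature vector is closest to the query."""
    return min(references, key=lambda name: distance(query, references[name]))

# Hypothetical normalized feature histograms for two reference writers.
references = {"w1": [0.4, 0.4, 0.2], "w2": [0.1, 0.1, 0.8]}
query = [0.5, 0.3, 0.2]
for dist in (chi_square, manhattan, euclidean):
    print(dist.__name__, nearest(query, references, dist))
```

Because the decision rule is separated from the distance function, swapping one measure for another, as several of the studies below do, only changes the `distance` argument.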
Wu, Tang, and Bu [188] used the Manhattan and chi-square distances for comparison to identify the writer. They conducted experiments on six datasets and reported Top-1 accuracies of 98.5% on IAM, 92.4% on Firemaker, 95.4% on HIT-MW, 99.5% on ICDAR 2011, and 98.0% on ICFHR2012. Durou et al. [56] deployed KNN using the Euclidean, chi-square, and Manhattan distances on the IAM and ICFHR-2012 datasets. They obtained 96.05% using the Euclidean distance, 73% using the chi-square distance, and 80% using the Manhattan distance. Siddiqi and Vincent [164] used the Euclidean, chi-square, Hamming, and Bhattacharyya distances to identify the writer. They reported Top-1 accuracies of 86% on the IAM dataset and 79% on the RIMES dataset.

4.3.2 Conventional machine learning models based classification

Conventional models require sufficient training samples and generally produce more effective results than distance based classification. They include probabilistic and statistical generative models such as the Bayesian model, the Hidden Markov Model (HMM), and the Gaussian mixture model (GMM), as well as neural networks (NN), decision tree learning, nearest neighbor (KNN), Support Vector Machines (SVM), and random forests. The following sections describe the related work for each model.

Naive Bayes model

Naive Bayes is a probabilistic model based on Bayes' theorem. It has been used extensively since the 1950s. The Bayesian classifier can be written as a decision rule in terms of the posterior probability $P(C_i \mid x)$, the class conditional probability $p(x \mid C_i)$, and the prior probability $P(C_i)$:

$$i_{\mathrm{bayes}} = \arg\max_i P(C_i \mid x) = \arg\max_i\, p(x \mid C_i)\, P(C_i) \tag{26}$$

This rule searches for the class i that maximizes the probability that a pattern x belongs to class $C_i$. The class conditional probability $p(x \mid C_i)$ can be assumed to follow a Gaussian distribution for class $C_i$:

$$p(x \mid C_i) = \frac{1}{(2\pi)^{d/2} \left| \mathrm{Cov}_i \right|^{1/2}} \exp\left( -\frac{1}{2} (x - \alpha_i)^{T}\, \mathrm{Cov}_i^{-1}\, (x - \alpha_i) \right) \tag{27}$$

where d is the dimension of x, $\alpha_i$ represents the class mean, and $\mathrm{Cov}_i$ is the class covariance matrix.
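The decision rule of Eq. (26) with a Gaussian class-conditional density can be sketched as follows. The sketch assumes a diagonal covariance matrix, which corresponds to the "naive" assumption that features are independent within a class, and it works in log space for numerical stability. The class names, priors, means, and variances are hypothetical values chosen for illustration.

```python
import math

def log_gaussian(x, mean, var):
    """Log of a Gaussian density with diagonal covariance (per-feature variances)."""
    d = len(x)
    log_det = sum(math.log(v) for v in var)  # log-determinant of a diagonal matrix
    maha = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (d * math.log(2 * math.pi) + log_det + maha)

def bayes_decide(x, classes):
    """Pick the class maximizing log p(x | C_i) + log P(C_i), as in Eq. (26).

    `classes` maps a class name to a (prior, mean, variance) triple.
    """
    def log_posterior(name):
        prior, mean, var = classes[name]
        return math.log(prior) + log_gaussian(x, mean, var)
    return max(classes, key=log_posterior)

# Hypothetical two-writer model with equal priors and unit variances.
classes = {
    "writer_a": (0.5, [0.0, 0.0], [1.0, 1.0]),
    "writer_b": (0.5, [4.0, 4.0], [1.0, 1.0]),
}
print(bayes_decide([0.5, -0.2], classes))
print(bayes_decide([3.5, 4.2], classes))
```

Maximizing the log posterior selects the same class as maximizing the posterior itself, because the logarithm is monotonic.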
To compare an unknown image I with an already known document D, a similarity index is calculated as:

$$\mathrm{Similarity}(I, D) = \frac{1}{\mathrm{card}(I)} \sum_{j=1}^{\mathrm{card}(I)} \max_{C_i \in D} P(C_i \mid x_j) \tag{28}$$

The writer of image I is identified as the writer of the reference document with the maximum similarity index:

$$\mathrm{Writer}(I) = \mathrm{Writer}\left( \arg\max_{D_i \in R} \mathrm{Similarity}(I, D_i) \right) \tag{29}$$

Siddiqi and Vincent [161] employed the Naive Bayes model on the IAM dataset. They used 50 writing samples for training and grouped similar sub-images into clusters. The Bayesian classifier gave an identification accuracy of 94%. The same authors made another contribution in [163], where they conducted an experiment on the IAM dataset using 100 writing samples, with Naive Bayes as the classifier to identify the writer. They achieved 92% accuracy in Top-1. Kamal and Rahman [99] authenticated a writer's documents using a Bayesian classifier, conducting an experiment with 50 test documents. They obtained 94% accuracy for writer identification. In a multilingual environment, Zois and Anastassopoulos [203] classified a group of 50 writers using a Bayesian classifier along with the weighted Euclidean distance. On their own private dataset, they achieved identification rates of 92.48% for the English word and 92.63% for the Greek word. Garg et al. [69] deployed Naive Bayes and three other classifiers on a Gurmukhi dataset of 49,000 samples written by 70 persons. An accuracy of 70.10% was obtained using transition features classified by Naive Bayes.

Hidden Markov model (HMM)

The Hidden Markov Model is a statistical model in which the system being modeled is assumed to be a Markov process with hidden states. It can be expressed as a dynamic Bayesian network. An HMM works on the assumption that the observation sequence is generated by a hidden state sequence.
The joint probability of the observation sequence $x_{1:T}$ and the hidden state sequence $z_{1:T}$ is defined as:

$$p(z_{1:T}, x_{1:T}) = p(z_{1:T})\, p(x_{1:T} \mid z_{1:T}) = p(z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1}) \prod_{t=1}^{T} p(x_t \mid z_t) \tag{30}$$

where $z_t \in \{1, 2, \ldots, N\}$ and N is the number of states. The HMM is characterized by $\lambda = (\pi, A, B)$. Here $\pi = \{\pi_i\}$ is the initial state distribution, with $\pi_i = p(z_1 = i)$, $1 \le i \le N$; $A = \{a_{ij}\}$ is the state transition distribution, with $a_{ij} = p(z_t = j \mid z_{t-1} = i)$, $1 \le i, j \le N$; and B represents the parameters of the observation model.

HMM was first used in writer identification by Schlapbach and Bunke in [145, 146]. On 100 writers from the IAM dataset they achieved a 96.56% identification rate. In verification, an EER of 2.5% was obtained using 120 writers with 8600 text lines. The same authors presented a further development, with one HMM recognizer per writer, in [149]. They used 100 writing samples from the IAM database and built 9-dimensional feature vectors for training with the Baum-Welch algorithm. Using this approach they obtained 96.5% accuracy. In another study [147], the same authors tested the identification rate of an HMM based writer identification system; 93.13% accuracy was achieved by default. HMM was also deployed in the online writer identification domain by Wu et al. [187] on the IAM On-line English Handwritten Text Database (IAM-OnDB). They concluded that HMM gave better results than traditional GMM systems in experiments at the paragraph and line levels. The Hidden Markov Tree (HMT) model was employed by He et al. [88] for an offline text independent approach using 1000 samples of Chinese handwriting. Sheikh and Khotanlou [158] trained the HMM using the Baum-Welch algorithm. They used 70 Persian handwriting samples, 50 for training and 20 for testing, with an HMM toolbox. They obtained the highest accuracy of 99% when a chain network was used to train the model.
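To make the factorization in Eq. (30) concrete, the sketch below evaluates the joint probability of one state and observation sequence for a discrete HMM, and uses the forward algorithm to obtain the likelihood of the observations alone. The `identify` helper follows the one-model-per-writer idea mentioned above by picking the writer whose model gives the highest likelihood. All parameter values, writer names, and helper names are hypothetical, and the parameters are given directly rather than learned with Baum-Welch.

```python
def joint_probability(states, observations, pi, A, B):
    """p(z_1:T, x_1:T) for a discrete HMM, following the factorization of Eq. (30)."""
    p = pi[states[0]] * B[states[0]][observations[0]]
    for t in range(1, len(states)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][observations[t]]
    return p

def forward_likelihood(observations, pi, A, B):
    """p(x_1:T), obtained by summing out the hidden states with the forward algorithm."""
    n = len(pi)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    for obs in observations[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs] for j in range(n)]
    return sum(alpha)

def identify(observations, writer_models):
    """Return the writer whose HMM assigns the highest likelihood to the observations."""
    return max(writer_models, key=lambda w: forward_likelihood(observations, *writer_models[w]))

# Hypothetical two-state, two-symbol models for two writers: (pi, A, B).
model_a = ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]])
model_b = ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.1, 0.9], [0.9, 0.1]])
obs = [0, 1]
print(joint_probability([0, 1], obs, *model_a))
print(forward_likelihood(obs, *model_a))
print(identify([0, 0, 0, 0], {"writer_a": model_a, "writer_b": model_b}))
```

For long sequences a practical implementation would work with log probabilities or scaled forward variables to avoid numerical underflow; that refinement is omitted here for brevity.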
K nearest neighbor (KNN)

K Nearest Neighbor (KNN) is a conventional, non-parametric machine learning model used for classification. Suppose we have pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ taking values in $\mathbb{R}^d \times \{1, 2\}$, where Y is the class label of X, so that $X \mid Y = r \sim P_r$ for $r = 1, 2$ (where $P_r$ is a probability distribution). Given some norm $\|\cdot\|$ on $\mathbb{R}^d$ and a point $x \in \mathbb{R}^d$, let $(X_{(1)}, Y_{(1)}), \ldots, (X_{(n)}, Y_{(n)})$ be a reordering of the training data such that $\|X_{(1)} - x\| \le \cdots \le \|X_{(n)} - x\|$. The training phase only stores the feature vectors and class labels of the training samples. In the classification phase, k is a user-defined constant, and a test vector is assigned the label that is most frequent among its k nearest training samples. For continuous variables, KNN usually uses a distance measure such as the Euclidean distance given in Eq. (25).

The literature shows that many researchers have identified writers using KNN on English script. Marti et al. [122] employed KNN on 100 page samples of the IAM dataset written by 20 writers and achieved an accuracy of 87.8%. Blankers et al. [31] classified a database of 41 writers using KNN and achieved a 98% identification rate. Bulacu and Schomaker [37, 38] conducted an experiment on an Arabic database of 350 writers with 5 samples per writer. A probability distribution function was generated from the extracted texture features, while 400 allographs were clustered to generate a codebook. They applied nearest neighbor classification and obtained 88% in Top-1 and 99% in Top-10. Awaida and Mahmoud [20] performed an experiment on an Arabic digit database of 70,000 samples. They applied KNN and a nearest mean classifier and reported 88.14% accuracy in Top-1. Pandey and Seeja [133] identified the writer using KNN. They obtained 88.57% accuracy with 240 clusters in an experiment on the IAM dataset. KNN has also been deployed on multilingual corpora. Fiel and Sablatnig [66] conducted experiments on the ICDAR 2013, ICDAR 2011, and CVL databases.
They identified the writer using a nearest neighbor approach, with Top-1 accuracies of 94.7% on ICDAR 2011, 88.5% on ICDAR 2013, and 98.3% on CVL. Durou et al. [56] deployed KNN with codebook sizes of 250, 500, and 1000 on the IAM and ICFHR-2012 datasets. They achieved Top-1 accuracies of 87.56% with a codebook size of 250, 87.96% with 500, and 88.01% with 1000, and compared these results with those of Khalifa et al. [102]. Chahi et al. [43] applied KNN with the Hamming distance in experiments on the IAM, IFN/ENIT, AHTID/MW, and CVL databases. They reported average accuracies of 88.99% on IAM, 96.47% on IFN/ENIT, 99.53% on AHTID/MW, and 98.38% on CVL when classifying BW-LBC features. He et al. [83] measured the similarity between handwritings using the chi-square distance and used nearest neighbor for classification. They divided the CERUG dataset into English and Chinese subsets and reported 93.3% on CERUG-Chinese, 95.2% on CERUG-English, 98.5% on CERUG-Mixed, 86.2% on Firemaker, and 89.9% on IAM.

Support vector machine (SVM)

The Support Vector Machine has been a widely used classification model since the 1990s. It has been applied successfully to many real-world problems, especially handwritten character recognition and text categorization. Researchers have likewise shown interest in applying SVM to writer identification and verification. An SVM takes a training dataset of n points of the form $(x_1, y_1), \ldots, (x_n, y_n)$, where $y_i$ indicates the class to which the real vector $x_i$ belongs. The soft-margin SVM is trained by minimizing:

$$\frac{1}{n} \sum_{i=1}^{n} \max\left(0,\; 1 - y_i \left( w \cdot x_i - b \right)\right) + \lambda \lVert w \rVert^2 \tag{31}$$

where $w \cdot x_i - b = 0$ defines the hyperplane, and $\lambda$ controls the trade-off between increasing the margin size and ensuring that each $x_i$ lies on the correct side of the margin. Imdad et al. [96] trained and tested Steered Hermite Features using SVM with 20 authors of the IAM database and obtained 90% accuracy. Gargouri et al.
[70] employed a linear SVM and DTW on an Arabic database of 15,158 words. They reported 45.67% accuracy in Top-1, which improved to 96.90% in Top-10 using SVM. Adak et al. [4] conducted an experiment on Bengali script with 100 samples. An SVM with an RBF kernel was used to classify handcrafted features. They conducted training and testing at various handwriting speeds and computed F-measures; an F-measure of 43.65% was obtained for Top-1. Chanda et al. [43] worked on Oriya script with a database of 100 writers. They classified using SVM and achieved 94% accuracy. Amaral et al. [14, 15] conducted an experiment on the Brazilian Forensic Letter Database of 20 writers. They applied SVM as the classifier and reported an identification rate of 80%.

SVM has also been deployed on multilingual data corpora. Christlein et al. [49] employed Exemplar SVMs for training and obtained 88.9% in Top-1 using the ICDAR 2013 and CVL databases. In [51], they conducted experiments on the ICDAR13, CVL, and KHATT datasets using Exemplar SVMs and VLAD encoding. They reported Top-1 accuracies of 99.6% on ICDAR13, 99.5% on CVL, and 99.6% on KHATT. Kumar and Kaur [109] classified features using SVM on the IAM dataset and reported an accuracy of 0.95. Bertolini et al. [27] selected 475 samples from the QUWI dataset to perform two multi-script experiments. In the first, they trained the SVM on Arabic script and tested on English, achieving 22.1% EER for LBP and 25.9% EER for LPQ in text independent analysis, while the same experiment in text dependent analysis gave 1.3% EER for LBP and 2.8% EER for LPQ. In the second, they trained the SVM on English and tested on Arabic, reporting EERs of 38.3% for LBP and 29.1% for LPQ in text independent analysis, and 9.1% for LBP and 5.5% for LPQ in text dependent analysis. SVM was also used in the online domain by Gargouri, Kanoun and Ogier [70], along with DTW, for classification of the ADAB database.
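The soft-margin objective of Eq. (31) can be evaluated and minimized with a few lines of code. The sketch below uses plain subgradient descent on a linear, two-class problem; the learning rate, number of epochs, regularization value, and toy data are assumptions chosen for illustration, and the cited studies use their own solvers, kernels, and multi-class schemes.

```python
def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def soft_margin_objective(w, b, X, y, lam):
    """Mean hinge loss plus L2 penalty, as in Eq. (31). Labels y_i are +1 or -1."""
    hinge = sum(max(0.0, 1.0 - yi * (dot(w, xi) - b)) for xi, yi in zip(X, y))
    return hinge / len(X) + lam * dot(w, w)

def train(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize the objective with batch subgradient descent (illustrative settings)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w = [2 * lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) - b) < 1:  # sample violates the margin
                grad_w = [g - yi * xj / n for g, xj in zip(grad_w, xi)]
                grad_b += yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical linearly separable features for two writers (+1 and -1).
X = [[2.0, 0.0], [1.5, 0.5], [-2.0, 0.0], [-1.5, -0.5]]
y = [1, 1, -1, -1]
w, b = train(X, y)
print(soft_margin_objective([0.0, 0.0], 0.0, X, y, 0.01))
print(soft_margin_objective(w, b, X, y, 0.01))
```

With zero weights every sample contributes a hinge loss of 1, so the objective starts at 1.0 and decreases as the hyperplane separates the two classes.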
Venugopal and Sundaram [182] conducted experiments in the online domain on the IAM online database and the IBM UB 1 dataset. They employed a multiclass SVM with a Radial Basis Function kernel. They obtained a highest accuracy of 97.81% at the paragraph level on the IAM dataset with a codebook size of 45. On IBM UB 1, they achieved 94.37% at the paragraph level with a codebook size of 60.

Other models

Besides the aforementioned classification models, there are other models and techniques for identifying the writer. One of them is the vector space model employed by Bensefia et al. [126]. They conducted experiments on the PSI and IAM databases, using the vector space model for local features. An information retrieval model was used for identification. They reported a 95% identification rate on the PSI database and 86% on the IAM database. Christlein et al. [47] used GMM supervectors on the ICDAR-2013 and CVL databases. Fisher and VLAD encoding schemes were used for comparison in writer identification. They achieved from 95.1 to 97.1% on ICDAR-2013 and from 97.9 to 99.2% on CVL in Top-1. Khan et al. [103] generated a predictor model using Spectral Regression-Kernel Discriminant Analysis (SR-KDA) in experiments on the IAM, CVL, AHTID/MW, and IFN/ENIT databases. They compared their system with the existing state of the art and achieved 97.2% on IAM, 99.6% on CVL, 71.6% on AHTID/MW, and 76.0% on IFN/ENIT. Garg et al. [69] classified a Gurmukhi dataset of 49,000 samples written by 70 persons using four classifiers: Decision Tree, Random Forest, AdaBoostM1, and Naive Bayes. They achieved the highest accuracy of 81.75% using AdaBoostM1 on centroid features.

4.3.3 Deep learning models based classification

Deep learning models and neural network approaches are widely used owing to their rapid emergence in image processing, artificial intelligence, and pattern recognition.
Convolutional Neural Networks (CNNs) are feed-forward artificial neural networks that are widely used for object recognition [140], object detection [139], image tagging [73], ranking and scoring [194], speech recognition [143], face recognition [170], handwriting recognition [52], and digit recognition on datasets such as MNIST. We live in an era of wearable devices and environmental sensors. Activities are more complex than actions, as semantically they are more representative of a person's real-life behavior. Several machine learning techniques have been proposed for recognizing complex activities from sensor data [111–113]. These techniques play a vital role in many domains where data is collected from sensor based devices such as smartphone accelerometers [115]. CNNs have produced better recognition rates than conventional machine learning models. CNNs have also been trained to maximize a popular performance measure known as positive data points ranked at the top positions (Pos@Top) [72].

Some neural network approaches have also been employed in the field of writer identification and verification. Marti et al. [122] employed a feed-forward neural network on 100 page samples of the IAM dataset written by 20 writers. They used 20 hidden neurons and achieved a recognition rate of 90.7%. Rafiee and Motavalli [138] classified a database of 20 writers and 5 to 7 lines of Farsi script using a feed-forward neural network and obtained 86.5% accuracy. Adak et al. [4] classified automatically derived parameters using an RNN in an experiment on Bengali script with 100 samples and reported an F-measure of 43.65%. In a multi-script environment, Zois and Anastassopoulos [203] identified a group of 50 writers using a neural network with three layers of 20 neurons. On their own private dataset, they attained identification rates of 97.7% for the English word and 98.6% for the Greek word. Zhang et al.
[198] deployed an RNN with bidirectional LSTM for the encoding of random hybrid stroke features and for classification. They conducted experiments on English and Chinese datasets and obtained 100% accuracy on the English dataset and 99.46% on the Chinese dataset in comparison with existing online domain systems. Yang et al. [193] deployed a CNN for the online domain and called their system DeepWriterID. They evaluated it on two datasets of the National Laboratory of Pattern Recognition (NLPR) and attained identification rates of 95.72% for Chinese and 98.51% for English. The same authors [192] conducted an experiment on the CASIA-OLHWDB1.0 dataset with a deep convolutional network and achieved 99.52% accuracy. Xing and Qiao [189, 190] conducted experiments on the IAM and HWDB datasets. They classified using the CNN model DeepWriter and achieved 99.01% accuracy on 301 writers of IAM with 4 English alphabets as input. Christlein et al. [49] performed experiments on ICDAR 2013 and CVL with 4 million image patches of size 32 × 32. They reported 0.21 absolute mAP. Nasuno and Arai [129] experimented on a Japanese dataset with 100 kinds of words from each of 100 writers. An AlexNet CNN was trained on 90 words and tested on 10 words, giving approximately 90% accuracy. Yang et al. [193] evaluated performance on the CASIA-OLHWDB1.0 Chinese dataset and reported an accuracy of 99.52% at the text line level. The performance of the different systems can be seen at a glance in Table 5.

Table 5 Review of writer identification systems using different features and models

| Year | Reference | Features | Models | Dataset | Accuracy (%) |
|---|---|---|---|---|---|
| 2018 | Garg et al. [69] | Statistical | Naive Bayes | Gurmukhi dataset (49,000 samples) | 70.10 |
| 2018 | Pandey and Seeja [133] | Structural | KNN | IAM | 88.57 |
| 2018 | Chahi et al. [43] | Structural | KNN with Hamming distance | IAM; IFN/ENIT; AHTID/MW; CVL | 88.99; 96.47; 99.53; 98.38 |
| 2018 | Christlein et al. [51] | Model based | Exemplar SVM | ICDAR13; CVL; KHATT | 99.6; 99.5; 99.6 |
| 2017 | Khan et al. [104] | Statistical | SR-KDA | IAM; AHTID/MW; CVL; IFN/ENIT | 97.2; 71.6; 99.6; 76.0 |
| 2017 | Adak et al. [5] | Statistical | SVM | Author generated (100 Bengali samples) | 43.65 |
| 2017 | Kumar and Kaur [109] | Statistical | SVM | IAM | 92 |
| 2017 | Kumar and Kaur [109] | Statistical | Neural Network | IAM | 95 |
| 2017 | Venugopal and Sundaram [182] | Statistical | SVM | IAM online; IBM UB 1 | 97.81; 94.37 |
| 2017 | Zhang et al. [201] | Statistical | RNN-BLSTM | BIT-English; BIT-Chinese | 100; 99.46 |
| 2017 | Durou et al. [56] | Structural | KNN | IAM; ICFHR-2012 | 92; 97.0 |
| 2017 | He et al. [84] | Structural | KNN | CERUG-CN; CERUG-EN; CERUG-MIX; Firemaker; IAM | 93.3; 95.2; 98.5; 86.2; 89.9 |
| 2017 | Ahmed et al. [5] | Structural | Chi-square | KURD; IAM; ICFHR; GRDS | 94.63; 95.59; 97.12; 100 |
| 2017 | Xing and Qiao [189] | Model based | CNN | IAM; HWDB1.1 | 99.01; 93.85 |
| 2017 | Christlein et al. [48] | Statistical | SVM | ICDAR; CVL; KHATT | 89.4; 91.0; 97.2 |
| 2017 | Christlein et al. [50] | Model based | Exemplar SVM | ICDAR2017 (Historical-WI) | 84.1 |
| 2017 | Nasuno and Arai [129] | Model based | AlexNet CNN | Author generated (100 words) | 90 |
| 2017 | Wu et al. [187] | Statistical | HMM | IAM-OnDB | 94.5 |
| 2017 | Sheikh and Khotanlou [158] | Statistical | HMM | Author generated (70 Persian samples) | 58.7 |
| 2016 | Bertolini et al. [27] | Structural | SVM | QUWI | |
| 2016 | Hannad and Siddiqi [80] | Structural | Hamming distance | IFN/ENIT; IAM | 94.89; 89.54 |
| 2016 | Garz et al. [71] | Structural | Naive Bayes | IAM | 86.9 |
| 2016 | Yang et al. [192] | Model based | CNN | NLPR-Chinese; NLPR-English | 95.72; 98.51 |
| 2016 | Zhu and Wang [201] | Structural | Manhattan | IAM; HIT-MW | 96.48; 95.44 |
| 2015 | Xiong et al. [194] | Statistical | Weighted chi-squared distance | ICFHR2012-Latin; ICDAR2013-Greek; ICDAR2013-English | 94.0; 96.2; 94.0 |
| 2015 | Fiel and Sablatnig [66] | Model based | KNN | ICDAR 2013; ICDAR 2011; CVL | 88.5; 94.7; 98.3 |
| 2015 | Abdi and Khemakhem [2] | Structural | Chi-square | IFN/ENIT | 90.02 |
| 2015 | He et al. [85] | Structural | Nearest neighbor | Firemaker; IAM | 91.1; 89.8 |
| 2015 | Christlein et al. [49] | Model based | CNN | ICDAR 2013; CVL | 98.9; 99.4 |
| 2015 | Yang et al. [193] | Model based | CNN | CASIA-OLHWDB1.0 | 99.52 |
| 2015 | Khalifa et al. [102] | Structural | SR-KDA | IAM | 98 |
| 2014 | Kumar et al. [108] | Statistical | SVM | Gurmukhi | 91.80 |
| 2014 | Wu et al. [188] | Statistical | Manhattan, chi-square | IAM; Firemaker; HIT-MW; ICDAR 2011; ICFHR 2012 | 98.5; 92.4; 95.4; 99.5; 98.0 |
| 2014 | Hu et al. [94] | Statistical | KNN | CASIA Offline DB 2.1 | 96.25 |
| 2014 | Newell and Griffin [132] | Statistical | Nearest neighbour with the Euclidean distance | IAM; ICDAR 2012 | 99; 93.1 |
| 2014 | Kumar et al. [109] | Structural | KNN | IAM | 86.75 |
| 2014 | Fecker et al. [63] | Statistical | Chi-square | Islamic Heritage Project (IHP) | 92.5 |

KNN K Nearest Neighbor; HD Hamming Distance; NN Neural Network; SVM Support Vector Machine; RNN Recurrent Neural Network; NB Naïve Bayes; CSD Chi-Square Distance; MD Manhattan Distance; ED Euclidean Distance; BLSTM Bidirectional Long Short-Term Memory; HMM Hidden Markov Model; CNN Convolutional Neural Network; WCSD Weighted Chi-Square Distance; SR-KDA Spectral Regression-Kernel Discriminant Analysis; IAM Institut für Informatik und angewandte Mathematik; IFN/ENIT Institute of Communications Technology/Ecole Nationale d'Ingenieurs de Tunis; AHTID/MW Arabic Handwritten Text Images Database written by Multiple Writers; KHATT KFUPM Handwritten Arabic Text; BIT Biometrics Ideal Test; ICFHR International Conference on Frontiers in Handwriting Recognition; IHP Islamic Heritage Project

5 Open research issues

Despite extensive research, the problem remains open because of the variety of challenges it presents. One main challenge is intra-class variation, that is, variation in the handwriting patterns of an individual writer. Writing style depends on various factors such as mood, mental caliber, age, and situation. Likewise, in daily life a writer may produce different handwriting samples because of the use of diverse writing instruments. As a result, the stroke width and character size of the same writer change over time, which makes authentication of the writer more challenging. When handwriting images are acquired with a variety of equipment, training and testing samples may also differ in resolution, so the writing of the same person appears in different styles and the issue of intra-class variation arises.

We have noticed that writer identification for Latin scripts has received considerable attention from researchers and is at a mature level. However, the Arabic script family (i.e. Arabic, Urdu, Persian, etc.) and Chinese have not received much attention, and as a result writer identification performance for them is far from satisfactory. This is due to the complexity of these scripts. For example, most researchers face difficulties with Arabic script, and accuracy is low. Arabic is natively cursive because characters are joined within words. Diacritics (usually placed above or below a character), overlapping, and different representations of words also occur in Arabic script. Thus the segmentation of Arabic script is difficult.
Feature selection and representation, and the best practice for text dependent and text independent methods on Arabic script, are still open and active issues. However, symbolic languages such as Bengali, Oriya, and Persian have produced higher accuracy, as they do not have unique characteristics like those of Arabic. Similarly, passing symbols and letters to machine learning models is easier than passing a whole line.

Text independent methods produce lower identification rates than text dependent methods. The reason is that a text dependent method operates at the character or word level and gives better accuracy. Conversely, text independent methods work at the line or paragraph level; segmentation is therefore required, and lower identification rates are achieved. From the discussion of text dependent and text independent methods, one can conclude that higher identification rates are generally achievable with the former type, but at the cost of requiring the same fixed text or human intervention to extract the elements (characters or words) to be compared. Text independent methods are much more useful and widely applicable. These methods, however, require a certain minimum amount of text to produce acceptable results.

The challenges for writer identification and writer retrieval include the use of different pens, which changes a person's writing style, the physical condition of the writer, distractions such as multitasking and noise, and the change of writing style with age. The change of style with increasing age is not covered by any available dataset and cannot be examined, but it makes identification or retrieval harder for real-life data. One of the major challenges is that CNN based writer identification systems have been rare in the literature.
A reason might be that the training and test sets of most datasets are disjoint, making it impossible to train a CNN for classification. Deep learning models require a sufficient number of instances for good performance. Many writer identification datasets have a large number of classes, but the amount of data within each class is small. This problem strongly affects the performance of deep learning models.

Multi-script writer identification is an active research problem. It attracts considerable interest as a way to validate the hypothesis that a writer can be identified as the same person across different scripts. This approach has given good results with structural and statistical features, but deep learning approaches have not yet been deployed in this domain.

6 Conclusion and future direction

In this paper, we provided a comprehensive overview of the state of the art in writer identification techniques, with an emphasis on pre-processing, feature extraction, and classification (conventional machine learning and deep learning based models). Effective writer identification systems are applicable in forensic and historical analysis, banks, check processing, signature analysis, graphology, legal documents, ancient manuscripts, digital rights administration, and document analysis methods. The extensive review of the literature has led to the identification of several open problems, as mentioned in the previous section. Languages differ from one another and have unique characteristics that pose different challenges for each language, so no standard approach fits all; for example, features used for English do not suit Arabic script because of its cursive nature. We noticed that Latin scripts have been widely considered for writer identification; however, the Arabic script family (i.e. Arabic, Urdu, Persian, etc.)
and Chinese have not received as much attention and are still far from satisfactory performance. This is due to the complexity and challenges of these scripts. Furthermore, we have noticed a lack of benchmark evaluations and competitions; different researchers use different datasets for evaluation. Work is therefore required to develop benchmark datasets and evaluation mechanisms through competitions, in order to compare and evaluate work in the field of writer identification. We have also noticed that there is still considerable room for new researchers.

To highlight these challenges and open research issues, our study lays out a few research directions for researchers in the area of writer identification using deep learning techniques. We discussed the notable datasets for handwritten writer identification. However, each dataset contains few text instances per class, which results in poor performance with deep learning models. Consequently, very limited work is available in the literature. There is a crucial need to develop an unconstrained dataset that contains a large number of samples per class. A further direction is that data augmentation techniques can help to increase the number of instances per class using existing datasets, which will allow deep learning approaches to be exploited in this domain. Furthermore, pre-processing steps such as normalization, high-frequency processing, and contour extraction can also improve the performance of deep learning models.

Fig. 7 Future directions for researchers

Next, there is wide scope for researchers in online writer identification. It is clear that offline writer recognition is considered the harder task because of the lack of sequential information in handwriting and large intra-class variation. In contrast, online handwriting contains both sequential and spatial information.
Writing samples are stored as trajectories, represented as time series of two-dimensional coordinates. Dynamic features are calculated and used for identification. Different parameters such as writing speed, direction of writing, pen tip positions, velocity, angles, and pressure can be extracted. These features serve as a spatio-temporal parameter space representation of handwriting. Online writer identification is a highly attractive research area given the advancement of information technology and the use of smartphones, tablets, and similar gadgets.

Finally, we suggest exploring and investigating different deep neural network architectures for writer identification, especially for Arabic script. The deep convolutional neural network has proved itself an excellent recognizer in optical character recognition, handwriting recognition, and speech recognition. Few researchers in the literature have employed deep learning models for writer identification using text images. There is a need to employ modern deep learning based techniques such as transfer learning and fine-tuning. Transfer learning, or inductive transfer, is a technique in which the knowledge learned on one problem is stored and applied to other, related problems (same domain). One can take a pre-trained network and use it as a starting point to learn a new task. Fine-tuning a network with transfer learning is usually much faster and easier than training a network from scratch with randomly initialized weights. Another option is to freeze the weights of the CNN layers, use them to extract features, and then apply a linear classifier such as an SVM for classification (this performs well for problems belonging to a different domain). Finally, we present the future recommendations graphically in Fig. 7, which provides a better understanding for new researchers in the field of writer identification.
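For the online setting discussed in this section, two of the dynamic features mentioned, writing speed and writing direction, can be derived from a pen trajectory as in the minimal sketch below. The trajectory format, a list of (t, x, y) samples with strictly increasing timestamps, and the function name are assumptions made for illustration; real tablets may additionally report pressure and pen-up or pen-down events.

```python
import math

def dynamic_features(trajectory):
    """Return (speed, direction) for each pair of consecutive pen samples.

    `trajectory` is a list of (t, x, y) tuples with strictly increasing t.
    Speed is the distance travelled per unit time; direction is the angle
    of the displacement in radians, measured from the positive x axis.
    """
    features = []
    for (t0, x0, y0), (t1, x1, y1) in zip(trajectory, trajectory[1:]):
        dx, dy, dt = x1 - x0, y1 - y0, t1 - t0
        speed = math.hypot(dx, dy) / dt
        direction = math.atan2(dy, dx)
        features.append((speed, direction))
    return features

# Hypothetical pen samples: (time in seconds, x, y).
stroke = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (2.0, 3.0, 6.0)]
features = dynamic_features(stroke)
print(features)
```

Sequences of such per-segment features are the kind of time series that sequential models can then consume for online writer identification.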
Zhang J, He Z, Cheung YM, You X (2009) Writer identification using a hybrid method combining Gabor wavelet and mesh fractal dimension. In: International Conference on Intelligent Data Engineering and Automated Learning, pp. 535–542. Springer 195. Zhang G, Liang G, Li W, Fang J, Wang J, Geng Y, Wang JY (2017) Learning convolutional ranking-score function by query preference regularization. In: International Conference on Intelligent Data Engineering and Automated Learning, pp. 1–8. Springer 196. Zhang B, Srihari SN (2003) Analysis of handwriting individuality using word features. In: Proceedings of the Seventh International Conference on Document Analysis and Recognition Volume 2, p. 1142. IEEE Computer Society 197. Zhang B, Srihari SN, Lee S (2003) Individuality of handwritten characters. In: Proceedings of the Seventh International Conference on Document Analysis and Recognition Volume 2, p. 1086. IEEE Computer Society 198. Zhang XY, Xie GS, Liu CL, Bengio Y (2017) End-to-end online writer identification with recurrent neural network. IEEE Transactions on Human-Machine Systems 47(2):285–292 199. Zhu Y, Tan T, Wang Y (2000) Biometric personal identification based on handwriting. In: Pattern Recognition, Proceedings. 15th International Conference on, vol. 2, pp. 797–800. IEEE 200. Zhu Y, Tan T, Wang Y (2001) Font recognition based on global texture analysis. IEEE Transactions on pattern analysis and machine intelligence p. 1192–1200 201. Zhu Y, Wang Y (2016) An offline text-independent writer identification system with sae feature extraction. In: Progress in Informatics and Computing (PIC), 2016 International Conference on, pp. 432–436. IEEE 202. Zimmermann K, Varady M (1985) Handwriter identification from one-bit quantized pressure patterns. Pattern Recogn 18(1):63–72 203. Zois EN, Anastassopoulos V (2000) Morphological waveform coding for writer identification. 
Arshia Rehman is a BSCS student at GGPGC No.1, Abbottabad, Higher Education Department of the Government of Khyber-Pakhtunkhwa, Pakistan, and works as a Research Assistant under the supervision of Dr. Saeeda Naz at GGPGC No.1, Abbottabad. Her areas of interest are Document Image Understanding, Machine Learning and Multimedia.

Saeeda Naz has been an Assistant Professor and Head of the Computer Science Department at GGPGC No.1, Abbottabad, Higher Education Department of the Government of Khyber-Pakhtunkhwa, Pakistan, since 2008. She received her Ph.D. in Computer Science from the Department of Information Technology, Hazara University, Mansehra, Pakistan. She has published five book chapters and more than 40 papers in peer-reviewed national and international conferences and journals. Her areas of interest are Optical Character Recognition, Pattern Recognition, Machine Learning, Medical Imaging, Natural Language Processing and Multimedia.

Muhammad Imran Razzak is with the Advanced Analytics Institute, University of Technology, Sydney, Australia. Previously, he was with the College of Public Health and Health Informatics, King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia. His research focuses on applying deep learning methods to practical problems in image analysis and data analysis, with special emphasis on the healthcare industry. During his research career, he has developed and delivered several research projects successfully. He is the inventor of one patent and the author of more than 70 papers in well-reputed journals and conferences. He received the Young Researcher 2015 award from NGHA, Saudi Arabia, a prestigious award, based on his research contributions, and the “Best Researcher” award during his stay at CoEIA, King Saud University. He has secured several research grants. He also serves as an associate editor for reputable international journals such as PLOS ONE, IEEE Access, IJIIP and IJIP. He has taken part in dozens of conferences in various capacities, such as Chair, Co-Chair and scientific committee member.
