King Fahd University Arabic Font Database (KAFD)

Benchmarking databases are essential for Arabic font recognition (AFR) research. They are required for developing, evaluating, and comparing different AFR and optical character recognition (OCR) techniques. Because no benchmarking database for AFR systems existed, researchers developed their own datasets, which are limited in the number of fonts, styles, and scanning resolutions. KAFD is a multi-font, multi-size, multi-style, and multi-resolution Arabic text database. It is a freely available, comprehensive database that covers four resolutions (100 dpi, 200 dpi, 300 dpi, and 600 dpi) and two forms (Page and Line). KAFD consists of forty Arabic fonts, and each font uses its own unique text. Each font is printed in ten font sizes, and each font size in four font styles. In total, KAFD contains 2,691,092 text images. The KAFD database is organized into three sets: Training, Testing, and Validation. The structure of the KAFD database is shown in Figure 1.
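The total image count follows from the per-resolution figures reported in the construction process: page-level and line-level images, each multiplied by the four scanning resolutions. The short Python check below only restates that arithmetic; the constant names are illustrative and are not part of any KAFD tooling.

```python
# Per-resolution counts as reported in the KAFD construction process.
RESOLUTIONS_DPI = (100, 200, 300, 600)
PAGE_IMAGES_PER_RESOLUTION = 28_767
LINE_IMAGES_PER_RESOLUTION = 644_006

# Every page and line image exists once per scanning resolution.
page_images = PAGE_IMAGES_PER_RESOLUTION * len(RESOLUTIONS_DPI)
line_images = LINE_IMAGES_PER_RESOLUTION * len(RESOLUTIONS_DPI)
total_images = page_images + line_images

print(f"page images: {page_images:,}")   # page images: 115,068
print(f"line images: {line_images:,}")   # line images: 2,576,024
print(f"total:       {total_images:,}")  # total:       2,691,092
```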
Fonts, sizes, styles, and resolutions

Arabic texts are collected from many sources on various subjects, such as religion, medicine, science, and history. The texts cover all the shapes of Arabic characters and also contain names, places, cities, numbers, etc. The Arabic text used for each font in this database is unique, that is, different from the texts used for the other fonts. After collecting the texts, we constructed the data for the forty fonts as follows:

  • The collected texts are printed using (40) fonts listed in Table 1.
  • Each font consists of ten sizes (8, 9, 10, 11, 12, 14, 16, 18, 20, and 24 points). The text for each size is identical to other sizes for the same font and is different from other fonts.
  • For each size, four font styles are used (Normal, Bold, Italic, and a combination of Bold and Italic). These styles cover the different writing styles in Arabic documents.
  • For each font style, three disjoint sets are constructed (60% for Training, 20% for Testing, and 20% for Validation).
  • The number of pages in each set starts at ten pages for the 8-point size (6 Training, 2 Testing, and 2 Validation) and increases with font size. For sizes larger than 12 points, the first twenty pages (12 Training, 4 Testing, and 4 Validation) are used. The total number of printed pages is 28,904 for each resolution.
  • The printed pages of the forty fonts are scanned at four resolutions (100 dpi, 200 dpi, 300 dpi, and 600 dpi).
Figure 1. KAFD structure
Table 1. KAFD fonts
S.N  Font Name             S.N  Font Name             S.N  Font Name             S.N  Font Name
1    Advertising Bold      11   Arial Unicode MS      21   MaghribiAssile        31   SC Alyermook
2    AGA Granada Regular   12   Arabic Transparent    22   Microsoft Sans Serif  32   SC Dubai
3    AGA Kaleelah Regular  13   Courier New           23   Microsoft Uighur      33   SC Gulf
4    Akhbar                14   Deco Type Naskh       24   Midan                 34   SC Ouhod
5    Al-Qairwan            15   Deco Type Thuluth     25   Motken Unicode Hor    35   Segoe UI
6    Al-Mohand             16   Diwani Letter         26   Nawel                 36   Simplified Arabic
7    Andalus               17   FreeHand              27   Pen Kufi              37   Tahoma
8    Arabic Typesetting    18   Lotus Linotype        28   Quran_2               38   Times New Roman
9    Arabswell             19   Hadeel                29   Rateb Lotus           39   Traditional Arabic
10   Arial                 20   M Unicode Sara        30   Rekaa                 40   Zarnew
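To make the font/size/style/set/resolution hierarchy described above concrete, the following Python sketch enumerates one key per combination and applies the 60/20/20 split to a page count. It is an illustration under stated assumptions: `FONTS` is cut down to three names from Table 1, the nesting order of the key is a guess rather than KAFD's actual directory layout, and `split_pages` and `subsets` are hypothetical helpers that do not ship with the database.

```python
from itertools import product

# Parameters from the KAFD description. FONTS is shortened to three of the
# forty fonts in Table 1 to keep the example small.
FONTS = ("Arial", "Tahoma", "Traditional Arabic")
SIZES_PT = (8, 9, 10, 11, 12, 14, 16, 18, 20, 24)
STYLES = ("Normal", "Bold", "Italic", "Bold Italic")
SETS = ("Training", "Testing", "Validation")
RESOLUTIONS_DPI = (100, 200, 300, 600)


def split_pages(n_pages):
    """Hypothetical helper: split n_pages 60/20/20 into disjoint sets.

    Testing and Validation each get 20% (rounded down); Training takes the
    rest, so the three sets always add up to n_pages.
    """
    testing = n_pages * 20 // 100
    validation = n_pages * 20 // 100
    return {"Training": n_pages - testing - validation,
            "Testing": testing,
            "Validation": validation}


def subsets():
    """Hypothetical helper: yield one (font, size, style, set, dpi) key per leaf.

    The nesting order is an assumption, not KAFD's actual directory layout.
    """
    yield from product(FONTS, SIZES_PT, STYLES, SETS, RESOLUTIONS_DPI)


print(split_pages(10))            # {'Training': 6, 'Testing': 2, 'Validation': 2}
print(split_pages(20))            # {'Training': 12, 'Testing': 4, 'Validation': 4}
print(sum(1 for _ in subsets()))  # 3 * 10 * 4 * 3 * 4 = 1440
```

`split_pages(10)` reproduces the 6/2/2 split given for the 8-point size, and `split_pages(20)` the 12/4/4 split. With all forty fonts, the same enumeration would yield 40 × 10 × 4 × 3 × 4 = 19,200 keys per form (page or line).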

Database Construction Process

After collecting and organizing the texts of the 40 fonts, the KAFD database is printed using an HP LaserJet 600 M601 printer with a print resolution of 1200 x 1200 dpi. The printed pages are scanned in grayscale using a Ricoh IS760D scanner at 100 dpi, 200 dpi, 300 dpi, and 600 dpi. This process resulted in 115,068 page-level images across all resolutions (28,767 page images per resolution).
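Scanning the same printed page at several resolutions changes how many pixels each glyph occupies. Since a typographic point is 1/72 inch, the nominal em height of text set at s points and scanned at d dpi is s × d / 72 pixels. The Python sketch below tabulates this for the KAFD sizes and resolutions; it is a nominal estimate only, because actual glyph heights vary from font to font, and `em_height_px` is a hypothetical helper.

```python
POINTS_PER_INCH = 72  # one typographic (DTP/PostScript) point is 1/72 inch
SIZES_PT = (8, 9, 10, 11, 12, 14, 16, 18, 20, 24)
RESOLUTIONS_DPI = (100, 200, 300, 600)


def em_height_px(size_pt, dpi):
    """Nominal em height, in pixels, of text set at size_pt and scanned at dpi."""
    return size_pt * dpi / POINTS_PER_INCH


# One row per scanning resolution, one entry per font size.
for dpi in RESOLUTIONS_DPI:
    row = ", ".join(f"{s}pt={em_height_px(s, dpi):.1f}px" for s in SIZES_PT)
    print(f"{dpi:>3} dpi: {row}")
```

For example, 8-point text scanned at 100 dpi has a nominal em height of only about 11 pixels, while 24-point text at 600 dpi spans 200 pixels, an 18-fold range across the database.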

All page images are segmented into lines, and ground truth files are built for each page and each line. Segmentation enables researchers to use the database at both the page and line levels. This process resulted in 2,576,024 line images (644,006 line images per resolution). The construction process of the KAFD database is shown in Figure 2.
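The source does not say how the page images were segmented into lines. A common baseline for printed text is the horizontal projection profile: count the dark pixels in each row and treat runs of non-empty rows as lines. The pure-Python sketch below shows that technique on a grayscale page stored as a list of rows; the `segment_lines` function and its `ink_threshold` parameter are hypothetical, and it is not the procedure used to build the KAFD ground truth. On real Arabic pages, dots and diacritics above or below the baseline can form thin separate bands, so a practical version would smooth the profile or merge small bands into neighboring lines.

```python
def segment_lines(page, ink_threshold=128):
    """Split a grayscale page into text lines using a horizontal projection profile.

    page: list of rows, each a list of 0-255 gray values (dark text on light paper).
    Returns (top, bottom) row ranges, with bottom exclusive.
    """
    # Horizontal projection profile: number of "ink" pixels in every row.
    profile = [sum(1 for v in row if v < ink_threshold) for row in page]

    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = y  # first row of a new text line
        elif ink == 0 and start is not None:
            lines.append((start, y))  # a blank row closes the current line
            start = None
    if start is not None:  # a line that touches the bottom edge of the page
        lines.append((start, len(profile)))
    return lines


# Tiny synthetic "page": 20 rows x 30 columns of white with two dark bands.
WIDTH, HEIGHT = 30, 20
page = [[255] * WIDTH for _ in range(HEIGHT)]
for top, bottom in ((2, 5), (9, 14)):
    for y in range(top, bottom):
        for x in range(3, 25):
            page[y][x] = 0

print(segment_lines(page))  # [(2, 5), (9, 14)]
```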
Figure 2. KAFD high-level implementation process


  • The authors would like to acknowledge the support provided by King Abdul-Aziz City for Science and Technology (KACST) for funding this work under Project no. AT-30-53 through King Fahd University of Petroleum & Minerals (KFUPM).

Copyright 2013