King Fahd University Arabic Font Database (KAFD)
Benchmarking databases are essential to Arabic font recognition (AFR) research: they are required to develop, evaluate, and compare AFR and optical character recognition (OCR) techniques.
The lack of a benchmarking database for AFR systems has led researchers to build their own datasets.
These datasets are limited in the number of fonts, styles, and scanning resolutions.
KAFD is a multi-font, multi-size, multi-style, and multi-resolution Arabic text database.
It is a freely available and comprehensive database, containing images at four resolutions (100 dpi, 200 dpi, 300 dpi, and 600 dpi) and in two forms (Page and Line).
KAFD consists of forty Arabic fonts, each with its own unique text.
Each font is rendered in ten font sizes, and each font size in four font styles.
In total, KAFD contains 2,691,092 text images.
The KAFD database is organized into three sets: Training, Testing, and Validation. The structure of the database is shown in Figure 1.
Fonts, sizes, styles, and resolutions
The Arabic texts are collected from many sources and cover different subjects, such as religion, medicine, science, and history. The texts cover all the shapes of Arabic characters and also include names, places, cities, numbers, etc. The text used for each font is unique: it differs from the texts used for all other fonts. After collecting the texts, we constructed the forty fonts as follows:
- The collected texts are printed using the 40 fonts listed in Table 1.
- Each font is printed in ten sizes (8, 9, 10, 11, 12, 14, 16, 18, 20, and 24 points). Within a font, the text is identical across all sizes; it differs from the text of every other font.
- For each size, four font styles are used (Normal, Bold, Italic, and a combination of Bold and Italic). These styles cover the different writing styles in Arabic documents.
- For each font style, three disjoint sets are constructed (60% for Training, and 20% each for Testing and Validation).
- The number of pages starts at ten for the 8-point size (6 Training, 2 Testing, 2 Validation) and increases with the font size. For sizes larger than 12 points, the first twenty pages (12 Training, 4 Testing, and 4 Validation) are used. The total number of printed pages is 28,904 for each resolution.
- The forty fonts are scanned at four resolutions (100 dpi, 200 dpi, 300 dpi, and 600 dpi).
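The construction steps above can be sketched in a few lines of Python. This is only an illustration of the combinatorics and the 60/20/20 split, not the tooling used to build KAFD. The style label `BoldItalic`, the function name `split_pages`, and the integer rounding rule for page counts that are not multiples of five are assumptions made for this sketch.

```python
from itertools import product

# Values taken from the list above.
NUM_FONTS = 40
SIZES_PT = (8, 9, 10, 11, 12, 14, 16, 18, 20, 24)
STYLES = ("Normal", "Bold", "Italic", "BoldItalic")  # "BoldItalic" is an assumed label
RESOLUTIONS_DPI = (100, 200, 300, 600)


def split_pages(total_pages):
    """Split a page count 60/20/20 into Training, Testing, and Validation.

    Assumption: Testing and Validation each get 20% rounded down,
    and Training receives whatever remains.
    """
    testing = total_pages * 20 // 100
    validation = total_pages * 20 // 100
    training = total_pages - testing - validation
    return {"Training": training, "Testing": testing, "Validation": validation}


# Every (font index, size, style, resolution) combination in the database.
configurations = list(
    product(range(1, NUM_FONTS + 1), SIZES_PT, STYLES, RESOLUTIONS_DPI)
)

print(len(configurations))  # 40 * 10 * 4 * 4 = 6400
print(split_pages(10))      # {'Training': 6, 'Testing': 2, 'Validation': 2}
print(split_pages(20))      # {'Training': 12, 'Testing': 4, 'Validation': 4}
```

The two printed splits match the page counts quoted above for the 8-point size and for sizes larger than 12 points.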
Figure 1. KAFD structure
Table 1. KAFD fonts
| No. | Font | No. | Font | No. | Font | No. | Font |
|---|---|---|---|---|---|---|---|
| 1 | Advertising Bold | 11 | Arial Unicode MS | 21 | MaghribiAssile | 31 | SC Alyermook |
| 2 | AGA Granada Regular | 12 | Arabic Transparent | 22 | Microsoft Sans Serif | 32 | SC Dubai |
| 3 | AGA Kaleelah Regular | 13 | Courier New | 23 | Microsoft Ughur | 33 | SC Gulf |
| 4 | Akhbar | 14 | Deco Type Naskh | 24 | Midan | 34 | SC Ouhod |
| 5 | Al-Qairwan | 15 | Deco Type Thuluth | 25 | Motken Unicode Hor | 35 | Segore UI |
| 6 | Al-Mohand | 16 | Diwani Letter | 26 | Nawel | 36 | Simplified Arabic |
| 7 | Andalus | 17 | FreeHand | 27 | Pen Kufi | 37 | Tahoma |
| 8 | Arabic Typesetting | 18 | Lotus Linotype | 28 | Quran_2 | 38 | Times New Roman |
| 9 | Arabswell | 19 | Hadeel | 29 | Rateb Lotus | 39 | Traditional Arabic |
| 10 | Arial | 20 | M Unicode Sara | 30 | Rekaa | 40 | Zarnew |
Database Construction Process
After collecting and organizing the texts of the 40 fonts, the KAFD database is printed using an HP LaserJet 600 M601 printer with a print resolution of 1200 x 1200 dpi. The printed pages are scanned in grayscale using a Ricoh IS760D scanner.
They are scanned at 100 dpi, 200 dpi, 300 dpi, and 600 dpi resolutions. This process resulted in 115,068 page-level images across all resolutions (28,767 page images per resolution).
All page images are segmented into lines, and ground-truth files are built for each page and line.
Segmentation enables researchers to use the database at both the page and line levels.
This process resulted in 2,576,024 line images (644,006 line images per resolution).
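The image counts quoted in this section and in the introduction are consistent with one another, as the following quick arithmetic check shows. The variable names are chosen for this sketch only.

```python
RESOLUTIONS_DPI = (100, 200, 300, 600)
PAGE_IMAGES_PER_RESOLUTION = 28_767
LINE_IMAGES_PER_RESOLUTION = 644_006

# Totals across the four scanning resolutions.
page_images = PAGE_IMAGES_PER_RESOLUTION * len(RESOLUTIONS_DPI)
line_images = LINE_IMAGES_PER_RESOLUTION * len(RESOLUTIONS_DPI)
total_images = page_images + line_images

print(page_images)   # 115068
print(line_images)   # 2576024
print(total_images)  # 2691092

# Average number of segmented lines per page image.
print(round(LINE_IMAGES_PER_RESOLUTION / PAGE_IMAGES_PER_RESOLUTION, 1))  # 22.4
```

The sum of page-level and line-level images equals the 2,691,092 text images reported for the whole database.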
The construction process of the KAFD database is shown in Figure 2.
Figure 2. KAFD high-level implementation process
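The source does not state how pages were segmented into lines. As an illustration only, the sketch below uses a horizontal projection profile, a common baseline for line segmentation of printed text: rows that contain ink are grouped into consecutive bands, and each band is treated as one text line. The function name `segment_lines`, its parameters, and the synthetic page are assumptions for this example and are not part of KAFD.

```python
def segment_lines(image, ink_threshold=128, min_ink_pixels=1):
    """Return (top, bottom) row ranges of text lines in a grayscale page.

    image: list of rows, each a list of 0-255 grayscale values
    (0 = black ink, 255 = white paper). The bottom index is exclusive.
    """
    lines = []
    start = None
    for y, row in enumerate(image):
        # Horizontal projection: number of ink pixels in this row.
        ink = sum(1 for value in row if value < ink_threshold)
        has_text = ink >= min_ink_pixels
        if has_text and start is None:
            start = y                 # a text line begins
        elif not has_text and start is not None:
            lines.append((start, y))  # the text line ended on the previous row
            start = None
    if start is not None:             # page ends while still inside a line
        lines.append((start, len(image)))
    return lines


# Tiny synthetic "page": two dark bands separated by white rows.
W, B = 255, 0
page = [
    [W, W, W, W],
    [W, B, B, W],
    [B, B, W, W],
    [W, W, W, W],
    [W, W, B, W],
    [W, W, W, W],
]
print(segment_lines(page))  # [(1, 3), (4, 5)]
```

On real Arabic scans, dots and diacritics above or below the baseline can form thin bands of their own, so a practical version would likely need a larger `min_ink_pixels` value or a step that merges very short bands into their neighbors.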
For further information
- The authors would like to acknowledge the support provided
by King Abdul-Aziz City for Science and Technology (KACST) for
funding this work under Project no. AT-30–53 through King Fahd
University of Petroleum & Minerals (KFUPM).