Results and Analysis
The Language Recognition Program was evaluated by scanning 200 text
excerpts. The current version of the program supports four languages:
English, French, Spanish, and German. Fifty text excerpts were chosen
for each language from various online sources such as Project Gutenberg
and Wikipedia. The result of each scan was then compared to the actual
language in which the excerpt was written, and an accuracy rating was
calculated for each language. The results of the evaluation phase are
shown in the table below.
| Actual language of the text excerpts | Number of text excerpts attempted | Number of times the program correctly recognized the language | Accuracy rating |
|---|---|---|---|
| English | 50 | 50 | 50 / 50 = 100% |
| French | 50 | 50 | 50 / 50 = 100% |
| Spanish | 50 | 49 | 49 / 50 = 98% |
| German | 50 | 50 | 50 / 50 = 100% |
A wide variety of text excerpts was chosen. The excerpts covered
various topics in order to test whether the program could handle
differing subject matter. The topics included everyday objects,
biographies, literature, and scientific descriptions. Each excerpt was
approximately a page long. Note that other text excerpts were used for
developmental testing.
The results are excellent. For three of the four languages the accuracy
rating was 100%, and for Spanish the accuracy rating was 98%, which
corresponds to a single misrecognized excerpt out of 50. This shows
that the Language Recognition Program and its algorithms are very
effective at determining the language of a text document. Furthermore,
the program handled all of the tested topics well.
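The accuracy ratings reported above follow directly from the counts in
the table. The short Python sketch below reproduces that arithmetic and
also derives the overall rating across all 200 excerpts (199 / 200 =
99.5%). The names results, accuracy, and overall_accuracy are
illustrative only and are not taken from the program itself.

```python
# Counts copied from the results table: (excerpts attempted, correctly recognized).
results = {
    "English": (50, 50),
    "French": (50, 50),
    "Spanish": (50, 49),
    "German": (50, 50),
}


def accuracy(correct, attempted):
    """Return the accuracy rating as a percentage."""
    return 100 * correct / attempted


def overall_accuracy(table):
    """Return the accuracy rating over all languages combined."""
    attempted = sum(a for a, _ in table.values())
    correct = sum(c for _, c in table.values())
    return accuracy(correct, attempted)


if __name__ == "__main__":
    for language, (attempted, correct) in results.items():
        rating = accuracy(correct, attempted)
        print(f"{language}: {correct} / {attempted} = {rating:.0f}%")
    print(f"Overall: {overall_accuracy(results):.1f}%")
```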
The program was also observed to process the text excerpts very
quickly. The Language Recognition Program scanned at a rate of 50 000
words per minute (on a computer with a 2.0 GHz processor). This was
determined by timing 10 scans of long text excerpts, each at least
100 000 characters long, and taking the average of these times. In
summary, the Language Recognition Program was able to determine the
language of a text excerpt accurately and quickly.
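A timing measurement of the kind described above might be sketched as
follows in Python. This is only an illustration under stated
assumptions: placeholder_scan is a hypothetical stand-in for the
program's scan routine, the sample texts are synthetic, and the rate is
assumed to be the average word count divided by the average scan time.
The report does not specify how the original measurement was coded.

```python
import time
from statistics import mean


def time_scans(scan, texts):
    """Return the elapsed wall-clock time in seconds for each scan."""
    timings = []
    for text in texts:
        start = time.perf_counter()
        scan(text)
        timings.append(time.perf_counter() - start)
    return timings


def words_per_minute(texts, timings):
    """Average word count divided by average scan time, scaled to one minute."""
    avg_words = mean(len(text.split()) for text in texts)
    avg_seconds = mean(timings)
    return avg_words / avg_seconds * 60


def placeholder_scan(text):
    # Hypothetical stand-in for the Language Recognition Program's scan;
    # it only counts words so that there is something to time.
    return len(text.split())


if __name__ == "__main__":
    # Ten synthetic excerpts of 500 000 characters each (assumed test data).
    texts = ["the quick brown fox " * 25_000] * 10
    timings = time_scans(placeholder_scan, texts)
    print(f"Average scan time: {mean(timings):.4f} s")
    print(f"Rate: {words_per_minute(texts, timings):,.0f} words per minute")
```

The rate printed by this sketch reflects the placeholder routine, not
the actual program, so it is not comparable to the figure reported above.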