Aspell and Hunspell are spell checkers.
The Hunspell Tamil dictionary has only 13,000 words, so we are trying to add 100,000 words to the Aspell and Hunspell Tamil dictionaries.
The Tamil words are taken from the Lexicon Tamil Dictionary.
Working with Aspell:
Get the aspell-ta source from svn.
Link: http://packages.debian.org/source/sid/aspell-ta
The source can be checked out through svn from the following repository:
svn://svn.debian.org/debian-in/aspell-ta/trunk
Create a new folder and check out the source there:
$ mkdir aspell
$ cd aspell
$ svn co svn://svn.debian.org/debian-in/aspell-ta/trunk
$ cd trunk
$ ls
configure Copyright doc Makefile.pre ta.cwl tamil.alias u-taml.cmap
COPYING debian info README ta.dat ta.multi u-taml.cset
$ preunzip ta.cwl
$ ls
configure Copyright doc Makefile.pre ta.dat ta.multi u-taml.cmap
COPYING debian info README tamil.alias ta.wl u-taml.cset
$ gedit ta.wl
Add the Tamil words and save the file.
$ prezip ta.wl
$ ./configure
$ make
$ ls //The ta.rws file is created now
configure debian Makefile ta.cwl tamil.alias ta.wl~
COPYING doc Makefile.pre ta.cwl.bk ta.multi u-taml.cmap
Copyright info README ta.dat ta.rws u-taml.cset
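When there are many words to add, the manual step above (opening ta.wl in gedit and typing words in) can be scripted. The following is only a minimal sketch and not part of the aspell-ta package: the function name merge_wordlist and the file name new_words.txt are made up for illustration, and it assumes both files are UTF-8 text with one word per line.

```shell
#!/bin/sh
# Hypothetical helper (not part of aspell-ta): merge new words into a word list.
# Usage: merge_wordlist WORDLIST NEWWORDS
# Assumption: both files hold one word per line.
merge_wordlist() {
    wordlist=$1
    newwords=$2
    tmp=$(mktemp) || return 1
    # "awk 1" prints every line with a trailing newline, so the last word of
    # the first file cannot run into the first word of the second file.
    # Then drop carriage returns and empty lines, sort, and remove duplicates.
    # LC_ALL=C sorts by byte value, so the result does not depend on the locale.
    awk 1 "$wordlist" "$newwords" \
        | tr -d '\r' \
        | grep -v '^[[:space:]]*$' \
        | LC_ALL=C sort -u > "$tmp"
    # Write back through redirection so the word list keeps its permissions.
    cat "$tmp" > "$wordlist"
    rm -f "$tmp"
}

# Example (run inside the trunk folder after "preunzip ta.cwl"):
#   merge_wordlist ta.wl new_words.txt
#   prezip ta.wl
```

Removing duplicates before running prezip keeps the word list from growing when the same source file is merged twice.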
Next, we add the new ta.rws file to the local aspell-ta dictionary and check it on our system.
Get the aspell and aspell-ta source packages on Fedora:
# yumdownloader --source aspell
# yumdownloader --source aspell-ta
By default, the Aspell dictionaries are located in /usr/lib/aspell-0.60/. This folder contains the original ta.rws (the words that ship with Aspell). Before moving the new ta.rws file there, back up the existing ta.rws file:
# cp ta.rws ta.rws.bk
Then move the new ta.rws file there. Now all the words are added to the Aspell dictionary.
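The backup and copy steps above can be wrapped in a small function so that the original dictionary is not lost if the steps are repeated. This is a sketch under assumptions: the name install_rws is made up, it copies the file instead of moving it, and the default folder is the /usr/lib/aspell-0.60 path mentioned above (it may differ on other systems).

```shell
#!/bin/sh
# Hypothetical helper: back up the existing dictionary, then install the new one.
# Usage: install_rws NEW_RWS [DICT_DIR]
# Assumption: DICT_DIR defaults to /usr/lib/aspell-0.60, the path named in this post.
install_rws() {
    new_rws=$1
    dict_dir=${2:-/usr/lib/aspell-0.60}
    target=$dict_dir/ta.rws
    if [ ! -f "$new_rws" ]; then
        echo "missing file: $new_rws" >&2
        return 1
    fi
    # Make the backup only once, so a second run does not overwrite
    # the original dictionary with an already modified one.
    if [ -f "$target" ] && [ ! -e "$target.bk" ]; then
        cp -p "$target" "$target.bk" || return 1
    fi
    cp "$new_rws" "$target"
}

# Example (as root, from the trunk folder):
#   install_rws ta.rws
```

To go back to the original dictionary, copy ta.rws.bk over ta.rws in the same folder.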
To Check:
# aspell -d ta.rws -a
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.6)
மாலதி //This word is in the list
*
சுஜி //This word is not in the list, so aspell gives suggestions
& சுஜி 10 0: சோஜி, சுகி, சுசி, சுதி, சுனி, சுரி, சுளி, சுழி, சுவி, சப்ஜி
அஃகம் //This word is in the list
*
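Typing words one by one works for a quick check. For a longer list, the output of "aspell -a" can be summarized with a short filter. In that output, as seen above, a line with "*" means the word was found and a line starting with "&" means it was not found and suggestions follow; a line starting with "#" means not found with no suggestions. The sketch below is illustrative only: the function name summarize_pipe_output and the file name words.txt are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: summarize "aspell -a" output read from standard input.
#   "*"  word found in the dictionary
#   "&"  word not found, suggestions follow
#   "#"  word not found, no suggestions
# The "@(#)" version banner and blank lines are ignored.
summarize_pipe_output() {
    awk '
        /^\*/ { found++ }
        /^&/  { missed++; print "not found: " $2 }
        /^#/  { missed++; print "not found: " $2 }
        END   { printf "found=%d missed=%d\n", found, missed }
    '
}

# Example, assuming words.txt holds one word per line:
#   aspell -d ta.rws -a < words.txt | summarize_pipe_output
```

The last line of the summary gives the totals, which makes it easy to compare the dictionary before and after new words are added.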
NOTE:
The individual word lists have an extension of ".cwl" and are compressed to save space. To uncompress a word list use "preunzip BASE.cwl" which will uncompress it and rename the file to "BASE.wl". To dump a compressed word list to standard output use "precat BASE.cwl". To uncompress all word lists in the current directory use "preunzip *.cwl". For more help on "preunzip" use "preunzip --help".
Hunspell:
Download wordxtr:
Link: https://fedorahosted.org/wordxtr/ //get the tar file from this link, or install it with yum:
# yum install wordxtr
# wordxtr ta_IN TAMIL //ref: Note 1
Creating dictionary for language "ta_IN" using text data in directory "TAMIL"
00%….Creating Text Data to Parse
25%….Reading Text Data to Parse
50%….Created Text Data to Parse
65%….Extracted words from input Text Data
80%….Removed duplicated words from extracted wordlist
Basic ta_IN.dic and ta_IN.aff created
……in current directory
NOTE 1:
TAMIL is a plain folder that contains only text files of Tamil words. ta_IN is the language code for Tamil.
The ta_IN.dic and ta_IN.aff files are created in the current directory. Move them to the Hunspell location, /usr/share/myspell/. Now all the words are added to the Hunspell dictionary.
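Before moving the files, it can help to look at the generated ta_IN.dic. A Hunspell .dic file starts with a line holding the (approximate) number of words, followed by one entry per line. The sketch below compares that first line with the real number of entries. The function name check_dic_count is made up for illustration, and a mismatch is only a hint that the file was edited by hand, not necessarily an error.

```shell
#!/bin/sh
# Hypothetical helper: compare the word count on the first line of a Hunspell
# .dic file with the number of non-empty entries that follow it.
# Usage: check_dic_count FILE.dic
# Returns 0 when the two numbers agree, non-zero otherwise.
check_dic_count() {
    dic=$1
    declared=$(head -n 1 "$dic" | tr -d '\r')
    # Count every non-empty line after the first one.
    actual=$(tail -n +2 "$dic" | grep -c -v '^[[:space:]]*$')
    echo "declared=$declared actual=$actual"
    [ "$declared" = "$actual" ]
}

# Example (in the folder where wordxtr wrote its output):
#   check_dic_count ta_IN.dic
#   cp ta_IN.dic ta_IN.aff /usr/share/myspell/     (as root)
#   hunspell -d ta_IN -l < words.txt               (prints words not in the dictionary)
```

Here words.txt is again an assumed file name with one word per line.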