Spellcorrector 0.2

24 September 2007   3 comments   Python


Unlike previous incarnations of Spellcorrector, this one does not load the two huge language files for English and Swedish by default. Alternatively, or additionally, you can load your own language file. The difference between loading a language file and training on your own words is that trained words are always assumed to be correct.

Another major change in this release is that a pickle file is created the first time the language file or your own training file has been parsed. It works like a cache: if the original text file changes, the pickle file is recreated. The upshot is that the first time you create a Spellcorrector instance it takes a few seconds if the language file is large, but the second time it takes virtually no time at all.
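The caching idea can be sketched roughly like this. It is a minimal Python 3 sketch, not the code in spellcorrector.py: the helper name `load_cached`, the `.pickle` suffix next to the text file, and detecting changes through the text file's modification time and size are all assumptions made for illustration.

```python
import os
import pickle


def load_cached(path, parse):
    """Parse the text file at `path`, caching the result in a pickle file.

    Hypothetical helper. `parse` turns the file's text into any picklable
    object (for example a word counter). The pickle also stores the text
    file's modification time and size; if either differs, it is rebuilt.
    """
    pickle_path = path + '.pickle'  # assumed naming scheme
    st = os.stat(path)
    signature = (st.st_mtime_ns, st.st_size)
    try:
        with open(pickle_path, 'rb') as f:
            cached_signature, result = pickle.load(f)
        if cached_signature == signature:
            return result  # fast path: the text file hasn't changed
    except (OSError, EOFError, ValueError, TypeError, pickle.UnpicklingError):
        pass  # no cache yet, or an unreadable one: rebuild it below
    with open(path, encoding='utf-8') as f:
        result = parse(f.read())  # slow path: parse the whole text file
    with open(pickle_path, 'wb') as f:
        pickle.dump((signature, result), f)
    return result
```

Comparing modification time and size is cheap, but an edit that keeps the size identical within one timestamp tick could go unnoticed; hashing the file contents would be stricter at the cost of reading the whole file on every start-up.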

So, to recap, here are the different ways of loading the Spellcorrector:

>>> Spellcorrector('en')  # loads no language files by default

>>> import os
>>> assert os.path.isdir('languagefiles')  # directory holding the language files
>>> Spellcorrector('en', load_language_files=True)

>>> Spellcorrector('en', load_language_file='/home/peterbe/text.txt')  # any free text

>>> Spellcorrector('en', own_training_file='/home/peterbe/names.txt')  # one word per line

The load_language_file expects a readable file full of text. The text doesn't have to be written one word per line; punctuation, brackets and other junk are stripped out.
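Stripping the junk from free text can be as simple as keeping runs of letters. This is a minimal Python 3 sketch under assumptions, not the tokenizer in spellcorrector.py; `count_words` and `WORD_RE` are hypothetical names:

```python
import re
from collections import Counter

# A run of Unicode letters. Digits, underscores, punctuation, brackets
# and whitespace all act as separators and are thrown away.
WORD_RE = re.compile(r'[^\W\d_]+')


def count_words(text):
    """Count the words in free-running text (hypothetical helper)."""
    return Counter(WORD_RE.findall(text.lower()))


print(count_words('The cat (the black one) sat, [mostly].'))
```

One consequence of this design is that apostrophes are treated as separators too, so "don't" is counted as "don" and "t".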

The own_training_file has to be a file with one word per line. You can combine the two like this:

>>> Spellcorrector('en', load_language_file='/home/peterbe/text.txt',
...                own_training_file='/home/peterbe/names.txt')
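The one-word-per-line training file is even simpler to read. A minimal sketch, assuming a UTF-8 file; `read_training_words` is a hypothetical name, and lowercasing the words is an assumption made so they match lowercased text:

```python
def read_training_words(path):
    """Read a one-word-per-line file into a set (hypothetical helper).

    Surrounding whitespace is stripped and blank lines are skipped.
    Words are lowercased (an assumption) to match lowercased text.
    """
    with open(path, encoding='utf-8') as f:
        return {line.strip().lower() for line in f if line.strip()}
```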

There have also been a few other fixes and improvements. For example, there are now two basic unit tests at the bottom of the file that might give you some clues about how it can work for you.

Download spellcorrector.py 0.2

I really ought to put this on PyPI. Something for my todo list.



I am interested in your version of Peter Norvig's corrector. I have tried it for my language from the South of France (Occitan), testing it with a .txt file. It works well, but my language has many letters like ò ó ì í ù ú à á è é ç. When a word contains a stressed vowel, the correction method and the suggestions method cut the word. I am not a very good Pythoner and I don't know how to solve this little problem. Can you give me some hints?
Compliments on your corrector,

Peter Bengtsson

It supports Unicode, but you'll have to modify it and write out the alphabet of your language.
Oh, and make sure you save the .txt file as UTF-8!
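Both fixes can be sketched in a few lines of Python 3. Words get cut when the word-splitting pattern only knows ASCII letters: `[a-z]+`, as in Norvig's original `words()` function, turns "café" into "caf". And Norvig's `edits1()` can only suggest letters that are in its alphabet. The alphabet below (a-z plus the letters from the question) is an assumption to extend as needed, and this is not the code in spellcorrector.py:

```python
import re

# Assumed alphabet: plain a-z plus the accented letters from the question.
ALPHABET = 'abcdefghijklmnopqrstuvwxyz' + 'àáèéìíòóùúç'


def words(text):
    # [a-z]+ would stop at every accented letter ('café' -> 'caf');
    # [^\W\d_]+ matches any Unicode letter in a Python 3 str pattern.
    return re.findall(r'[^\W\d_]+', text.lower())


def edits1(word, alphabet=ALPHABET):
    """All strings one delete, transpose, replace or insert away (Norvig)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)
```

Read the text with `open(path, encoding='utf-8')`. If it may contain decomposed characters (an `e` followed by a combining accent), running it through `unicodedata.normalize('NFC', text)` first keeps `é` as a single letter that the pattern above matches.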


Thanks for your answer, Peter. This evening I looked up some information on Unicode in Python, and I think my file is not in UTF-8!
Thanks a lot,

