Hungarian Dictionary

If you wish for something new in Smart Keyboard, this is the place to ask!
kavkar
Posts: 1
Joined: Tue Sep 21, 2010 3:53 pm

Hungarian Dictionary

Post by kavkar »

Hello,
I read the other forum threads about dictionary requests, so I started looking for a Hungarian word list.
I found one on this page: http://mokk.bme.hu/resources/webcorpus
There you can find "the first 100 000 most frequent words in order of frequency" (file: "web2.2-freq-sorted.top100k.nofreqs.txt").
I would be grateful if you could make a Hungarian dictionary for Smart Keyboard.
Thanks,
Peter
BOTICSELLI
Posts: 1
Joined: Fri Oct 29, 2010 9:31 am
Phone: Samsung i9000, Froyo

Re: Hungarian Dictionary

Post by BOTICSELLI »

I would like to ask for the same thing. Please give us a Hungarian dictionary! Thank you very much, this keyboard is awesome. :-)
endrus
Posts: 3
Joined: Mon Oct 25, 2010 8:21 pm
Phone: HTC Desire / CM7.1

Re: Hungarian Dictionary

Post by endrus »

Hello Cyril,

I have checked the word frequency file posted in the first post, and it is useless: it is full of rubbish (special characters, and the languages in the file are mixed up).
I am shocked at how much effort those guys put into processing a huge amount of data without a simple language filter. Nonsense!
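
As a rough illustration of what a "simple language filter" could look like, here is a minimal Python sketch. It is only an assumption about one possible approach, not what the corpus authors or anyone in this thread used: it keeps words made up solely of the basic Latin letters plus the nine accented Hungarian vowels. The names `looks_hungarian` and `filter_words` are made up for this example. A character check like this removes entries with special characters or digits, but it cannot tell a Hungarian word from, say, an English one.

```python
import re
import unicodedata

# Assumption: a "Hungarian-looking" word contains only a-z plus the nine
# accented Hungarian vowels. This is a character check, not language detection.
HUNGARIAN_WORD = re.compile(r"[a-záéíóöőúüű]+")


def looks_hungarian(word: str) -> bool:
    """Return True if the word uses only letters of the Hungarian alphabet."""
    # Normalize to NFC so that accented letters are single code points.
    normalized = unicodedata.normalize("NFC", word).lower()
    return HUNGARIAN_WORD.fullmatch(normalized) is not None


def filter_words(words):
    """Keep only the entries that pass the character check, in order."""
    return [w for w in words if looks_hungarian(w)]
```

For example, `filter_words(["és", "&nbsp;", "123"])` keeps only `"és"`.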

Anyway, I have found a very useful, free tool that quickly creates a word frequency list from input documents such as MS Word, OpenOffice, or HTML files.

You can find it here, and you can also publish it on your forum so others might benefit from it:

http://neon.niederlandistik.fu-berlin.de/textstat/

I used 19 documents totalling 1,665,754 words (11,513,966 bytes) and generated a word frequency list of 46,000 words. I used a threshold of at least 3 occurrences to shorten the list and filter out rare words. Without that threshold the list would have been over 150k words, which is a waste of resources.
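
For anyone who wants to reproduce this kind of list without the tool, the counting and the minimum-occurrence cutoff can be sketched in a few lines of Python. This is an assumption-laden sketch, not how TextStat works internally: it treats any run of letters as a word, keeps upper and lower case apart, and expects the documents to be already converted to plain text. The function name `frequency_list` is invented for this example.

```python
import re
from collections import Counter

# A "word" here is any run of Unicode letters (no digits, no underscore).
WORD = re.compile(r"[^\W\d_]+")


def frequency_list(texts, min_count=3):
    """Count words across all texts and drop those seen fewer than min_count times.

    Returns (word, count) pairs sorted from most to least frequent.
    """
    counts = Counter()
    for text in texts:
        counts.update(WORD.findall(text))
    return [(word, n) for word, n in counts.most_common() if n >= min_count]
```

Raising `min_count` shortens the list further; setting it to 1 keeps every word that occurs at all.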

Please generate a Hungarian dictionary and send it back to me for testing before publishing it to the Market. The file uses UTF-8 character encoding and was saved without a BOM using Notepad++.
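
If the list is produced by a script rather than saved from an editor, the same "UTF-8 without BOM" result can be obtained as in the sketch below. It assumes Python and a one-word-per-line file; the helper names `save_word_list` and `has_bom` are hypothetical. Python's `"utf-8"` codec does not write a BOM (the `"utf-8-sig"` codec is the one that does).

```python
import codecs


def save_word_list(words, path):
    """Write one word per line as UTF-8 without a byte order mark."""
    # encoding="utf-8" never emits a BOM; newline="\n" keeps line endings fixed.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for word in words:
            f.write(word + "\n")


def has_bom(path):
    """Return True if the file starts with the UTF-8 BOM bytes EF BB BF."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8
```

Calling `has_bom` on the saved file is a quick way to check the result before sending the file on.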

Cheers,

endrus
Attachments
Hungarian_freq_dict_UTF8_wo_BOM.zip
(189.83 KiB) Downloaded 254 times
Last edited by endrus on Wed Dec 08, 2010 11:28 pm, edited 1 time in total.
cyril
Developer
Posts: 2079
Joined: Tue Feb 02, 2010 4:02 pm
Phone: Nexus One 2.3
Location: Nice, France

Re: Hungarian Dictionary

Post by cyril »

Ok you can try it here
I had to remove the duplicates when the same word is present in upper and lower case (I keep the lower case)
Cyril
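
The duplicate removal described above can be sketched roughly as follows. This is a guess at the idea, not Cyril's actual code: when a word appears both in lower case and in some other casing, only the lower-case form is kept, while words that never appear in lower case (for example names) are left as they are. The function name `drop_case_duplicates` is made up for this example.

```python
def drop_case_duplicates(words):
    """Drop cased variants of words that also occur in lower case.

    The original order is preserved and exact repeats are removed too.
    """
    lowercase_present = {w for w in words if w == w.lower()}
    seen = set()
    result = []
    for w in words:
        if w != w.lower() and w.lower() in lowercase_present:
            continue  # a lower-case variant exists, so skip this casing
        if w in seen:
            continue  # exact duplicate of an earlier entry
        seen.add(w)
        result.append(w)
    return result
```

With this rule, `["és", "És", "Budapest"]` becomes `["és", "Budapest"]`.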
endrus
Posts: 3
Joined: Mon Oct 25, 2010 8:21 pm
Phone: HTC Desire / CM7.1

Re: Hungarian Dictionary

Post by endrus »

cyril wrote: Ok you can try it here
I had to remove the duplicates when the same word is present in upper and lower case (I keep the lower case)
Cyril,

The dictionary seems to work fine, and it is time to release it to the Market. :D
I will keep an eye on the comments, and future releases might follow. :idea:

Thanks for your fast action! ;)

-Endrus
cyril
Developer
Posts: 2079
Joined: Tue Feb 02, 2010 4:02 pm
Phone: Nexus One 2.3
Location: Nice, France

Re: Hungarian Dictionary

Post by cyril »

OK, just released it!
BTW I think you can add more words, as 500 kB is not too big.
Cyril