Posted to dev@openoffice.apache.org by "Marco A.G.Pinto" <ma...@mail.telepac.pt> on 2014/12/04 16:35:28 UTC

Hunspell unmunching question

Hello!

Around a week ago, Peter from England sent me an e-mail suggesting new 
words to be added to en_GB.

One of them was "unsubscribe".

Here is what appears in Proofing Tool GUI (the screenshot is not preserved 
in this plain-text version):

The strange thing is that I tried the variants in Mozilla and OpenOffice 
and none of them was marked as a typo.

I started thinking about it and wondered whether, in Hunspell, each prefix 
is also applied to every suffixed form of the word.

Today I made a test, please see the archive: 
https://dl.dropboxusercontent.com/u/30674540/hunspell_issue_marcoagpinto_20141204.zip
It contains the extracted wordlists both in PTG and Unmunch and also the 
.DIC + .AFF I created for the tests.

In my PTG 3.0 build 67 I get:

subscribe
resubscribe
subscribing
oversubscribe
subscribes
subscribed
unsubscribe
000
subscribe
unsubscribe
resubscribe
subscribing
oversubscribe
subscribes
subscribed

In Unmunch for Linux I got:

subscribe
subscribing
subscribed
subscribes
resubscribing
oversubscribing
unsubscribing
resubscribed
oversubscribed
unsubscribed
resubscribes
oversubscribes
unsubscribes
resubscribe
oversubscribe
unsubscribe
000
subscribe
subscribing
subscribed
subscribes
resubscribing
oversubscribing
unsubscribing
resubscribed
oversubscribed
unsubscribed
resubscribes
oversubscribes
unsubscribes
resubscribe
oversubscribe
unsubscribe

I placed a "000" entry between two entries for the same word that differ 
only in the position of the code "U", to make sure the results are the 
same no matter where the code appears.

What this means is that I probably need to change the code of my tool, 
maybe create three arrays:
1st - to store the words with suffixes
2nd - to store the codes of the prefixes
3rd - to store 1st plus all its combinations with the prefixes (it would 
apply prefixes to 1st and store them in 3rd )

Then, should I display the prefixed forms at the bottom in PTG, instead of 
following the order of the codes?
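
The three-array idea can be sketched roughly as follows. This is only an 
illustration in Python, not PTG's actual code: the affix tables are 
hard-coded stand-ins for what would be parsed from the .AFF file, the code 
letters other than "U" and the rules behind them are assumptions, and every 
prefix is assumed to combine with every suffix.

```python
# Hypothetical affix tables; a real tool would read these from the .AFF file.
# Each suffix rule is (characters to strip from the stem, text to append).
SUFFIXES = {
    "G": ("e", "ing"),  # subscribe -> subscribing
    "D": ("", "d"),     # subscribe -> subscribed
    "S": ("", "s"),     # subscribe -> subscribes
}
PREFIXES = {"A": "re", "O": "over", "U": "un"}


def expand(stem, codes):
    """Return the stem, its suffixed forms, and every prefix applied to each."""
    suffixed = [stem]    # 1st array: the word plus its suffixed forms
    prefix_codes = []    # 2nd array: the prefix codes found on the entry
    for code in codes:
        if code in SUFFIXES:
            strip, add = SUFFIXES[code]
            base = stem[:-len(strip)] if strip else stem
            suffixed.append(base + add)
        elif code in PREFIXES:
            prefix_codes.append(code)
    # 3rd array: 1st plus each prefix applied to every entry of 1st
    combined = list(suffixed)
    for code in prefix_codes:
        combined.extend(PREFIXES[code] + word for word in suffixed)
    return combined


forms = expand("subscribe", "UAOGDS")
```

Because the suffixes are collected before any prefix is applied, the 
position of "U" among the codes changes only the order of the output, not 
which forms are produced.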

This also means that hundreds of combinations are missing from the 
wordlist that I publish as a .txt file in the project's GitHub, even 
though Hunspell accepts them in Mozilla (Firefox, Thunderbird and 
SeaMonkey) and Apache OpenOffice.
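
As a small illustration, the forms missing from the PTG output for this one 
entry can be listed with a set difference. The two sets below are copied 
from the outputs quoted earlier in this message; the variable names are 
only for this sketch.

```python
# Forms produced by PTG 3.0 build 67 for the test entry
ptg = {
    "subscribe", "resubscribe", "subscribing", "oversubscribe",
    "subscribes", "subscribed", "unsubscribe",
}

# Forms produced by Unmunch for Linux for the same entry
unmunch = {
    "subscribe", "subscribing", "subscribed", "subscribes",
    "resubscribing", "oversubscribing", "unsubscribing",
    "resubscribed", "oversubscribed", "unsubscribed",
    "resubscribes", "oversubscribes", "unsubscribes",
    "resubscribe", "oversubscribe", "unsubscribe",
}

# Everything Unmunch generates that PTG does not
missing = sorted(unmunch - ptg)
for word in missing:
    print(word)
```

For this entry alone, 9 of the 16 forms are absent from the PTG list, all 
of them a prefix combined with a suffixed form.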

Thanks for your time!

Kind regards,
       >Marco A.G.Pinto
         ----------------------


-- 

Re: Hunspell unmunching question

Posted by "Marco A.G.Pinto" <ma...@mail.telepac.pt>.
Andrea, I have already fixed it in build 68.

It now displays the combined prefixes at the bottom of the derivatives panel.

I have compiled build 70 to fix a few minor bugs; it will be 
available when I update my site on the 1st of January.

I have also written a guide on how to install the dictionaries, which 
will also be available with the next site update (I will add a link to 
it from the English Dictionaries).

I have also been working on en_GB, and the update on 1-JAN-2015 should 
bring around 700 new words (I usually add around 600 to 800 words per 
update), making a total of 8000+ new words since I took over the project 
a year ago.

My dear friend,
    Kind regards,
       >Marco A.G.Pinto
         ----------------------


On 24/12/2014 19:56, Andrea Pescetti wrote:
> On 04/12/2014 Marco A.G.Pinto wrote:
>> What this means is that I probably need to change the code of my tool,
>> maybe create three arrays:
>> 1st - to store the words with suffixes
>> 2nd - to store the codes of the prefixes
>> 3rd - to store 1st plus all its combinations with the prefixes (it would
>> apply prefixes to 1st and store them in 3rd )
>
> Displaying all combinations would be highly unpractical since indeed 
> it would explode. Maybe you could rearrange the GUI so that it 
> displays something like "unsubscribe (and derivatives)" when it 
> handles prefixes.
>
> Regards,
>   Andrea.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@openoffice.apache.org
> For additional commands, e-mail: dev-help@openoffice.apache.org
>
>


-- 

Re: Hunspell unmunching question

Posted by Andrea Pescetti <pe...@apache.org>.
On 04/12/2014 Marco A.G.Pinto wrote:
> What this means is that I probably need to change the code of my tool,
> maybe create three arrays:
> 1st - to store the words with suffixes
> 2nd - to store the codes of the prefixes
> 3rd - to store 1st plus all its combinations with the prefixes (it would
> apply prefixes to 1st and store them in 3rd )

Displaying all combinations would be highly unpractical since indeed it 
would explode. Maybe you could rearrange the GUI so that it displays 
something like "unsubscribe (and derivatives)" when it handles prefixes.
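
A minimal sketch of that kind of compact display, in Python and purely for 
illustration: the function name and its inputs are hypothetical and not 
taken from PTG.

```python
def summarize(stem, suffixed_forms, prefixes):
    """List the suffixed forms in full, then one summary line per prefix."""
    lines = list(suffixed_forms)
    for prefix in prefixes:
        # One line stands for the prefixed stem and all its suffixed forms
        lines.append(f"{prefix}{stem} (and derivatives)")
    return lines


panel = summarize(
    "subscribe",
    ["subscribe", "subscribing", "subscribed", "subscribes"],
    ["re", "over", "un"],
)
for line in panel:
    print(line)
```

This keeps the panel at one line per suffixed form plus one per prefix, 
instead of one line for every combination.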

Regards,
   Andrea.

---------------------------------------------------------------------
To unsubscribe, e-mail: l10n-unsubscribe@openoffice.apache.org
For additional commands, e-mail: l10n-help@openoffice.apache.org

