Posted to dev@openoffice.apache.org by "Marco A.G.Pinto" <ma...@mail.telepac.pt> on 2014/09/01 16:06:26 UTC

.IDX file for Thesaurus

Hello!

As you may remember, my Proofing Tool GUI (PTG) software can also 
edit the Thesaurus by editing .DAT files.

However, I have peeked into a few extensions and they also include an 
.IDX file for the Thesaurus.

Does anyone know how the .IDX is generated so that I can code it into PTG?

Thanks!

Kind regards,
      >Marco A.G.Pinto
        ----------------------


-- 

Re: .IDX file for Thesaurus

Posted by Alexandro Colorado <jz...@oooes.org>.
I talked to Marco on IRC. This script is really easy to understand
because it is fully commented. I don't see the difficulty in understanding
a parser: it declares a handful of variables and then runs a
double while loop to parse the title and the content of the DAT file.
Finally, the script just puts together whatever was parsed.

You don't need to be a genius or know Perl to understand that. However,
from further talk, what he really wants is to integrate the script into
his project by reverse engineering it. That is a completely different
task, since we don't know how his software is built. PureBasic is
a proprietary application that I really wouldn't want to
touch.

I have suggested that Marco 'port' his application to OpenOffice's native
OOBasic environment. It would work out much better for the rest of the
community.
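
For anyone porting this to another language, the parsing steps described
above can be sketched in Python roughly as follows. This is not a
translation of th_gen_idx.pl. It assumes the usual MyThes file layout,
which this thread does not spell out: the first line of the .dat names
the encoding, each entry is a "word|N" line followed by N meaning lines,
and the .idx holds the encoding, the entry count, and then sorted
"word|byte offset" lines. The name build_idx is made up for the example.

```python
def build_idx(dat: bytes) -> bytes:
    """Build .idx content from raw .dat bytes (assumed MyThes layout)."""
    lines = dat.splitlines(keepends=True)
    encoding_line = lines[0]
    offset = len(encoding_line)          # byte position of the first entry
    entries = []
    i = 1
    while i < len(lines):                # outer loop: one entry header per pass
        header = lines[i]
        word, _, count = header.rstrip(b"\r\n").partition(b"|")
        entries.append((word, offset))   # remember where this entry starts
        offset += len(header)
        i += 1
        for _ in range(int(count)):      # inner loop: skip the meaning lines
            offset += len(lines[i])
            i += 1
    entries.sort()                       # plain byte-order sort on the word
    out = [encoding_line.rstrip(b"\r\n"), str(len(entries)).encode("ascii")]
    out += [word + b"|" + str(off).encode("ascii") for word, off in entries]
    return b"\n".join(out) + b"\n"


if __name__ == "__main__":
    sample = (b"UTF-8\n"
              b"simple|1\n(adj)|easy|plain\n"
              b"basic|2\n(adj)|fundamental\n(noun)|staple\n")
    print(build_idx(sample).decode("ascii"))
```

The offsets are counted in bytes, not characters, so a port should read
the .dat file in binary mode and measure each line including its line
ending.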

On 9/1/14, Andrea Pescetti <pe...@apache.org> wrote:
> Marco A.G.Pinto wrote:
>> "Please follow-up to the l10n list"?
>
> OK. No problem at all about you mailing multiple lists the first time.
> But it's useful to know which one should receive the further answers
> (otherwise we risk parallel discussions).
>
>> I was looking at the URL and it does have some small source code.
>> Unfortunately I have some problems understanding the code since I don't
>> master the language.
>
> Perl can be quite unreadable, this one is not even the most unreadable...
>
>> Could someone please translate it for me in human language (the
>> algorithm) so that I can code it in PureBasic?
>
> I would still need a Perl manual to find out what the single pieces do.
> Data are computed in the "read thesaurus line by line" cycle, but if
> Kevin wrote it (see first lines) he can probably give you some more
> guidance.
>
> Regards,
>    Andrea.


-- 
Alexandro Colorado
Apache OpenOffice Contributor
882C 4389 3C27 E8DF 41B9  5C4C 1DB7 9D1C 7F4C 2614

---------------------------------------------------------------------
To unsubscribe, e-mail: l10n-unsubscribe@openoffice.apache.org
For additional commands, e-mail: l10n-help@openoffice.apache.org


Re: .IDX file for Thesaurus

Posted by Andrea Pescetti <pe...@apache.org>.
Marco A.G.Pinto wrote:
> "Please follow-up to the l10n list"?

OK. No problem at all about you mailing multiple lists the first time. 
But it's useful to know which one should receive the further answers 
(otherwise we risk parallel discussions).

> I was looking at the URL and it does have some small source code.
> Unfortunately I have some problems understanding the code since I don't
> master the language.

Perl can be quite unreadable, this one is not even the most unreadable...

> Could someone please translate it for me in human language (the
> algorithm) so that I can code it in PureBasic?

I would still need a Perl manual to find out what the single pieces do. 
Data are computed in the "read thesaurus line by line" cycle, but if 
Kevin wrote it (see first lines) he can probably give you some more 
guidance.

Regards,
   Andrea.



Re: .IDX file for Thesaurus

Posted by "Marco A.G.Pinto" <ma...@mail.telepac.pt>.
"Please follow-up to the l10n list"?

Dear Andrea and team,

I sent it to both lists because it is a coding question relating to 
proofing, so I am still not sure which ML is the best.

I was looking at the URL and it does have some small source code.

Unfortunately, I have some problems understanding the code since I have 
not mastered the language.

Could someone please translate it for me into human language (the 
algorithm) so that I can code it in PureBasic?

I know that I am annoying with so many questions all the time, but I 
want to build the ultimate proofing tool so that it can be used by 
everyone and serves all needs.

Thanks!

Kind regards from your brother and friend,
        >Marco A.G.Pinto
          ----------------------



On 01/09/2014 17:26, Andrea Pescetti wrote:
> Marco A.G.Pinto wrote:
>> But I have peeked into a few extensions and they also have an .IDX
>> file for the Thesaurus.
>> Does anyone know how the .IDX is generated so that I code it into PTG?
>
> With th_gen_idx.pl ; you can find it in, e.g., (randomly found with a 
> search engine)
>
> http://wordnet-gaeilge.googlecode.com/svn/trunk/th_gen_idx.pl
>
> I'm not sure that repository has a copy of the license, but my copy 
> has the license you find under this message.
>
> A note: when writing to more than one list, it's useful to know which 
> one is the preferred one for follow-up (so we can avoid sending too 
> many messages). So for example, in the first line say "Please 
> follow-up to the l10n list" and we'll know we should post answers only 
> there.
>
> Regards,
>   Andrea.
>


-- 


Re: .IDX file for Thesaurus

Posted by Andrea Pescetti <pe...@apache.org>.
Marco A.G.Pinto wrote:
> But I have peeked into a few extensions and they also have an .IDX
> file for the Thesaurus.
> Does anyone know how the .IDX is generated so that I code it into PTG?

With th_gen_idx.pl; you can find it at, e.g. (found at random with a 
search engine):

http://wordnet-gaeilge.googlecode.com/svn/trunk/th_gen_idx.pl

I'm not sure that repository has a copy of the license, but my copy has 
the license you find under this message.

A note: when writing to more than one list, it's useful to know which 
one is the preferred one for follow-up (so we can avoid sending too many 
messages). So for example, in the first line say "Please follow-up to 
the l10n list" and we'll know we should post answers only there.

Regards,
   Andrea.

$ cat LICENSE_th_gen_idx.txt
/*
  * Copyright 2003 Kevin B. Hendricks, Stratford, Ontario, Canada
  * And Contributors.  All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  *
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  *
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * 3. All modifications to the source code must be clearly marked as
  *    such.  Binary redistributions based on modified source code
  *    must be clearly marked as modified versions in the documentation
  *    and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY KEVIN B. HENDRICKS AND CONTRIBUTORS
  * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
  * FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL
  * KEVIN B. HENDRICKS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
  * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
  * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
  * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  */

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@openoffice.apache.org
For additional commands, e-mail: dev-help@openoffice.apache.org


Re: .IDX file for Thesaurus

Posted by Yakov Reztsov <ya...@mail.ru>.
You can generate the .IDX file from the .dat file with the th_gen_idx.pl script from the hunspell source package.
You only need Perl for it.
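
One way to check a generated index, whether it comes from the Perl
script or from a port to another language, is to confirm that every
offset lands on the start of its entry in the .dat file. The Python
sketch below assumes the commonly documented MyThes layout, which is
not described in this thread: the .idx has an encoding line, an entry
count, and then "word|byte offset" lines. check_idx is a hypothetical
helper for illustration, not part of hunspell.

```python
def check_idx(dat: bytes, idx: bytes) -> list:
    """Return the words whose recorded offset does not point at their entry."""
    lines = idx.splitlines()
    declared = int(lines[1])             # second line: number of entries
    bad = []
    for line in lines[2:2 + declared]:
        word, _, off = line.rpartition(b"|")
        # The entry line in the .dat must begin with "word|" at that byte.
        if not dat.startswith(word + b"|", int(off)):
            bad.append(word.decode("utf-8", "replace"))
    return bad


if __name__ == "__main__":
    dat = (b"UTF-8\n"
           b"simple|1\n(adj)|easy|plain\n"
           b"basic|2\n(adj)|fundamental\n(noun)|staple\n")
    idx = b"UTF-8\n2\nbasic|32\nsimple|6\n"
    print(check_idx(dat, idx))
```

An empty list means every offset matched. Both files should be read in
binary mode, since the offsets are byte positions.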



Monday, 1 September 2014, 15:06 +01:00, from "Marco A.G.Pinto":
>Hello!
>
>If you all well remember, my Proofing Tool GUI (PTG) software allows
>also to edit the Thesaurus by editing .DAT files.
>
>But I have peeked into a few extensions and they also have an .IDX
>file for the Thesaurus.
>
>Does anyone know how the .IDX is generated so that I code it into
>PTG?
>
>Thanks!
>
>Kind regards,
>     >Marco A.G.Pinto
>       ----------------------
>
>
>-- 


-- 

Yakov Reztsov
