Posted to java-user@lucene.apache.org by Pierrick Brihaye <pi...@wanadoo.fr> on 2003/09/28 09:49:14 UTC

Announce : arabic Stemmer/Analyzer for Lucene

Hi all,

I have written a Lucene Analyzer for Arabic. You will find it here:
http://perso.wanadoo.fr/pierrick.brihaye/ArabicAnalyzer.jar (provisional
address; is anybody interested in hosting it?)

This work is still in beta, but it gives quite good results :-)

To make it work, you need:

1) a 1.4+ JVM (because of the native support for regular expressions, which
are heavily used in the program; I've been too lazy to use an external
package)

2) Apache Jakarta Commons-Collections:
http://jakarta.apache.org/commons/collections.html

3) a recent Lucene distribution ;-)
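Requirement 1 refers to java.util.regex, which first shipped with Java 1.4. As a rough illustration of the kind of job regular expressions can do in an Arabic analyzer, here is a minimal sketch in modern Java (not code taken from ArabicAnalyzer.jar) that pulls runs of Arabic characters out of mixed text and strips tatweel and the short-vowel diacritics. The class name ArabicRegexSketch and the exact character ranges are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not taken from ArabicAnalyzer.jar. */
final class ArabicRegexSketch {
    // A run of characters from the basic Arabic block, hamza (U+0621)
    // through sukun (U+0652); this range includes tatweel and diacritics.
    private static final Pattern ARABIC_RUN = Pattern.compile("[\\u0621-\\u0652]+");

    // Tatweel (U+0640) plus tanwin, short vowels, shadda and sukun (U+064B..U+0652).
    private static final Pattern MARKS = Pattern.compile("[\\u0640\\u064B-\\u0652]");

    /** Returns the Arabic words found in text, with tatweel and diacritics removed. */
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = ARABIC_RUN.matcher(text);
        while (m.find()) {
            String bare = MARKS.matcher(m.group()).replaceAll("");
            if (!bare.isEmpty()) { // a run made only of marks yields no token
                out.add(bare);
            }
        }
        return out;
    }
}
```

Stripping the marks means a vocalized and an unvocalized spelling of the same word end up as the same term; whether the real analyzer normalizes this way is not stated in the announcement.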

All this work is based on Tim Buckwalter's amazing Arabic Morphological
Analyzer Version 1.0
(http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002L49)
originally written in Perl and released under the GPL.
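Buckwalter's analyzer works on Arabic written in his one-to-one ASCII transliteration: his dictionary files use it, and the "mE AlslAmap" sign-off in this message is written in it. A minimal sketch of the transliteration-to-script direction, assuming the commonly published core table (hamza forms, letters, tatweel, tanwin, short vowels, shadda, sukun, superscript alef and alef wasla); BuckwalterSketch is a hypothetical name, and the table should be checked against Buckwalter's own documentation before relying on it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of Buckwalter transliteration to Arabic script. */
final class BuckwalterSketch {
    private static final Map<Character, Character> TABLE = new HashMap<>();

    // Maps a run of Latin symbols onto consecutive Arabic code points.
    private static void putRun(String latin, int firstCodePoint) {
        for (int i = 0; i < latin.length(); i++) {
            TABLE.put(latin.charAt(i), (char) (firstCodePoint + i));
        }
    }

    static {
        putRun("'|>&<}AbptvjHxd*rzs$SDTZEg", 0x0621); // hamza ... ghain
        putRun("_", 0x0640);                          // tatweel
        putRun("fqklmnhwYy", 0x0641);                 // feh ... yeh
        putRun("FNKaui~o", 0x064B);                   // tanwin, short vowels, shadda, sukun
        putRun("`{", 0x0670);                         // superscript alef, alef wasla
    }

    /** Converts transliterated text to Arabic script; unmapped characters pass through. */
    static String toArabic(String translit) {
        StringBuilder sb = new StringBuilder(translit.length());
        for (char c : translit.toCharArray()) {
            char mapped = TABLE.getOrDefault(c, c);
            sb.append(mapped);
        }
        return sb.toString();
    }
}
```

Because the scheme is one symbol per Arabic character, a plain lookup table is enough; spaces and punctuation simply pass through unchanged.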

The jar contains:

a) the compiled classes
b) the required data files (dictionaries and compatibility tables)
c) 2 command-line test programs
d) 3 test documents with different encodings
e) the source code
f) a README file that will give you a little more information :-)
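Item d) mentions test documents in different encodings. Lucene's Analyzer.tokenStream(String, Reader) consumes a java.io.Reader, so the bytes of each document have to be decoded with the right charset before any analysis happens. A minimal sketch of that step, assuming nothing about which encodings the jar's documents actually use (windows-1256 and ISO-8859-6 are common legacy Arabic encodings, but that is an assumption); EncodingSketch and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

/** Hypothetical helper: decode a byte stream before handing it to an analyzer. */
final class EncodingSketch {
    /** Wraps a byte stream in a Reader for the named charset. */
    static Reader open(InputStream in, String charsetName) {
        return new InputStreamReader(in, Charset.forName(charsetName));
    }

    /** Reads the whole stream as text in the named charset. */
    static String readAll(InputStream in, String charsetName) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = open(in, charsetName)) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Decoding with the wrong charset usually does not fail loudly; reading UTF-8 Arabic bytes as ISO-8859-1, for example, silently yields Latin-1 characters, so the charset really has to be known per document.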

To Lucene developers: I plan to offer this work to Lucene (see the jar
hierarchy... and the source file headers ;-). Any objections?

Feedback is very welcome: there are quite a lot of unresolved issues, with
the analyzer itself as well as with Lucene.

mE AlslAmap ("goodbye" in Buckwalter-transliterated Arabic), cheers,

p.b.






---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Announce : arabic Stemmer/Analyzer for Lucene

Posted by Eric Jain <Er...@isb-sib.ch>.
> I have personal worries about including GPL code in any commercial
> application (even dynamically linked).

See http://www.gnu.org/licenses/gpl-faq.html#IfInterpreterIsGPL:

"A consequence is that if you choose to use GPL'd Perl modules or Java
classes in your program, you must release the program in a
GPL-compatible way, regardless of the license used in the Perl or Java
interpreter that the combined Perl or Java program will run on."

MySQL AB, for instance, insists that you buy a license for their JDBC
driver if you intend to use it in a non-GPL application...

--
Eric Jain


Re: Announce : arabic Stemmer/Analyzer for Lucene

Posted by Pierrick Brihaye <pi...@wanadoo.fr>.
Hi,

> Is it possible to contact Tim,

I did so soon after I posted the announcement.

> and ask if he will allow you to license
> his code under an Apache-style license?  Many authors are accommodating
> about licensing software under different licenses.

It's true but...

> I have personal worries about including GPL code in any commercial
> application (even dynamically linked).

... so do I :-)

Thanks for the advice (more to come on Monday, I presume). I think it will
help me make my decision.

Cheers,

p.b.



Re: Announce : arabic Stemmer/Analyzer for Lucene

Posted by Scott Farquhar <sc...@atlassian.com>.
On Sun, Sep 28, 2003 at 09:49:14AM +0200, Pierrick Brihaye wrote:
> All this work is based on Tim Buckwalter's amazing Arabic Morphological
> Analyzer Version 1.0
> (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002L49)
> originally written in Perl and released under the GPL.

Is it possible to contact Tim, and ask if he will allow you to license
his code under an Apache-style license?  Many authors are accommodating
about licensing software under different licenses.

I have personal worries about including GPL code in any commercial
application (even dynamically linked).

Cheers,
Scott


Re: Announce : arabic Stemmer/Analyzer for Lucene

Posted by David Medinets <me...@mtolive.com>.
----- Original Message -----
From: "Erik Hatcher" <er...@ehatchersolutions.com>
> On Sunday, September 28, 2003, at 10:08  AM, Pierrick Brihaye wrote:
> > So... is it an ASL infringement on my part to have prepended
> > "org.apache.lucene" to the aramorph package :-)?
>
> Well, there are no rules or legalese restricting package naming
> conventions anywhere that I know of (in or out of Apache), except for
> java.* and sun.*, which are built into Java itself and can be sealed
> off from use.  So there is probably no problem using that package name
> on GPL'd code.

There may be no legal objection, but is it considered a good idea to
appropriate the namespace? If you don't own a domain and your name isn't
common, then I'd suggest using a package such as
org.pierrick.brihaye.aramorph. Alternatively you could be more creative. As
far as I know, there are no restrictions on package names except those
imposed on directory names. So you don't need to start with 'org' or 'com'.
How about coder.pierrick.brihaye.aramorph?
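Java itself does impose one rule beyond directory names: each dot-separated segment of a package name must be a legal identifier and not a keyword. A small sketch, using javax.lang.model.SourceVersion.isName from the JDK, that can check names like the ones floated in this thread; PackageNameSketch is a hypothetical name, and the java.* check reflects the documented ClassLoader.defineClass rule rather than anything Apache-specific.

```java
import javax.lang.model.SourceVersion;

/** Hypothetical helper for sanity-checking package names. */
final class PackageNameSketch {
    /**
     * True if every dot-separated segment is a legal Java identifier
     * and none of them is a keyword or literal (class, int, true, null...).
     */
    static boolean isLegalPackageName(String name) {
        return SourceVersion.isName(name);
    }

    /**
     * True for names under java.*, which ClassLoader.defineClass rejects
     * for application class loaders.
     */
    static boolean isReservedByPlatform(String name) {
        return name.equals("java") || name.startsWith("java.");
    }
}
```

Both coder.pierrick.brihaye.aramorph and org.pierrick.brihaye.aramorph pass the first check; a segment such as class or 2003 would not.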

David Medinets
http://www.codebits.com


Re: Announce : arabic Stemmer/Analyzer for Lucene

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Sunday, September 28, 2003, at 10:08  AM, Pierrick Brihaye wrote:
>> It probably wouldn't be a bad idea to have
>> some type of repository of Lucene extensions hosted elsewhere anyway
>> to solve the GPL issue.
>
> Totally agree!
>
> So... is it an ASL infringement on my part to have prepended
> "org.apache.lucene" to the aramorph package :-)?

Well, there are no rules or legalese restricting package naming
conventions anywhere that I know of (in or out of Apache), except for
java.* and sun.*, which are built into Java itself and can be sealed
off from use.  So there is probably no problem using that package name
on GPL'd code.

	Erik


Re: Announce : arabic Stemmer/Analyzer for Lucene

Posted by Pierrick Brihaye <pi...@wanadoo.fr>.
Hi,

> Since you ported GPL code, aren't you required to GPL your version as
> well?

I think so: the aramorph package should be GPLed, IMHO. The rest, i.e. my
own code, can be given to Lucene :-)

> The safest bet is to just host what you've done on your own or at
> Sourceforge or java.net.

Or elsewhere :-) The main objective is to use this analyzer within SDX:
http://www.nongnu.org/sdx/

>  It probably wouldn't be a bad idea to have
> some type of repository of Lucene extensions hosted elsewhere anyway to
> solve the GPL issue.

Totally agree!

So... is it an ASL infringement on my part to have prepended
"org.apache.lucene" to the aramorph package :-)?

Cheers,

p.b.



Re: Announce : arabic Stemmer/Analyzer for Lucene

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Sunday, September 28, 2003, at 09:20  AM, Pierrick Brihaye wrote:
>> We can have no GPL code in Apache's CVS.
>
> :-/ What can we do, then? Shall I split the packages in two parts? No
> problem for the "Lucene bindings". But there could be one for the aramorph
> package (Java port of the original work), which is based on work originally
> covered by the GPL...

Since you ported GPL code, aren't you required to GPL your version as 
well?  I'm not a lawyer and not clear on how this works.  You can use 
reflection to "link" to GPL code from Apache code, but I don't think 
you can directly import GPL'd packages (again, I'm not clear on this 
though).
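The reflection route Erik mentions looks roughly like the sketch below: the Apache-licensed side names the class as a string and looks it up at run time instead of importing it at compile time. This only illustrates the mechanics (Class.forName, getMethod, invoke); it says nothing about whether doing so satisfies the GPL, which is exactly the point Erik is unsure of. ReflectiveLinkSketch, callByName, and any stemmer class name used with it are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.Optional;

/** Hypothetical sketch: call a class known only by name, with no compile-time import. */
final class ReflectiveLinkSketch {
    /**
     * Instantiates className through its public no-arg constructor and calls
     * methodName(String) on it. Returns Optional.empty() if the class, the
     * constructor or the method is missing, or if the call itself throws.
     */
    static Optional<String> callByName(String className, String methodName, String input) {
        try {
            Class<?> cls = Class.forName(className);
            Object instance = cls.getDeclaredConstructor().newInstance();
            Method method = cls.getMethod(methodName, String.class);
            Object result = method.invoke(instance, input);
            return Optional.of(String.valueOf(result));
        } catch (ReflectiveOperationException e) {
            return Optional.empty();
        }
    }
}
```

A caller could pass the name of a (hypothetical) stemmer class and fall back to unstemmed terms when callByName returns Optional.empty() because that class is not on the classpath.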

The safest bet is to just host what you've done on your own or at 
Sourceforge or java.net.  It probably wouldn't be a bad idea to have 
some type of repository of Lucene extensions hosted elsewhere anyway to 
solve the GPL issue.

	Erik


Re: Announce : arabic Stemmer/Analyzer for Lucene

Posted by Pierrick Brihaye <pi...@wanadoo.fr>.
Hi,

> We could put this in the Lucene sandbox CVS perhaps.

Why not ?

> Could you package
> it similarly to the other contributions there with a build file

Yes... but you'll have to wait :-)

> and
> convert your command-line tests to JUnit tests that run from the build
> file?

You'll have to wait on this point too. The 2 CLI programs are "demonstration"
programs rather than real test cases that could expose the current pending
issues.

> I took a quick look and it looks like you did a fair bit of work and have
> the ASL in the source files.

Yes... at least in the source files that are based on my own work.

> The question, though, is whether your
> basing it on GPL code is acceptable.  Did you copy code from it?

As I said, it is based on Tim Buckwalter's work: the original Perl program as
well as those precious dictionary files.

> We can have no GPL code in Apache's CVS.

:-/ What can we do, then? Shall I split the packages in two parts? No
problem for the "Lucene bindings". But there could be one for the aramorph
package (Java port of the original work), which is based on work originally
covered by the GPL...

Cheers,

p.b.




Re: Announce : arabic Stemmer/Analyzer for Lucene

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
We could put this in the Lucene sandbox CVS perhaps.  Could you package  
it similarly to the other contributions there with a build file and  
convert your command-line tests to JUnit tests that run from the build  
file?

I took a quick look and it looks like you did a fair bit of work and have
the ASL in the source files.  The question, though, is whether your
basing it on GPL code is acceptable.  Did you copy code from it?  We
can have no GPL code in Apache's CVS.

	Erik


On Sunday, September 28, 2003, at 03:49  AM, Pierrick Brihaye wrote:

> Hi all,
>
> I have written a Lucene Analyzer for Arabic. You will find it here:
> http://perso.wanadoo.fr/pierrick.brihaye/ArabicAnalyzer.jar (provisional
> address; is anybody interested in hosting it?)
>
> This work is still in beta, but it gives quite good results :-)
>
> To make it work, you need:
>
> 1) a 1.4+ JVM (because of the native support for regular expressions, which
> are heavily used in the program; I've been too lazy to use an external
> package)
>
> 2) Apache Jakarta Commons-Collections:
> http://jakarta.apache.org/commons/collections.html
>
> 3) a recent Lucene distribution ;-)
>
> All this work is based on Tim Buckwalter's amazing Arabic Morphological
> Analyzer Version 1.0
> (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002L49)
> originally written in Perl and released under the GPL.
>
> The jar contains:
>
> a) the compiled classes
> b) the required data files (dictionaries and compatibility tables)
> c) 2 command-line test programs
> d) 3 test documents with different encodings
> e) the source code
> f) a README file that will give you a little more information :-)
>
> To Lucene developers: I plan to offer this work to Lucene (see the jar
> hierarchy... and the source file headers ;-). Any objections?
>
> Feedback is very welcome: there are quite a lot of unresolved issues, with
> the analyzer itself as well as with Lucene.
>
> mE AlslAmap ("goodbye" in Buckwalter-transliterated Arabic), cheers,
>
> p.b.


Re: Announce : arabic Stemmer/Analyzer for Lucene

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hello and thank you.
I added this to our 'patch queue' at:
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23784

Otis

--- Pierrick Brihaye <pi...@wanadoo.fr> wrote:
> Hi all,
>
> I have written a Lucene Analyzer for Arabic. You will find it here:
> http://perso.wanadoo.fr/pierrick.brihaye/ArabicAnalyzer.jar (provisional
> address; is anybody interested in hosting it?)
>
> This work is still in beta, but it gives quite good results :-)
>
> To make it work, you need:
>
> 1) a 1.4+ JVM (because of the native support for regular expressions, which
> are heavily used in the program; I've been too lazy to use an external
> package)
>
> 2) Apache Jakarta Commons-Collections:
> http://jakarta.apache.org/commons/collections.html
>
> 3) a recent Lucene distribution ;-)
>
> All this work is based on Tim Buckwalter's amazing Arabic Morphological
> Analyzer Version 1.0
> (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002L49)
> originally written in Perl and released under the GPL.
>
> The jar contains:
>
> a) the compiled classes
> b) the required data files (dictionaries and compatibility tables)
> c) 2 command-line test programs
> d) 3 test documents with different encodings
> e) the source code
> f) a README file that will give you a little more information :-)
>
> To Lucene developers: I plan to offer this work to Lucene (see the jar
> hierarchy... and the source file headers ;-). Any objections?
>
> Feedback is very welcome: there are quite a lot of unresolved issues, with
> the analyzer itself as well as with Lucene.
>
> mE AlslAmap ("goodbye" in Buckwalter-transliterated Arabic), cheers,
>
> p.b.