Posted to dev@lucene.apache.org by Ryan McKinley <ry...@gmail.com> on 2009/06/21 07:10:30 UTC

3MB lucene-analyzers.jar?

With the added analyzer for LUCENE-1629, it seems the jar file is now ~3.5MB.

Given the size, does it make sense to put it in its own jar file?
That way programs can easily exclude it if space is a concern.

thanks
ryan

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: 3MB lucene-analyzers.jar?

Posted by Ryan McKinley <ry...@gmail.com>.
On Jun 21, 2009, at 12:52 PM, Uwe Schindler wrote:

> One addition: The problem with packaging 2 jars from one contrib is also that
> all JAR files of Lucene are also generated as Maven artifacts. To generate
> the correct metadata and so on, there is more to be done than just changing
> the build.xml.
>
> If we want this separate, the simplest way is to put it in its own contrib
> module, e.g. contrib-analyzers-cn
>

I think this is the best solution -- it should be easy to implement,
too. This way it could easily be included or not.

This is only really an issue in environments where we need to worry
about install footprint -- I was just surprised to see my installer
grow by 3MB -- since I'm not using the Chinese analyzers, that should
be avoidable. Of course I could edit the jar myself, but it would be
nice to make the artifacts Maven friendly.

ryan





Re: 3MB lucene-analyzers.jar?

Posted by Simon Willnauer <si...@googlemail.com>.
2009/6/21 Uwe Schindler <uw...@thetaphi.de>:
> One addition: The problem with packaging 2 jars from one contrib is also that
> all JAR files of Lucene are also generated as Maven artifacts. To generate
> the correct metadata and so on, there is more to be done than just changing
> the build.xml.
>
> If we want this separate, the simplest way is to put it in its own contrib
> module, e.g. contrib-analyzers-cn
That is true. I'm not a Maven expert, though: is there a way to
have 2 artifacts from one contrib?

simon


RE: 3MB lucene-analyzers.jar?

Posted by Uwe Schindler <uw...@thetaphi.de>.
One addition: The problem with packaging 2 jars from one contrib is also that
all JAR files of Lucene are also generated as Maven artifacts. To generate
the correct metadata and so on, there is more to be done than just changing
the build.xml.

If we want this separate, the simplest way is to put it in its own contrib
module, e.g. contrib-analyzers-cn

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de



Re: 3MB lucene-analyzers.jar?

Posted by Simon Willnauer <si...@googlemail.com>.
On Sun, Jun 21, 2009 at 6:24 PM, Earwin Burrfoot<ea...@gmail.com> wrote:
>>> But: I do not understand the problem with this JAR file. If somebody really
>>> wants smaller files, one could use a tool that does this
>>> automatically based on class usage.
>> I personally have a couple of use cases for that, as I have to work in
>> very limited environments. Imagine embedded systems or mobile phones
>> where 500 kb is a lot. If you really need the analyzer, you can include
>> the additional jar.
> Jar Jar Links - special tools for special tasks.
>
Well, you could do that by hand too, no question: unzip -> zip ->
done. But this is not about "is it possible to do it?!"; it's about
providing packaging that is convenient for users. I would vote for
an extra jar file, as it is only packaging.
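
For reference, the by-hand "unzip -> zip" route can be sketched in a few
lines of Python. This is only an illustration, not part of the Lucene build:
the function name strip_prefix_from_jar and the file names in the usage line
are made up, and it assumes an unsigned jar in which everything to be dropped
lives under a single path prefix.

```python
import zipfile


def strip_prefix_from_jar(src_path, dst_path, prefix):
    """Copy the jar at src_path to dst_path, skipping every entry whose
    name starts with prefix. Returns the names that were left out."""
    skipped = []
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename.startswith(prefix):
                # Entry belongs to the package we want to exclude.
                skipped.append(info.filename)
                continue
            # Re-use the original entry metadata (name, timestamp, ...).
            dst.writestr(info, src.read(info.filename))
    return skipped
```

A call could look like strip_prefix_from_jar("lucene-analyzers.jar",
"lucene-analyzers-slim.jar", "org/apache/lucene/analysis/cn/"), where the
output name and the prefix are assumptions. It works, but every user has to
repeat it for every release, which is the convenience argument above.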

simon


Re: 3MB lucene-analyzers.jar?

Posted by Earwin Burrfoot <ea...@gmail.com>.
>> But: I do not understand the problem with this JAR file. If somebody really
>> wants smaller files, one could use a tool that does this
>> automatically based on class usage.
> I personally have a couple of use cases for that, as I have to work in
> very limited environments. Imagine embedded systems or mobile phones
> where 500 kb is a lot. If you really need the analyzer, you can include
> the additional jar.
Jar Jar Links - special tools for special tasks.

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785



Re: 3MB lucene-analyzers.jar?

Posted by Simon Willnauer <si...@googlemail.com>.
On Sun, Jun 21, 2009 at 4:27 PM, Uwe Schindler<uw...@thetaphi.de> wrote:
> To do this you must also split up the class files; how would that be done? Would
> the package analysis/cn/ move somewhere else? And when doing this, be sure to
> handle the resources folder correctly, because the stopwords.txt files of
> some analyzers and the big data files must also be split (the same split, by
> file path).
Sure.
>
> And: the contribs' build.xml files do not contain a jar task; they use the one
> from contrib-build.xml one directory above, so the changes will be even more
> complicated than splitting into two contribs.
>
I will override it.
> But: I do not understand the problem with this JAR file. If somebody really
> wants smaller files, one could use a tool that does this
> automatically based on class usage.
>
I personally have a couple of use cases for that, as I have to work in
very limited environments. Imagine embedded systems or mobile phones
where 500 kb is a lot. If you really need the analyzer, you can include
the additional jar.

simon


RE: 3MB lucene-analyzers.jar?

Posted by Uwe Schindler <uw...@thetaphi.de>.
To do this you must also split up the class files; how would that be done? Would
the package analysis/cn/ move somewhere else? And when doing this, be sure to
handle the resources folder correctly, because the stopwords.txt files of
some analyzers and the big data files must also be split (the same split, by
file path).
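
The "same split, by file path" idea can be sketched as a plain partition of
jar entry names. This is a hedged illustration in Python, not the actual
build logic: split_entries, the prefix and the entry names below are all
hypothetical stand-ins.

```python
def split_entries(names, prefix):
    """Partition jar entry names into (kept, moved) by path prefix.

    Classes and resource files are treated alike, so a data file that
    sits next to its analyzer class ends up in the same jar."""
    kept, moved = [], []
    for name in names:
        if name.startswith(prefix):
            moved.append(name)
        else:
            kept.append(name)
    return kept, moved


# Hypothetical entry names, for illustration only.
entries = [
    "org/example/analysis/de/GermanAnalyzer.class",
    "org/example/analysis/de/stopwords.txt",
    "org/example/analysis/cn/SmartAnalyzer.class",
    "org/example/analysis/cn/dictionary.mem",
]
kept, moved = split_entries(entries, "org/example/analysis/cn/")
```

The point of keying on the path alone is that one rule covers both the
compiled classes and the resources folder, provided both trees use the same
package layout.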

And: the contribs' build.xml files do not contain a jar task; they use the one
from contrib-build.xml one directory above, so the changes will be even more
complicated than splitting into two contribs.

But: I do not understand the problem with this JAR file. If somebody really
wants smaller files, one could use a tool that does this
automatically based on class usage.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de



Re: 3MB lucene-analyzers.jar?

Posted by Simon Willnauer <si...@googlemail.com>.
What if we just build two jars? I guess it would make sense to keep it
in this contrib module.
I would make the changes to build.xml.

simon



RE: 3MB lucene-analyzers.jar?

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Grant,

I think Ryan means that the analyzer.jar file is 3.5 MB and he would like to
split it up into two different ones. With our current architecture, this
would only be possible if the SmartChineseAnalyzer moved into its own
contrib. This analyzer needs a lot of data files in the classpath (see
LUCENE-1629).
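
To see where the megabytes in a jar actually come from, one can sum the
compressed entry sizes per directory. The following Python sketch is only an
illustration; compressed_size_by_dir and the jar name in the usage note are
assumptions, and no numbers for the real jar are claimed here.

```python
import zipfile
from collections import Counter


def compressed_size_by_dir(jar_path, depth):
    """Sum compressed entry sizes (in bytes), grouped by the first
    `depth` directory components of each entry name."""
    totals = Counter()
    with zipfile.ZipFile(jar_path) as jar:
        for info in jar.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            dirs = info.filename.split("/")[:-1]  # drop the file name
            key = "/".join(dirs[:depth]) or "(root)"
            totals[key] += info.compress_size
    return totals
```

For example, compressed_size_by_dir("lucene-analyzers.jar", 5).most_common(5)
would list the five heaviest package directories at that depth.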

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de



Re: 3MB lucene-analyzers.jar?

Posted by Grant Ingersoll <gs...@apache.org>.
contrib/analyzers is already its own jar, or am I missing something?
