Posted to dev@lucene.apache.org by Dvora <ba...@gmail.com> on 2009/09/10 08:39:08 UTC

Question regarding the index files

Hello,

I'm copying a question I asked on the Users list, but I think it requires
some patching effort, so maybe this list will be more suitable. The question
is as follows.

I'm using Lucene 2.4. I'm developing a web application that uses Lucene (via
Compass) to do the searches.
I intend to deploy the application on Google App Engine
(http://code.google.com/appengine/), which limits file sizes to less than
10MB. I've read about the various policies Lucene supports for limiting
file sizes, but no matter which policy and parameters I used, the index
files still grew to be much larger than 10MB. Looking at the code, I've
managed to limit the cfs files (by predicting the file size in
CompoundFileWriter before closing the file). I guess that will degrade
performance, but it's OK for now. But now the FDT files are becoming huge
(about 60MB) and I can't identify a way to limit those files.

Is there a built-in, correct way to limit the size of these files? If not,
could someone please point me to how I should tweak the source code to
achieve that?

Thanks for any help.
-- 
View this message in context: http://www.nabble.com/Question-regarding-the-index-files-tp25378103p25378103.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


RE: LowerCaseFilter, is there a reason why the class is final?

Posted by Uwe Schindler <uw...@thetaphi.de>.
I forgot: this is known as the "Decorator Pattern":
http://en.wikipedia.org/wiki/Decorator_pattern

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> Sent: Thursday, September 10, 2009 4:09 PM
> To: java-dev@lucene.apache.org
> Subject: RE: LowerCaseFilter, is there a reason why the class is final?
> 
> See https://issues.apache.org/jira/browse/LUCENE-1753
> 
> In general, if you want to add functionality plug another filter into the
> chain. At least the implementations should be final (next/incrementToken).
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 
> 
> > -----Original Message-----
> > From: Daniel Shane [mailto:shaned@LEXUM.UMontreal.CA]
> > Sent: Thursday, September 10, 2009 4:06 PM
> > To: java-dev@lucene.apache.org
> > Subject: LowerCaseFilter, is there a reason why the class is final?
> >
> > Hi all,
> >
> > I was wondering why the LowerCaseFilter is declared final? In my code, I
> > would like to extend it but apparently it's not possible. I'm just
> > wondering why extending this type of class is considered evil?
> >
> > Daniel Shane


Re: LowerCaseFilter, is there a reason why the class is final?

Posted by Ted Dunning <te...@gmail.com>.
Lucene has always had way too much use of final for my taste.  I have often
had to resort to chicanery to get around this.

In many cases, there wasn't even an alternative abstract class.  Writing
test cases involving IndexWriters was a case that sticks in my memory.

On Fri, Sep 11, 2009 at 7:13 AM, Mark Miller <ma...@gmail.com> wrote:

> I think that's true - but it's also interesting to note: LowerCaseFilter
> has been final since it was put into svn in '01.
>



-- 
Ted Dunning, CTO
DeepDyve

Re: LowerCaseFilter, is there a reason why the class is final?

Posted by Mark Miller <ma...@gmail.com>.
I think that's true - but it's also interesting to note: LowerCaseFilter
has been final since it was put into svn in '01.

-- 
- Mark

http://www.lucidimagination.com



Ted Dunning wrote:
>
> Copy/paste.  Clearly Uwe and others were worried that users wouldn't
> be able to extend these classes compatibly. 
>
> My own opinion is that this causes worse problems with back
> compatibility because people wind up copying code instead of calling
> it.  You may be able to extend an abstract class to minimize your work.
>
> On Fri, Sep 11, 2009 at 5:33 AM, Daniel Shane
> <shaned@lexum.umontreal.ca> wrote:
>
>     Does anyone else see a way of doing this that is simple?
>
>
>
>
> -- 
> Ted Dunning, CTO
> DeepDyve
>







Re: LowerCaseFilter, is there a reason why the class is final?

Posted by Daniel Shane <sh...@LEXUM.UMontreal.CA>.
IMHO, if I'm forced to write a by-pass filter to re-use a filter instead 
of copy/pasting it, I think we are getting way off the Decorator 
Pattern. It's not simple anymore. I bet there are 9 chances out of 10 that 
a dev will copy/paste that code before writing a by-pass filter.

Extending the functionality of a filter should not be difficult. And 
having everyone write their own by-pass filter seems really annoying. 
Imagine all those people having to write the by-pass filter.

We should include such a filter in Lucene natively and mention in the 
filters' Javadocs that they can be extended with it, so that people do 
not copy/paste code.

If you want, I can cook up a draft to get things started.

Daniel Shane

Ted Dunning wrote:
>
> Copy/paste.  Clearly Uwe and others were worried that users wouldn't 
> be able to extend these classes compatibly. 
>
> My own opinion is that this causes worse problems with back 
> compatibility because people wind up copying code instead of calling 
> it.  You may be able to extend an abstract class to minimize your work.
>
> On Fri, Sep 11, 2009 at 5:33 AM, Daniel Shane
> <shaned@lexum.umontreal.ca> wrote:
>
>     Does anyone else see a way of doing this that is simple?
>
>
>
>
> -- 
> Ted Dunning, CTO
> DeepDyve
>


Re: LowerCaseFilter, is there a reason why the class is final?

Posted by Ted Dunning <te...@gmail.com>.
Copy/paste.  Clearly Uwe and others were worried that users wouldn't be able
to extend these classes compatibly.

My own opinion is that this causes worse problems with back compatibility
because people wind up copying code instead of calling it.  You may be able
to extend an abstract class to minimize your work.

On Fri, Sep 11, 2009 at 5:33 AM, Daniel Shane <sh...@lexum.umontreal.ca> wrote:

> Does anyone else see a way of doing this that is simple?




-- 
Ted Dunning, CTO
DeepDyve

RE: LowerCaseFilter, is there a reason why the class is final?

Posted by Uwe Schindler <uw...@thetaphi.de>.
> The only thing I can do is add a filter before the LowerCaseFilter that
> would pass all the non-word tokens to the next filter, but it seems really
> complicated for a case where a simple extend would do the job.

This is the way to go!

Uwe




Re: LowerCaseFilter, is there a reason why the class is final?

Posted by Daniel Shane <sh...@LEXUM.UMontreal.CA>.
With the current API of TokenStream (incrementToken) I really do not see 
how I could handle the following scenario with the Decorator Pattern:

I have a case where I would like to execute the LowerCaseFilter only if 
the token is of type "word". I do not want to execute the 
LowerCaseFilter if the token is of type "number" for example.

Unfortunately, I don't see how this is possible using the current API 
and the fact that the filters are final.

The only thing I can do is add a filter before the LowerCaseFilter that 
would pass all the non-word tokens to the next filter, but it seems 
really complicated for a case where a simple extend would do the job.

Or, if the API had two methods, one which increments the stream and 
another which processes the current "token" or attributes, then it would 
be possible to do what I have in mind using the Decorator Pattern.

Does anyone else see a way of doing this that is simple?

Daniel Shane
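
The scenario above (lower-case only tokens of type "word") and the
"two methods" idea can be sketched in plain Java. This is only an
illustration under stated assumptions: SimpleToken and
ConditionalTokenFilter are hypothetical stand-ins, not Lucene classes, and
the sketch does not use Lucene's TokenStream or incrementToken API. It
separates advancing the stream (next) from processing the current token
(transform), so a decorator can decide per token whether to apply the
processing step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a token; NOT Lucene's Token or attribute classes.
final class SimpleToken {
    final String text;
    final String type;

    SimpleToken(String text, String type) {
        this.text = text;
        this.type = type;
    }
}

// Hypothetical decorator: advancing the stream is kept separate from
// processing the current token, so processing can be applied conditionally.
final class ConditionalTokenFilter implements Iterator<SimpleToken> {
    private final Iterator<SimpleToken> input;
    private final Predicate<SimpleToken> condition;
    private final UnaryOperator<SimpleToken> transform;

    ConditionalTokenFilter(Iterator<SimpleToken> input,
                           Predicate<SimpleToken> condition,
                           UnaryOperator<SimpleToken> transform) {
        this.input = input;
        this.condition = condition;
        this.transform = transform;
    }

    @Override
    public boolean hasNext() {
        return input.hasNext();
    }

    @Override
    public SimpleToken next() {
        SimpleToken token = input.next();
        // Tokens that fail the condition pass through unchanged.
        return condition.test(token) ? transform.apply(token) : token;
    }
}

final class ConditionalLowerCaseDemo {
    // The "process the current token" step, reusable on its own.
    static final UnaryOperator<SimpleToken> LOWER_CASE =
        t -> new SimpleToken(t.text.toLowerCase(Locale.ROOT), t.type);

    static List<String> run(List<SimpleToken> tokens) {
        Iterator<SimpleToken> chain = new ConditionalTokenFilter(
            tokens.iterator(), t -> "word".equals(t.type), LOWER_CASE);
        List<String> out = new ArrayList<>();
        while (chain.hasNext()) {
            out.add(chain.next().text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<SimpleToken> tokens = Arrays.asList(
            new SimpleToken("Hello", "word"),
            new SimpleToken("0xFF", "number"),
            new SimpleToken("WORLD", "word"));
        System.out.println(run(tokens)); // prints [hello, 0xFF, world]
    }
}
```

With this split, the lower-casing logic is called rather than copied, and
the condition lives in the wrapper instead of in a subclass.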

Uwe Schindler wrote:
> See https://issues.apache.org/jira/browse/LUCENE-1753
>
> In general, if you want to add functionality plug another filter into the
> chain. At least the implementations should be final (next/incrementToken).
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>   
>> -----Original Message-----
>> From: Daniel Shane [mailto:shaned@LEXUM.UMontreal.CA]
>> Sent: Thursday, September 10, 2009 4:06 PM
>> To: java-dev@lucene.apache.org
>> Subject: LowerCaseFilter, is there a reason why the class is final?
>>
>> Hi all,
>>
>> I was wondering why the LowerCaseFilter is declared final? In my code, I
>> would like to extend it but apparently it's not possible. I'm just
>> wondering why extending this type of class is considered evil?
>>
>> Daniel Shane


RE: LowerCaseFilter, is there a reason why the class is final?

Posted by Uwe Schindler <uw...@thetaphi.de>.
See https://issues.apache.org/jira/browse/LUCENE-1753

In general, if you want to add functionality plug another filter into the
chain. At least the implementations should be final (next/incrementToken).

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Daniel Shane [mailto:shaned@LEXUM.UMontreal.CA]
> Sent: Thursday, September 10, 2009 4:06 PM
> To: java-dev@lucene.apache.org
> Subject: LowerCaseFilter, is there a reason why the class is final?
> 
> Hi all,
> 
> I was wondering why the LowerCaseFilter is declared final? In my code, I
> would like to extend it but apparently it's not possible. I'm just
> wondering why extending this type of class is considered evil?
> 
> Daniel Shane


LowerCaseFilter, is there a reason why the class is final?

Posted by Daniel Shane <sh...@LEXUM.UMontreal.CA>.
Hi all,

I was wondering why the LowerCaseFilter is declared final? In my code, I 
would like to extend it but apparently it's not possible. I'm just 
wondering why extending this type of class is considered evil?

Daniel Shane



Re: Question regarding the index files

Posted by Michael McCandless <lu...@mikemccandless.com>.
I answered on java-user.  I think it can be done without source code
changes to Lucene.

Mike
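
The reply above does not spell out an approach in this thread. Purely as
an illustration, and not necessarily what was suggested on java-user, one
way to stay under a per-file size limit without patching Lucene would be
to split each logical index file into fixed-size chunks at the storage
layer. The sketch below shows only that splitting step in plain Java.
ChunkedOutputStream and maxChunkBytes are hypothetical names, and wiring
such a stream into a Lucene Directory implementation is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: splits one logical stream of
// bytes into chunks that never exceed maxChunkBytes each.
final class ChunkedOutputStream extends OutputStream {
    private final int maxChunkBytes;
    private final List<byte[]> closedChunks = new ArrayList<>();
    private ByteArrayOutputStream current = new ByteArrayOutputStream();

    ChunkedOutputStream(int maxChunkBytes) {
        if (maxChunkBytes <= 0) {
            throw new IllegalArgumentException("maxChunkBytes must be positive");
        }
        this.maxChunkBytes = maxChunkBytes;
    }

    @Override
    public void write(int b) {
        // Start a new chunk once the current one has reached the limit.
        if (current.size() == maxChunkBytes) {
            closedChunks.add(current.toByteArray());
            current = new ByteArrayOutputStream();
        }
        current.write(b);
    }

    // Returns all chunks written so far, including a partially filled last one.
    List<byte[]> chunks() {
        List<byte[]> all = new ArrayList<>(closedChunks);
        if (current.size() > 0) {
            all.add(current.toByteArray());
        }
        return all;
    }
}

final class ChunkingDemo {
    public static void main(String[] args) {
        // A tiny limit stands in for the 10MB limit mentioned in the question.
        ChunkedOutputStream out = new ChunkedOutputStream(10);
        for (int i = 0; i < 25; i++) {
            out.write(i);
        }
        List<Integer> sizes = new ArrayList<>();
        for (byte[] chunk : out.chunks()) {
            sizes.add(chunk.length);
        }
        System.out.println(sizes); // prints [10, 10, 5]
    }
}
```

In a real setup each chunk would be stored as its own physical file, and
reads would need the matching logic to map a logical offset to a chunk
index and an offset within that chunk.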

On Thu, Sep 10, 2009 at 2:39 AM, Dvora <ba...@gmail.com> wrote:
>
> Hello,
>
> I'm copying a question I asked on the Users list, but I think it requires
> some patching effort, so maybe this list will be more suitable. The question
> is as follows.
>
> I'm using Lucene 2.4. I'm developing a web application that uses Lucene (via
> Compass) to do the searches.
> I intend to deploy the application on Google App Engine
> (http://code.google.com/appengine/), which limits file sizes to less than
> 10MB. I've read about the various policies Lucene supports for limiting
> file sizes, but no matter which policy and parameters I used, the index
> files still grew to be much larger than 10MB. Looking at the code, I've
> managed to limit the cfs files (by predicting the file size in
> CompoundFileWriter before closing the file). I guess that will degrade
> performance, but it's OK for now. But now the FDT files are becoming huge
> (about 60MB) and I can't identify a way to limit those files.
>
> Is there a built-in, correct way to limit the size of these files? If not,
> could someone please point me to how I should tweak the source code to
> achieve that?
>
> Thanks for any help.