Posted to java-user@lucene.apache.org by Chris Fraschetti <fr...@gmail.com> on 2004/10/01 03:24:42 UTC

BooleanQuery - Too Many Clauses on date range.

I recently read, regarding my problem, that
date_field:[0820483200 TO 1104480000]
is evaluated into a series of boolean queries, which have a cap of
1024 clauses. Considering that my documents will have dates spanning many
years, and I need the granularity of 'by day' searching, are there
any recommendations on how to make this work?

Currently with query: +content_field:sometext +date_field:[0820483200
TO 1104480000]
I get the following exception:
org.apache.lucene.search.BooleanQuery$TooManyClauses
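
For a rough sense of why the cap is exceeded: assuming the range expands to one boolean clause per distinct indexed term, even an index holding only one term per day would need more than 1024 clauses for this range. The sketch below counts the calendar days covered by the two timestamps in the query. The class and method names are made up for illustration, and UTC is an assumption about how the timestamps were produced.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

final class DayTermCount {
    // Counts the calendar days from the first timestamp to the second,
    // inclusive. With one term per day this is the number of clauses
    // the range would need.
    static long distinctDays(long fromEpochSeconds, long toEpochSeconds, ZoneId zone) {
        LocalDate from = Instant.ofEpochSecond(fromEpochSeconds).atZone(zone).toLocalDate();
        LocalDate to = Instant.ofEpochSecond(toEpochSeconds).atZone(zone).toLocalDate();
        return ChronoUnit.DAYS.between(from, to) + 1;
    }

    public static void main(String[] args) {
        // Bounds taken from the query date_field:[0820483200 TO 1104480000].
        long days = distinctDays(820483200L, 1104480000L, ZoneOffset.UTC);
        System.out.println(days + " day terms against a cap of 1024");
    }
}
```

With second-resolution timestamps the count can be far higher, since every distinct timestamp in the range is a term of its own.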


Any suggestions on how I can keep the by-day granularity without
limiting my search results? Are there any date formats that I
can change those numbers to that would allow me to complete the search
(e.g. Feb, 15 2004)? Can Lucene's range do a proper search on
formatted dates?
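
On formatted dates: a range over string terms only behaves like a date range if string order matches date order. A fixed-width, zero-padded form such as yyyyMMdd has that property, while a form like "Feb, 15 2004" does not (alphabetically, "Feb" sorts before "Jan"). A minimal sketch of the conversion, where the class name and the choice of UTC are assumptions for illustration rather than anything Lucene prescribes:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class DayTerm {
    // Fixed-width pattern: year, month, day, all zero-padded.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Converts an epoch-seconds timestamp to a day-granularity term.
    static String toDayTerm(long epochSeconds) {
        return DAY.format(Instant.ofEpochSecond(epochSeconds));
    }

    public static void main(String[] args) {
        String lower = toDayTerm(820483200L);
        String upper = toDayTerm(1104480000L);
        // String comparison agrees with chronological order for this format.
        System.out.println(lower + " < " + upper + " : " + (lower.compareTo(upper) < 0));
    }
}
```

Every timestamp within the same day maps to the same term, so the number of distinct terms in a range is bounded by the number of days it covers.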

Is there a combination of RangeQuery and Query/MultiTermQuery that I can use?

Your help is greatly appreciated.


-- 
___________________________________________________
Chris Fraschetti
e fraschetti@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: BooleanQuery - Too Many Clauses on date range.

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Oct 4, 2004, at 2:12 PM, Chris Fraschetti wrote:

> Absolutely, limiting the user's query is no problem here. I've
> currently implemented the Lucene JavaScript to catch a lot of user
> queries that could cause issues: blank queries, ? or * at the
> beginning of a query, and so on. But I couldn't think of a way to
> prevent the user from doing a* while still allowing comment* (wanting
> comments or commentary). Any suggestions would be warmly welcomed.
>

I recommend subclassing QueryParser, and overriding getPrefixQuery and 
getWildcardQuery.  In both of the overridden methods, throw a 
ParseException.  You should be handling ParseException gracefully 
somehow already, so that should do the trick.
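
The override-and-throw pattern can be sketched without Lucene on the classpath. In the example below, StandInParser is a hypothetical stand-in written only for illustration: its hooks borrow the names getPrefixQuery and getWildcardQuery from the advice above, but their signatures are not Lucene's, and java.text.ParseException stands in for Lucene's own ParseException.

```java
import java.text.ParseException;

// Hypothetical stand-in for a query parser; NOT Lucene's QueryParser.
class StandInParser {
    // Routes a single term to the matching factory hook.
    String parse(String term) throws ParseException {
        int star = term.indexOf('*');
        boolean hasQuestionMark = term.indexOf('?') >= 0;
        if (star >= 0 && star == term.length() - 1 && !hasQuestionMark) {
            return getPrefixQuery(term.substring(0, star));
        }
        if (star >= 0 || hasQuestionMark) {
            return getWildcardQuery(term);
        }
        return "term:" + term;
    }

    protected String getPrefixQuery(String prefix) throws ParseException {
        return "prefix:" + prefix;
    }

    protected String getWildcardQuery(String pattern) throws ParseException {
        return "wildcard:" + pattern;
    }
}

// Subclass that rejects prefix and wildcard terms outright.
class NoWildcardParser extends StandInParser {
    @Override
    protected String getPrefixQuery(String prefix) throws ParseException {
        throw new ParseException("Prefix queries are disabled: " + prefix + "*", 0);
    }

    @Override
    protected String getWildcardQuery(String pattern) throws ParseException {
        throw new ParseException("Wildcard queries are disabled: " + pattern, 0);
    }

    public static void main(String[] args) {
        StandInParser parser = new NoWildcardParser();
        for (String term : new String[] {"comment", "a*", "te?t"}) {
            try {
                System.out.println(parser.parse(term));
            } catch (ParseException e) {
                System.out.println("Rejected: " + e.getMessage());
            }
        }
    }
}
```

Note that this rejects every prefix and wildcard term, comment* included. Allowing longer prefixes would need a condition inside the overridden methods instead of an unconditional throw.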

	Erik




Re: BooleanQuery - Too Many Clauses on date range.

Posted by Sergiu Gordea <gs...@ifit.uni-klu.ac.at>.
Chris Fraschetti wrote:

>Absolutely, limiting the user's query is no problem here. I've
>currently implemented the Lucene JavaScript to catch a lot of user
>queries that could cause issues: blank queries, ? or * at the
>beginning of a query, and so on. But I couldn't think of a way to
>prevent the user from doing a* while still allowing comment* (wanting
>comments or commentary). Any suggestions would be warmly welcomed.
>
One cheap solution is to ask the user to enter at least 3 alphanumeric
characters in front of the wildcard.
What do you say about that?
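
A minimal sketch of such a check, assuming queries are plain whitespace-separated terms with no field: prefixes and that ? and * are the only wildcard characters. The class name, method name, and the constant are hypothetical.

```java
final class WildcardGuard {
    static final int MIN_PREFIX_CHARS = 3;

    // Returns true when every term containing a wildcard has at least
    // MIN_PREFIX_CHARS alphanumeric characters before its first wildcard.
    static boolean isAcceptable(String query) {
        for (String term : query.trim().split("\\s+")) {
            int wildcard = firstWildcardIndex(term);
            if (wildcard < 0) {
                continue; // no wildcard in this term, nothing to check
            }
            int alphanumerics = 0;
            for (int i = 0; i < wildcard; i++) {
                if (Character.isLetterOrDigit(term.charAt(i))) {
                    alphanumerics++;
                }
            }
            if (alphanumerics < MIN_PREFIX_CHARS) {
                return false;
            }
        }
        return true;
    }

    // Index of the first '*' or '?' in the term, or -1 if there is none.
    private static int firstWildcardIndex(String term) {
        int star = term.indexOf('*');
        int question = term.indexOf('?');
        if (star < 0) {
            return question;
        }
        if (question < 0) {
            return star;
        }
        return Math.min(star, question);
    }

    public static void main(String[] args) {
        for (String query : new String[] {"a*", "comment*", "java pro*", "te?t"}) {
            System.out.println(query + " -> " + isAcceptable(query));
        }
    }
}
```

The same rule could run in the browser-side validation or inside the parser subclass; this version only shows the rule itself.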

  All the best,

  Sergiu






Re: BooleanQuery - Too Many Clauses on date range.

Posted by Stephane James Vaucher <va...@cirano.qc.ca>.
I've used a simple message telling the user that the request was too vague
and should be modified. I haven't had many complaints about this,
especially once I explained the reason to a client:

If one user of many does a*, the whole system will grind to a halt, as that
one request will use up all of the available memory (wildcards aren't very
scalable...).

Here is an example of a working system:
http://theserverside.com/search/search.tss

I don't know whether many people complain that a* returns no results,
but a request for javap* returns javapro, javaplus, javapolis...

HTH,
sv

On Mon, 4 Oct 2004, Chris Fraschetti wrote:

> Absolutely, limiting the user's query is no problem here. I've
> currently implemented the Lucene JavaScript to catch a lot of user
> queries that could cause issues: blank queries, ? or * at the
> beginning of a query, and so on. But I couldn't think of a way to
> prevent the user from doing a* while still allowing comment* (wanting
> comments or commentary). Any suggestions would be warmly welcomed.




Re: BooleanQuery - Too Many Clauses on date range.

Posted by Chris Fraschetti <fr...@gmail.com>.
Absolutely, limiting the user's query is no problem here. I've
currently implemented the Lucene JavaScript to catch a lot of user
queries that could cause issues: blank queries, ? or * at the
beginning of a query, and so on. But I couldn't think of a way to
prevent the user from doing a* while still allowing comment* (wanting
comments or commentary). Any suggestions would be warmly welcomed.


On Mon, 4 Oct 2004 14:08:00 -0400 (EDT), Stephane James Vaucher
<va...@cirano.qc.ca> wrote:
> OK, got it. I have a small comment, though.
> 
> For large wildcard queries, please note that Google does not support
> wildcards. Search for hell*, and there will be no correct matches with hello.
> 
> Is there a reason why you wish to allow such large queries? We might
> be able to find alternative ways of helping you out. No one will use a
> query like a*. If someone does, the results would be completely meaningless
> (many false positives for a user). However, a query like program* might be
> interesting to a user.
> 
> The problem with hacking term expansion is that the rules of this
> expansion might be hard to define (as is, maybe one should use the
> first terms, the most frequent terms, or even the least frequent, depending
> on your app).
> 
> sv



-- 
___________________________________________________
Chris Fraschetti, Student CompSci System Admin
University of San Francisco
e fraschetti@gmail.com | http://meteora.cs.usfca.edu



Re: BooleanQuery - Too Many Clauses on date range.

Posted by Stephane James Vaucher <va...@cirano.qc.ca>.
OK, got it. I have a small comment, though.

For large wildcard queries, please note that Google does not support
wildcards. Search for hell*, and there will be no correct matches with hello.

Is there a reason why you wish to allow such large queries? We might
be able to find alternative ways of helping you out. No one will use a
query like a*. If someone does, the results would be completely meaningless
(many false positives for a user). However, a query like program* might be
interesting to a user.

The problem with hacking term expansion is that the rules of this
expansion might be hard to define (as is, maybe one should use the
first terms, the most frequent terms, or even the least frequent, depending
on your app).

sv

On Mon, 4 Oct 2004, Chris Fraschetti wrote:

> The date portion of my code works great now, with no problems there, so
> let me thank you now for your date filter solution. But my current
> problem concerns a standalone a* query giving me
> the too many clauses exception.




Re: BooleanQuery - Too Many Clauses on date range.

Posted by Chris Fraschetti <fr...@gmail.com>.
The date portion of my code works great now, with no problems there, so
let me thank you now for your date filter solution. But my current
problem concerns a standalone a* query giving me
the too many clauses exception.


On Mon, 4 Oct 2004 12:47:24 -0400 (EDT), Stephane James Vaucher
<va...@cirano.qc.ca> wrote:
> BTW, what's wrong with the DateFilter solution I mentioned earlier?
> 
> I've used it before (before lucene-1.4, though) without memory problems,
> so I always assumed that it avoided the allocation problems with prefix
> queries.
> 
> sv
> 
> 
> 
> On Mon, 4 Oct 2004, Chris Fraschetti wrote:
> 
> > Surely some folks out there have used Lucene on a large scale and have
> > had to compensate for this somehow. Any other solutions? Morus, thank
> > you very much for your input; I am looking into your solution,
> > just putting my feelers out there once more.
> >
> > The Lucene API is very limited in its descriptions of its
> > components. Short of digging into the code, is there a good doc
> > somewhere out there that explains the workings of Lucene?
> >
> >
> > On Mon, 4 Oct 2004 01:57:06 -0700, Chris Fraschetti
> > <fr...@gmail.com> wrote:
> > > So before I spend a significant amount of time digging into the Lucene
> > > code, how does your experience with Lucene shed light on my
> > > situation? Our current index is pretty huge, and with each
> > > increase in size I've had, I've experienced a problem like this.
> > > Without taking up too much of your time, because obviously this is my
> > > task, I thought I'd ask you if you'd had any experience with this
> > > boolean clause nonsense. Of course it can be overcome, but if you
> > > know a quick hack, awesome; otherwise, no big deal, off to work I go
> > > :)
> > >
> > > -Fraschetti
> > >
> > >
> > > ---------- Forwarded message ----------
> > > From: Morus Walter <mo...@tanto.de>
> > > Date: Mon, 4 Oct 2004 09:01:50 +0200
> > > Subject: Re: BooleanQuery - Too Many Clauses on date range.
> > > To: Lucene Users List <lu...@jakarta.apache.org>, Chris
> > > Fraschetti <fr...@gmail.com>
> > >
> > > Chris Fraschetti writes:
> > > > So I decided to move my epoch date to the 20040608 date format, which fixed
> > > > my boolean query problem for my current data size (approx.
> > > > 600,000) ....
> > > >
> > > > but now as soon as I do a query like a*
> > > > I get the boolean error again. Google obviously can handle this query,
> > > > and I'm pretty sure Lucene can handle it. Any ideas? With or
> > > > without a date range specified, I still get the TooManyClauses error.
> > >
> > >
> > > > I tried cranking the max clause count up to Integer.MAX_VALUE, but Java gave me
> > > > an out-of-memory error. Is this because the boolean search tried to
> > > > allocate that many clauses by default, or because my query actually
> > > > needed that many clauses?
> > >
> > > Boolean search allocates a clause for every token having the prefix or
> > > matching the wildcard expression.
> > >
> > > > Why does it work on small indexes but not
> > > > large?
> > > Because there are fewer tokens starting with a.
> > >
> > > > Is there any way to have the parser create as many clauses as
> > > > it can and then search with what it has? w/o recompiling the source?
> > > >
> > > You need to create your own version of Wildcard- and Prefix-Query
> > > that takes a maximum term number and ignores further clauses.
> > > And you need a variant of the query parser that uses these queries.
> > >
> > > This can be done, even without recompiling lucene, but you will have to
> > > do some programming at the level of lucene queries.
> > > Shouldn't be hard, since you can use the sources as a starting point.
> > >
> > > I guess this does not exist because the Lucene developers decided to prefer
> > > a query error over incomplete results.
> > >
> > > Morus
> > >
> > >
> > > --
> > > ___________________________________________________
> > > Chris Fraschetti, Student CompSci System Admin
> > > University of San Francisco
> > > e fraschetti@gmail.com | http://meteora.cs.usfca.edu
> > >
> >
> >
> >
> >
> 
> 
> 
> 



-- 
___________________________________________________
Chris Fraschetti, Student CompSci System Admin
University of San Francisco
e fraschetti@gmail.com | http://meteora.cs.usfca.edu



Re: BooleanQuery - Too Many Clases on date range.

Posted by Che Dong <ch...@chedong.com>.
How about using an integer-based filter instead of a datetime-based filter?
A datetime can be converted to a Unix timestamp for comparison.

Thanks

Che Dong
http://www.chedong.com/

Chris Fraschetti wrote:
> Surely some folks out there have used Lucene on a large scale and have
> had to compensate for this somehow. Any other solutions? Morus, thank
> you very much for your input, and I am looking into your solution;
> just putting my feelers out there once more.
> 
> The Lucene API is very limited in its descriptions of its
> components. Short of digging into the code, is there a good doc
> somewhere out there that explains the workings of Lucene?


Re: BooleanQuery - Too Many Clases on date range.

Posted by Otis Gospodnetic <ot...@yahoo.com>.
There are some articles about Lucene.  You can find the links on
Lucene's Wiki.  Lucene in Action is almost done:
http://www.manning.com/catalog/view.php?book=hatcher2
I don't think you can pre-order it from the publisher, but you can
probably pre-order it from Amazon.  I don't know of any other good
Lucene documentation.

Otis


--- Chris Fraschetti <fr...@gmail.com> wrote:

> Surely some folks out there have used Lucene on a large scale and have
> had to compensate for this somehow. Any other solutions? Morus, thank
> you very much for your input, and I am looking into your solution;
> just putting my feelers out there once more.
> 
> The Lucene API is very limited in its descriptions of its
> components. Short of digging into the code, is there a good doc
> somewhere out there that explains the workings of Lucene?


Re: BooleanQuery - Too Many Clases on date range.

Posted by Stephane James Vaucher <va...@cirano.qc.ca>.
BTW, what's wrong with the DateFilter solution I mentioned earlier?

I've used it before (before lucene-1.4 though) without memory problems,
thus I always assumed that it avoided the allocation problems with prefix
queries.

sv

On Mon, 4 Oct 2004, Chris Fraschetti wrote:

> Surely some folks out there have used Lucene on a large scale and have
> had to compensate for this somehow. Any other solutions? Morus, thank
> you very much for your input, and I am looking into your solution;
> just putting my feelers out there once more.
>
> The Lucene API is very limited in its descriptions of its
> components. Short of digging into the code, is there a good doc
> somewhere out there that explains the workings of Lucene?


Re: BooleanQuery - Too Many Clases on date range.

Posted by Chris Fraschetti <fr...@gmail.com>.
Surely some folks out there have used Lucene on a large scale and have
had to compensate for this somehow. Any other solutions? Morus, thank
you very much for your input, and I am looking into your solution;
just putting my feelers out there once more.

The Lucene API is very limited in its descriptions of its
components. Short of digging into the code, is there a good doc
somewhere out there that explains the workings of Lucene?


On Mon, 4 Oct 2004 01:57:06 -0700, Chris Fraschetti
<fr...@gmail.com> wrote:
> So before I spend a significant amount of time digging into the Lucene
> code, how does your experience with Lucene shed light on my
> situation? Our current index is pretty huge, and with each
> increase in size I've had, I've experienced a problem like this.
> Without taking up too much of your time, because obviously this is my
> task, I thought I'd ask you if you'd had any experience with this
> boolean clause nonsense. Of course it can be overcome, but if you
> know a quick hack, awesome; otherwise, no big deal, off to work I go
> :)
> 
> -Fraschetti
> 
> 
> ---------- Forwarded message ----------
> From: Morus Walter <mo...@tanto.de>
> Date: Mon, 4 Oct 2004 09:01:50 +0200
> Subject: Re: BooleanQuery - Too Many Clases on date range.
> To: Lucene Users List <lu...@jakarta.apache.org>, Chris
> Fraschetti <fr...@gmail.com>
> 
> Chris Fraschetti writes:
> > So I decided to move my epoch date to the 20040608 date format, which
> > fixed my boolean query problem for my current data size (approx.
> > 600,000).
> >
> > But now, as soon as I do a query like    a*
> > I get the boolean error again. Google obviously can handle this query,
> > and I'm pretty sure Lucene can handle it. Any ideas? With or
> > without a date range specified, I still get the TooManyClauses error.
> 
> 
> > I tried cranking the max clause count up to Integer.MAX_VALUE, but Java
> > gave me an out-of-memory error. Is this because the boolean search tried
> > to allocate that many clauses by default, or because my query actually
> > needed that many clauses?
> 
> boolean search allocates clauses for all tokens having the prefix or
> matching the wildcard expression.
> 
> > Why does it work on small indexes but not
> > large?
> Because there are fewer tokens starting with a.
> 
> > Is there any way to have the parser create as many clauses as
> > it can and then search with what it has? w/o recompiling the source?
> >
> You need to create your own version of Wildcard- and Prefix-Query
> that takes a maximum term number and ignores further clauses.
> And you need a variant of the query parser that uses these queries.
> 
> This can be done, even without recompiling lucene, but you will have to
> do some programming at the level of lucene queries.
> Shouldn't be hard, since you can use the sources as a starting point.
> 
> I guess this does not exist because the Lucene developers decided to prefer
> a query error over incomplete results.
> 
> Morus
> 
> 
> --
> ___________________________________________________
> Chris Fraschetti, Student CompSci System Admin
> University of San Francisco
> e fraschetti@gmail.com | http://meteora.cs.usfca.edu
> 



-- 
___________________________________________________
Chris Fraschetti, Student CompSci System Admin
University of San Francisco
e fraschetti@gmail.com | http://meteora.cs.usfca.edu



Re: BooleanQuery - Too Many Clases on date range.

Posted by Morus Walter <mo...@tanto.de>.
Chris Fraschetti writes:
> So I decided to move my epoch date to the 20040608 date format, which
> fixed my boolean query problem for my current data size (approx.
> 600,000).
> 
> But now, as soon as I do a query like    a*
> I get the boolean error again. Google obviously can handle this query,
> and I'm pretty sure jguru.com can handle it too. Any ideas? With or
> without a date range specified, I still get the TooManyClauses error.
> I tried cranking the max clause count up to Integer.MAX_VALUE, but Java
> gave me an out-of-memory error. Is this because the boolean search tried
> to allocate that many clauses by default, or because my query actually
> needed that many clauses?

boolean search allocates clauses for all tokens having the prefix or
matching the wildcard expression.

> Why does it work on small indexes but not
> large? 
Because there are fewer tokens starting with a.

> Is there any way to have the parser create as many clauses as
> it can and then search with what it has? w/o recompiling the source?
> 
You need to create your own version of Wildcard- and Prefix-Query
that takes a maximum term number and ignores further clauses.
And you need a variant of the query parser that uses these queries.

This can be done, even without recompiling lucene, but you will have to
do some programming at the level of lucene queries.
Shouldn't be hard, since you can use the sources as a starting point.

I guess this does not exist because the Lucene developers decided to prefer
a query error over incomplete results.
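
A minimal sketch of the "maximum term number" idea described above, not Lucene
code: a sorted in-memory set stands in for the index's term dictionary, and the
class and method names (CappedPrefixExpansion, expand) are made up for
illustration. It only shows how expansion can stop at a cap instead of failing;
a real variant would have to be built on Lucene's own query classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for prefix expansion over a sorted term dictionary.
class CappedPrefixExpansion {

    // Returns at most maxTerms terms that start with prefix, in sorted order.
    static List<String> expand(SortedSet<String> terms, String prefix, int maxTerms) {
        List<String> matches = new ArrayList<>();
        // tailSet(prefix) begins at the first term >= prefix, so all terms
        // sharing the prefix are contiguous from there.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix) || matches.size() >= maxTerms) {
                break; // past the prefix range, or cap reached: ignore the rest
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(Arrays.asList(
                "able", "about", "apple", "banana", "comment", "commentary"));
        System.out.println(expand(terms, "a", 2));        // [able, about]
        System.out.println(expand(terms, "comment", 10)); // [comment, commentary]
    }
}
```

Note the trade-off this makes visible: with a cap of 2, "apple" is silently
dropped from the a* expansion, which is exactly the incomplete-results
behaviour discussed here.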

Morus




Re: BooleanQuery - Too Many Clases on date range.

Posted by Chris Fraschetti <fr...@gmail.com>.
So I decided to move my epoch date to the 20040608 date format, which
fixed my boolean query problem for my current data size (approx.
600,000).
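
A rough sketch of that conversion, using modern java.time rather than anything
available in 2004. The class name DayTerms and the choice of UTC are
assumptions; the actual time zone used for the index is not stated in this
thread. The point is that every timestamp within one day collapses to a single
yyyyMMdd term, and that such terms sort lexicographically in date order.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: epoch seconds -> day-granularity term (UTC assumed).
class DayTerms {

    static String toDayTerm(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds)
                .atZone(ZoneOffset.UTC)
                .toLocalDate()
                .format(DateTimeFormatter.BASIC_ISO_DATE); // yyyyMMdd
    }

    public static void main(String[] args) {
        // The two bounds from the original query in this thread.
        System.out.println(toDayTerm(820483200L));  // 19960101
        System.out.println(toDayTerm(1104480000L)); // 20041231
        // One hour later on the same day yields the same term.
        System.out.println(toDayTerm(820483200L + 3600)); // 19960101
    }
}
```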

But now, as soon as I do a query like    a*
I get the boolean error again. Google obviously can handle this query,
and I'm pretty sure jguru.com can handle it too. Any ideas? With or
without a date range specified, I still get the TooManyClauses error.
I tried cranking the max clause count up to Integer.MAX_VALUE, but Java
gave me an out-of-memory error. Is this because the boolean search tried
to allocate that many clauses by default, or because my query actually
needed that many clauses? Why does it work on small indexes but not
large ones? Is there any way to have the parser create as many clauses as
it can and then search with what it has, without recompiling the source?

Thanks!


On Fri, 01 Oct 2004 15:48:36 +0200, Damian Gajda <dg...@caltha.pl> wrote:
> On 01-10-2004, Fri, at 07:57 -0500, Scott Ganyo wrote:
> > You can use:
> >
> > BooleanQuery.setMaxClauseCount(int maxClauseCount);
> 
> I had a similar problem with date ranges. Someone on the list suggested
> a solution that was cleverer than the one above, which helps but makes
> searches slower and is memory hungry (many terms are loaded into
> memory and then searched).
> 
> The suggested solution was to split dates into subfields during
> indexing and use those fields while searching. This is more
> efficient, but it makes queries harder to create (personally, I prefer
> working with queries built using the Lucene API over ones parsed by
> QueryParser).
> 
> For instance the time stamp 2004-10-01 15:34:26.001 may be split into
> following fields:
> <some-date>_year: 2004
> <some-date>_month: 10
> <some-date>_day: 01
> <some-date>_time: 153426001
> 
> The above fields should be indexed so they can be searched. They give
> some nice possibilities, for instance fast and easy querying for all
> documents that have a date in a particular year, month or day of month.
> For convenience one could also store weekdays.
> 
> A query for a date range from 15 August to 10 October 2004 (in no
> particular query language; this just gives the idea):
> <some-date>_year = 2004 AND (
>   (<some-date>_month = 08 AND <some-date>_day >= 15) OR
>   (<some-date>_month=09) OR
>   (<some-date>_month = 10 AND <some-date>_day <= 10)
> )
> 
> As you can see, it is easy to build such a query with the Lucene API. The
> equalities are term queries. The inequalities are range queries. The AND
> and OR operators can be provided by Boolean queries.
> 
> Have fun implementing the solution. It has only one disadvantage: it
> makes sorting results less easy. The solution is to use
> multiple sort fields, or another stored field containing the full date
> (you will almost surely need to store a date for each hit anyway, unless
> you want to write some baroque code to calculate the date from the split
> field values).
> 
> Have fun,
> --
> Damian Gajda
> Caltha Sp. j.
> 
> 
> 
> 
> 
> 



-- 
___________________________________________________
Chris Fraschetti, Student CompSci System Admin
University of San Francisco
e fraschetti@gmail.com | http://meteora.cs.usfca.edu



Re: BooleanQuery - Too Many Clases on date range.

Posted by Damian Gajda <dg...@caltha.pl>.
On 01-10-2004, Fri, at 07:57 -0500, Scott Ganyo wrote:
> You can use:
> 
> BooleanQuery.setMaxClauseCount(int maxClauseCount);

I had a similar problem with date ranges. Someone on the list suggested
a solution that was cleverer than the one above, which helps but makes
searches slower and is memory hungry (many terms are loaded into
memory and then searched).

The suggested solution was to split dates into subfields during
indexing and use those fields while searching. This is more
efficient, but it makes queries harder to create (personally, I prefer
working with queries built using the Lucene API over ones parsed by
QueryParser).

For instance, the timestamp 2004-10-01 15:34:26.001 may be split into the
following fields:
<some-date>_year: 2004
<some-date>_month: 10
<some-date>_day: 01
<some-date>_time: 153426001
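
As an illustration only, the split could be computed along these lines. The
class name SplitDateFields and the field prefix "created" are hypothetical, and
the map is just a stand-in for adding fields to a Lucene document. Values are
zero-padded so that they keep their order when compared as strings.

```java
import java.time.LocalDateTime;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that derives the four sub-field values from a timestamp.
class SplitDateFields {

    static Map<String, String> split(String name, LocalDateTime ts) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put(name + "_year", String.format("%04d", ts.getYear()));
        fields.put(name + "_month", String.format("%02d", ts.getMonthValue()));
        fields.put(name + "_day", String.format("%02d", ts.getDayOfMonth()));
        // HHmmss followed by milliseconds, e.g. 153426001
        fields.put(name + "_time", String.format("%02d%02d%02d%03d",
                ts.getHour(), ts.getMinute(), ts.getSecond(), ts.getNano() / 1_000_000));
        return fields;
    }

    public static void main(String[] args) {
        // 2004-10-01 15:34:26.001 (1 ms = 1,000,000 ns)
        LocalDateTime ts = LocalDateTime.of(2004, 10, 1, 15, 34, 26, 1_000_000);
        System.out.println(split("created", ts));
        // {created_year=2004, created_month=10, created_day=01, created_time=153426001}
    }
}
```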

The above fields should be indexed so they can be searched. They give
some nice possibilities, for instance fast and easy querying for all
documents that have a date in a particular year, month or day of month.
For convenience one could also store weekdays.

A query for a date range from 15 August to 10 October 2004 (in no
particular query language; this just gives the idea):
<some-date>_year = 2004 AND (
   (<some-date>_month = 08 AND <some-date>_day >= 15) OR
   (<some-date>_month=09) OR
   (<some-date>_month = 10 AND <some-date>_day <= 10)
)

As you can see, it is easy to build such a query with the Lucene API. The
equalities are term queries. The inequalities are range queries. The AND
and OR operators can be provided by Boolean queries.
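
To make the boolean structure of that example query concrete, here is a small
stand-alone sketch that evaluates it against one document's split fields. It is
not Lucene code, and the class name SplitDateRange is made up. String
comparison is used on purpose, to mimic term ranges that compare values as
text, which is why the day and month values are assumed to be zero-padded.

```java
// Hypothetical evaluation of: year = 2004 AND ((month = 08 AND day >= 15)
// OR month = 09 OR (month = 10 AND day <= 10)) over zero-padded string fields.
class SplitDateRange {

    static boolean matches(String year, String month, String day) {
        boolean inYear = year.equals("2004");
        boolean endOfAugust = month.equals("08") && day.compareTo("15") >= 0;
        boolean september = month.equals("09");
        boolean startOfOctober = month.equals("10") && day.compareTo("10") <= 0;
        return inYear && (endOfAugust || september || startOfOctober);
    }

    public static void main(String[] args) {
        System.out.println(matches("2004", "08", "15")); // true
        System.out.println(matches("2004", "08", "14")); // false
        System.out.println(matches("2004", "09", "30")); // true
        System.out.println(matches("2004", "10", "11")); // false
        // Without zero padding the text comparison goes wrong:
        // "9" sorts after "15", so 9 August would match by mistake.
        System.out.println(matches("2004", "08", "9"));  // true
    }
}
```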

Have fun implementing the solution. It has only one disadvantage: it
makes sorting results less easy. The solution is to use
multiple sort fields, or another stored field containing the full date
(you will almost surely need to store a date for each hit anyway, unless
you want to write some baroque code to calculate the date from the split
field values).

Have fun,
-- 
Damian Gajda
Caltha Sp. j.




Re: BooleanQuery - Too Many Clases on date range.

Posted by Scott Ganyo <sc...@ganyo.com>.
You can use:

BooleanQuery.setMaxClauseCount(int maxClauseCount);

to increase the limit.

On Sep 30, 2004, at 8:24 PM, Chris Fraschetti wrote:

> I recently read, regarding my problem, that date_field:[0820483200
> TO 1104480000]
> is evaluated into a series of boolean clauses, which has a cap of
> 1024. Considering that my documents will have dates spanning many
> years, and I need 'by day' search granularity, are there
> any recommendations on how to make this work?
>
> Currently with query: +content_field:sometext +date_field:[0820483200
> TO 1104480000]
> I get the following exception:
> org.apache.lucene.search.BooleanQuery$TooManyClauses
>
>
> Any suggestions on how I can still keep by-day granularity, but
> without limiting my search results? Are there any date formats that I
> can change those numbers to that would allow me to complete the search
> (e.g. Feb 15, 2004)? Can Lucene's range query do a proper search on
> formatted dates?
>
> Is there a combination of RangeQuery and Query/MultiTermQuery that I 
> can use?
>
> your help is greatly appreciated.
>
>
> -- 
> ___________________________________________________
> Chris Fraschetti
> e fraschetti@gmail.com
>
>




Re: BooleanQuery - Too Many Clases on date range.

Posted by Stephane James Vaucher <va...@cirano.qc.ca>.
How about a DateFilter?

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/DateFilter.html

I don't believe it's got the same restrictions as boolean queries.
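
For intuition about why a filter can sidestep the clause limit, here is a
conceptual sketch, not the DateFilter implementation: instead of one boolean
clause per matching term, a filter marks the permitted documents in a bit set.
The array of per-document day terms and the class name DayRangeFilterSketch are
assumptions standing in for a real index.

```java
import java.util.BitSet;

// Hypothetical in-memory illustration of a range filter producing a bit set.
class DayRangeFilterSketch {

    // docDays[i] holds the yyyyMMdd term of document i.
    static BitSet bits(String[] docDays, String from, String to) {
        BitSet allowed = new BitSet(docDays.length);
        for (int doc = 0; doc < docDays.length; doc++) {
            String day = docDays[doc];
            // Inclusive range check on fixed-width strings.
            if (day.compareTo(from) >= 0 && day.compareTo(to) <= 0) {
                allowed.set(doc);
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        String[] docDays = {"19951231", "19960101", "20040608", "20050101"};
        System.out.println(bits(docDays, "19960101", "20041231")); // {1, 2}
    }
}
```

The size of the result depends on the number of documents, not on how many
distinct date terms fall inside the range.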

HTH,
sv

On Thu, 30 Sep 2004, Chris Fraschetti wrote:

> I recently read, regarding my problem, that date_field:[0820483200
> TO 1104480000]
> is evaluated into a series of boolean clauses, which has a cap of
> 1024. Considering that my documents will have dates spanning many
> years, and I need 'by day' search granularity, are there
> any recommendations on how to make this work?
>
> Currently with query: +content_field:sometext +date_field:[0820483200
> TO 1104480000]
> I get the following exception:
> org.apache.lucene.search.BooleanQuery$TooManyClauses
>
>
> Any suggestions on how I can still keep by-day granularity, but
> without limiting my search results? Are there any date formats that I
> can change those numbers to that would allow me to complete the search
> (e.g. Feb 15, 2004)? Can Lucene's range query do a proper search on
> formatted dates?
>
> Is there a combination of RangeQuery and Query/MultiTermQuery that I can use?
>
> your help is greatly appreciated.
>
>
>

