Posted to java-user@lucene.apache.org by Chris Lu <ch...@gmail.com> on 2005/07/01 05:27:49 UTC

Re: Design question [too many fields?]

Mark, your suggestion will incur another trip to the database. And if 
the result set is large, filtering in the DB by primary key is not 
really efficient.

Erik, your original "date" field is fine when there are not many 
distinct dates (<1024) in the database. Otherwise, RangeQuery cannot 
handle it.

My suggestion is to use three fields, "year" + "month" + "day", to store 
the date. When searching for, say, any date later than 2005-06-30, you 
can use this query: ( year > 2005 ) or ( year=2005 and month>6 ) or 
( year=2005 and month=6 and day > 30 ).
It's a combination of BooleanQuery, TermQuery, and RangeQuery.
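
The clause structure can be checked without Lucene. What follows is a
minimal sketch in plain Java, not Lucene code: the class and method names
are hypothetical, and it only confirms that the three OR-ed clauses agree
with an ordinary date comparison. Note that the month comparison has to be
strict for the clauses to mean "later than 2005-06-30". In an index, each
parenthesised clause would presumably become a BooleanQuery over the
"year", "month", and "day" fields.

```java
import java.time.LocalDate;

// Hypothetical helper, not part of Lucene: evaluates the three OR-ed clauses
// on plain integers to confirm they mean "strictly after the given date".
class SplitDateClauses {

    // (year > y) OR (year = y AND month > m) OR (year = y AND month = m AND day > d)
    static boolean isAfter(int year, int month, int day, int y, int m, int d) {
        boolean laterYear = year > y;
        boolean laterMonth = year == y && month > m;
        boolean laterDay = year == y && month == m && day > d;
        return laterYear || laterMonth || laterDay;
    }

    public static void main(String[] args) {
        LocalDate bound = LocalDate.of(2005, 6, 30);
        // Compare against java.time for every day from 2004 through 2006.
        for (LocalDate date = LocalDate.of(2004, 1, 1);
             date.isBefore(LocalDate.of(2007, 1, 1));
             date = date.plusDays(1)) {
            boolean expected = date.isAfter(bound);
            boolean actual = isAfter(date.getYear(), date.getMonthValue(),
                                     date.getDayOfMonth(), 2005, 6, 30);
            if (expected != actual) {
                throw new AssertionError("mismatch at " + date);
            }
        }
        System.out.println("clauses agree with LocalDate.isAfter");
    }
}
```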

This may seem cumbersome, but it saves a trip to the database and 
works around Lucene's limitation.

Chris Lu
http://www.dbsight.net

Erik Hatcher wrote:

> I second Mark's suggestion over the alternative I posted.  My  
> alternative was merely to invert the field structure originally  
> described, but using a Filter for the volatile information is wiser.
>
>     Erik
>
> On Jun 29, 2005, at 9:58 AM, mark harwood wrote:
>
>> Presumably there is also a free-text element to the
>> search or you wouldn't be using Lucene.
>>
>> Multiple fields is not the way to go.
>> A single Lucene field could contain multiple terms (
>> the available dates) but I still don't think that's
>> the best solution.
>> The availability info is likely to be pretty volatile
>> and you always want up-to-date info so I would prefer
>> to hit a database for this. If you keep a DB primary
>> key to Lucene doc id look-up cached in memory you can
>> quickly construct a Lucene filter from the database
>> results and therefore only show Lucene results for
>> available rooms.
>>
>> Cheers
>> Mark
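
Mark's filter suggestion quoted above can be sketched roughly as follows.
This is a plain-Java illustration under stated assumptions, with no Lucene
dependency: the class name, the register() step, and the use of long
primary keys are all hypothetical. The resulting java.util.BitSet has one
bit per Lucene document number, which is presumably what a custom Filter
would hand back to the searcher.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: maps database primary keys to Lucene document numbers
// and marks the documents that the database reports as available.
class AvailabilityBits {

    private final Map<Long, Integer> docIdByPrimaryKey = new HashMap<>();
    private final int maxDoc;

    AvailabilityBits(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    // Assumed to be filled once per opened index, e.g. from a stored key field.
    void register(long primaryKey, int docId) {
        docIdByPrimaryKey.put(primaryKey, docId);
    }

    // Turns the primary keys returned by the availability query into a bit set
    // with one bit per document; keys that are not in the index are ignored.
    BitSet bitsFor(List<Long> availablePrimaryKeys) {
        BitSet bits = new BitSet(maxDoc);
        for (long primaryKey : availablePrimaryKeys) {
            Integer docId = docIdByPrimaryKey.get(primaryKey);
            if (docId != null) {
                bits.set(docId);
            }
        }
        return bits;
    }
}
```

The cached map has to be rebuilt whenever document numbers change, for
example after the index is optimized or reopened.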


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Vedr. Re: Design question [too many fields?]

Posted by Chris Lu <ch...@gmail.com>.
> It is anyway going to be too many fields then? Days of
> year for the whole year ahead? Since the fromDate and
> toDate can be across two months and the customer wants
> the data be available for one year.

It won't have too many fields.
> > My suggestion is, use "year" + "month" + "day" three
> > fields to store
The "day" field holds the day of the month, so the "month" and "day" fields
will have at most 12 and 31 distinct values respectively.
The "year" field depends on your data, but I guess your data won't
span 1024 years.
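
To make the term counts concrete, here is a small plain-Java sketch (not
Lucene code; the class name is hypothetical) that counts how many distinct
values each scheme produces for a three-year span. The 1024 figure is
presumably the default maximum clause count of BooleanQuery, which a range
over a single date field would exceed here.

```java
import java.time.LocalDate;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: counts distinct terms for a single "date" field
// versus separate "year", "month", and "day" fields.
class DateTermCounts {

    // Returns {date terms, year terms, month terms, day terms} for [from, to].
    static int[] count(LocalDate from, LocalDate to) {
        Set<LocalDate> dates = new HashSet<>();
        Set<Integer> years = new HashSet<>();
        Set<Integer> months = new HashSet<>();
        Set<Integer> days = new HashSet<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            dates.add(d);
            years.add(d.getYear());
            months.add(d.getMonthValue());
            days.add(d.getDayOfMonth());
        }
        return new int[] {dates.size(), years.size(), months.size(), days.size()};
    }

    public static void main(String[] args) {
        int[] c = count(LocalDate.of(2005, 1, 1), LocalDate.of(2007, 12, 31));
        System.out.println("single date field: " + c[0] + " terms");
        System.out.println("year/month/day fields: " + c[1] + "/" + c[2] + "/" + c[3] + " terms");
    }
}
```

For 2005-01-01 through 2007-12-31 the single field has 1095 distinct
values, while the split fields have 3, 12, and 31.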

-- 
Chris Lu
---------------------
Full-Text Search on Any Database
http://www.dbsight.net

On 6/30/05, Naimdjon Takhirov <tn...@yahoo.com> wrote:
> Hi Chris,
> 
> It is anyway going to be too many fields then? Days of
> year for the whole year ahead? Since the fromDate and
> toDate can be across two months and the customer wants
> the data be available for one year.
> 
> Naimdjon
> 


Re: Unexpected: ordered

Posted by Dave Kor <s0...@sms.ed.ac.uk>.
Quoting Paul Elschot <pa...@xs4all.nl>:

> Dave,
>
> This might be a fix; it reduces the number of cases that are considered
> ordered matches. It also passes all unit tests here:
>
>   private boolean matchIsOrdered() {
>     SpansCell spansCell = (SpansCell) ordered.get(0);
>     int lastStart = spansCell.start(); // no need to compare doc nrs here.
>     int lastEnd = spansCell.end();
>     for (int i = 1; i < ordered.size(); i++) {
>       spansCell = (SpansCell) ordered.get(i);
>       int start = spansCell.start();
>       int end = spansCell.end();
>       if ((start < lastStart) || ((start == lastStart) && (end <= lastEnd))) {
>         return false; // also equal begin and end is not ordered.
>       }
>       lastStart = start;
>       lastEnd = end;
>     }
>     return true;
>   }
>
> Could you replace the matchIsOrdered() method with the above one
> and see whether you can still reproduce the "Unexpected: ordered"
> exception?
>
> There is some interplay between the matchIsOrdered() method and
> the lessThan() method in CellQueue that also uses the SpansCell index,
> and I hope this gets it right.

Yup, your code has eliminated all the exceptions. So far I have not had time
to check in detail whether it works correctly (my deadline is this Wednesday),
so I am just assuming it does. I'll get back to you next week if everything
checks out.


Regards,
Dave.



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Unexpected: ordered

Posted by Paul Elschot <pa...@xs4all.nl>.
Dave, 

This might be a fix; it reduces the number of cases that are considered ordered
matches. It also passes all unit tests here:

  // Returns true when the spans in "ordered" are strictly ordered: each span
  // must start after the previous one, or start at the same position and end later.
  private boolean matchIsOrdered() {
    SpansCell spansCell = (SpansCell) ordered.get(0);
    int lastStart = spansCell.start(); // no need to compare doc nrs here.
    int lastEnd = spansCell.end();
    for (int i = 1; i < ordered.size(); i++) {
      spansCell = (SpansCell) ordered.get(i);
      int start = spansCell.start();
      int end = spansCell.end();
      if ((start < lastStart) || ((start == lastStart) && (end <= lastEnd))) {
        return false; // also equal begin and end is not ordered.
      }
      lastStart = start;
      lastEnd = end;
    }
    return true;
  }

Could you replace the matchIsOrdered() method with the above one
and see whether you can still reproduce the "Unexpected: ordered"
exception?

There is some interplay between the matchIsOrdered() method and
the lessThan() method in CellQueue that also uses the SpansCell index,
and I hope this gets it right.
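
For experimenting outside Lucene, the ordering test can be reproduced as a
stand-alone sketch. The names below are assumptions: SpanPos is a
hypothetical stand-in for the SpansCell used in NearSpans and carries only
start and end positions, and the positions in main() are made up.

```java
import java.util.Arrays;
import java.util.List;

// Stand-alone sketch of the ordering test; not the actual NearSpans code.
class OrderedSpanCheck {

    // Hypothetical stand-in for SpansCell: just a start and an end position.
    static final class SpanPos {
        final int start;
        final int end;

        SpanPos(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // True when each span starts after the previous one, or starts at the same
    // position and ends later. Spans with equal start and end are not ordered.
    static boolean matchIsOrdered(List<SpanPos> spans) {
        SpanPos last = spans.get(0);
        for (int i = 1; i < spans.size(); i++) {
            SpanPos current = spans.get(i);
            if (current.start < last.start
                    || (current.start == last.start && current.end <= last.end)) {
                return false;
            }
            last = current;
        }
        return true;
    }

    public static void main(String[] args) {
        // The same occurrence of a term matched by two equal subqueries.
        System.out.println(matchIsOrdered(
                Arrays.asList(new SpanPos(4, 5), new SpanPos(4, 5))));
        // A first occurrence, then a three-term match further on.
        System.out.println(matchIsOrdered(
                Arrays.asList(new SpanPos(4, 5), new SpanPos(9, 12))));
    }
}
```

The first call reports false because both spans are identical; the second
reports true.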

Regards,
Paul Elschot




Re: Unexpected: ordered

Posted by Paul Elschot <pa...@xs4all.nl>.
On Tuesday 05 July 2005 14:35, Dave Kor wrote:
> Quoting Paul Elschot <pa...@xs4all.nl>:
> 
> > On Monday 04 July 2005 22:51, Dave Kor wrote:
> > > From these queries, I can find many repeated multi-term SpanNearQueries that
> > > also throws the same RTE. Here are some examples where the bracket shows how
> > > the terms are grouped in a SpanNearQuery:
> > >
> > > ((the (regent hotel)) (the (regent hotel) to))
> > > (((elton john)) ((elton john) and))
> > > (((the who) is) ((the who) of))
> > > ((is) (the (the band nirvana) band))
> > > (((united states)) (united states president is the))
> > > (((academy awards) of) ((academy awards) is))
> >
> > In all these cases overlap between two matches can occur because they have
> > an equal subquery. The conclusion is that the current span code is not capable
> > of handling such cases. It probably chokes at the moment the matches for
> > such subqueries concur.
> 
> I'm not quite sure what you mean here by "an equal subquery". I am not trying to
> get two subqueries to match the same portion of a document. Instead, I am
> looking for a repeat of the same search term(s) somewhere farther in the
> document.

I meant for example
(elton john)
occurring twice above.
 
> > The question is whether you would consider such a concurrence to be a match
> > for the query.
> > If so, the fix might be to return true instead of throwing the exception.
> 
> I have simplified the above examples by substituting the original search terms
> with more intelligible terms, which unfortunately made the above queries seem
> pointless. In reality, my system is trying to search for sentences that conform
> to certain linguistic structures.
> 
> An example of a useful search is a comma followed by another comma several words
> later, followed by the phrase "academy award winner". In other words
> 
> (, (, (academy award winner)~2)~3)~8
> 
> This search would pick up only sentences like "Dafoe , who played the role of
> Jesus in The Last Temptation of Christ , is also an Academy Award winner for
> his ... "
> 
> Hopefully, this explains what I am trying to achieve with Lucene and why I need
> to match repeated sub-queries. I would really appreciate it if anyone has a
> solution, a quickfix or can guide me in hacking up something workable.

So, in an ordered SpanNearQuery, you want repeated subqueries not to match the
same text/tokens, which boils down to non-overlapping matches.

I had another look at NearSpans.java, and I'm afraid there is no quick fix for this
(but I'd like to be proven wrong).
Spans can match ordered/unordered and overlapping/non-overlapping.
There is currently no parameter to control overlap, and I don't know how
SpanNearQuery behaves with respect to overlapping matches.
There is no special case for equal subqueries, which is probably ok, but
when overlaps are allowed care should be taken not to use equal subqueries.

On hacking up something workable: it would be good to get this
bug out of NearSpans.
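
As a small illustration of what "non-overlapping" means for two matches,
here is a sketch in plain Java. It is not taken from NearSpans; the class
name is hypothetical, and it assumes half-open position ranges where end is
one past the last matched position.

```java
// Hypothetical helper: overlap test for two matches given as position ranges.
class SpanOverlap {

    // Assumes half-open ranges [start, end). Two ranges overlap when each one
    // starts before the other one ends.
    static boolean overlaps(int startA, int endA, int startB, int endB) {
        return startA < endB && startB < endA;
    }

    public static void main(String[] args) {
        // Two equal subqueries matching the same token overlap completely.
        System.out.println(overlaps(4, 5, 4, 5));
        // A match that begins where the previous one ends does not overlap.
        System.out.println(overlaps(4, 5, 5, 8));
    }
}
```

Under that assumption, two equal subqueries that land on the same tokens
always overlap, which is the case a non-overlapping ordered match would
have to reject.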

Anyway, to test this, e.g. using the examples you gave above,
TestSpans.java here has some small code examples to start from:
http://svn.apache.org/viewcvs.cgi/lucene/java/tags/lucene_1_4_3/src/test/org/apache/lucene/search/spans/

TestBasics.java there has some larger examples.

Regards,
Paul Elschot




Re: Unexpected: ordered

Posted by Dave Kor <s0...@sms.ed.ac.uk>.
Quoting Paul Elschot <pa...@xs4all.nl>:

> On Monday 04 July 2005 22:51, Dave Kor wrote:
> > From these queries, I can find many repeated multi-term SpanNearQueries that
> > also throws the same RTE. Here are some examples where the bracket shows how
> > the terms are grouped in a SpanNearQuery:
> >
> > ((the (regent hotel)) (the (regent hotel) to))
> > (((elton john)) ((elton john) and))
> > (((the who) is) ((the who) of))
> > ((is) (the (the band nirvana) band))
> > (((united states)) (united states president is the))
> > (((academy awards) of) ((academy awards) is))
>
> In all these cases overlap between two matches can occur because they have
> an equal subquery. The conclusion is that the current span code is not capable
> of handling such cases. It probably chokes at the moment the matches for
> such subqueries concur.

I'm not quite sure what you mean here by "an equal subquery". I am not trying to
get two subqueries to match the same portion of a document. Instead, I am
looking for a repeat of the same search term(s) somewhere farther in the
document.

> The question is whether you would consider such a concurrence to be a match
> for the query.
> If so, the fix might be to return true instead of throwing the exception.

I have simplified the above examples by substituting the original search terms
with more intelligible terms, which unfortunately made the above queries seem
pointless. In reality, my system is trying to search for sentences that conform
to certain linguistic structures.

An example of a useful search is a comma followed by another comma several words
later, followed by the phrase "academy award winner". In other words:

(, (, (academy award winner)~2)~3)~8

This search would pick up only sentences like "Dafoe , who played the role of
Jesus in The Last Temptation of Christ , is also an Academy Award winner for
his ... "

Hopefully, this explains what I am trying to achieve with Lucene and why I need
to match repeated sub-queries. I would really appreciate it if anyone has a
solution, a quickfix or can guide me in hacking up something workable.



Re: Unexpected: ordered

Posted by Paul Elschot <pa...@xs4all.nl>.
On Monday 04 July 2005 22:51, Dave Kor wrote:
> *chuckles* It seems I can post to this list without subscribing to it. :)
> 
> > I had another look at the code, and my guess now is that this is
> > related to the spanNear with the single argument.
> >
> > It rings some bells. One of them is that I would have preferred
> > to split the SpanNear class into ordered/unordered after the fix,
> > but that I gave up because it would take too much time.
> > The current SpanNear class is too complex for easy maintenance.
> >
> > Perhaps the quick fix is to verify in the constructor of SpanNearQuery
> > that the number of clauses is at least 2, and to throw an illegal arg
> > exception otherwise.
> 
> Alright, I'll add code to ensure that I do not generate SpanNearQueries that
> contain only a single sub-query and see what happens, I hope this solves my
> problem!
> 
> Earlier, I went back to have a more in-depth look at the queries that were
> throwing these exceptions. My system, an experimental query expansion module,
> had generated over 900+ queries and out of those, 50-60 queries cause the RTE.
> 
> From these queries, I can find many repeated multi-term SpanNearQueries that
> also throws the same RTE. Here are some examples where the bracket shows how
> the terms are grouped in a SpanNearQuery:
> 
> ((the (regent hotel)) (the (regent hotel) to))
> (((elton john)) ((elton john) and))
> (((the who) is) ((the who) of))
> ((is) (the (the band nirvana) band))
> (((united states)) (united states president is the))
> (((academy awards) of) ((academy awards) is))

In all these cases overlap between two matches can occur because they have
an equal subquery. The conclusion is that the current span code is not capable
of handling such cases. It probably chokes at the moment the matches for
such subqueries concur.

The question is whether you would consider such a concurrence to be a match
for the query.
If so, the fix might be to return true instead of throwing the exception.

Regards,
Paul Elschot




Re: Unexpected: ordered

Posted by Paul Elschot <pa...@xs4all.nl>.
On Sunday 03 July 2005 17:42, Dave Kor wrote:
> Quoting Paul Elschot <pa...@xs4all.nl>:
> 
> > SpanNearQuery is not supposed to operate on a single argument, at least
> > that's what I thought when I wrote the bug fix code that throws this
> > exception. Does the exception go away when you replace the first spanNear
> > (the one with the single [text:interesting]) with a SpanTermQuery?

See below.

> >
> I'll see what I can do about the test case. From what I can tell thus far, this
> exception is thrown when CellQueue is empty in the function
> NearSpan.firstNonOrderedNextToPartialList(). I hope it rings a bell somewhere.

I had another look at the code, and my guess now is that this is related to
the spanNear with the single argument. So I'd like to know what
happens when this is replaced by a SpanTermQuery.

It does ring some bells. One of them is that I would have preferred to split
the NearSpans class into two after the bug fix, one for the ordered case
and one for the unordered case. I did not make that split then because it
worked for the test cases, and I did not want to spend more time on it.
Anyway, the current NearSpans code is too complex for easy maintenance.

Perhaps the quick fix is to make sure that the SpanNearQuery passed to
the NearSpans has at least two clauses, and that the SpanNearQuery constructor
throws an IllegalArgumentException otherwise.
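
A rough sketch of such a guard, in plain Java: GuardedNearQuery is a
hypothetical stand-in, not the real SpanNearQuery. It only mirrors the
(clauses, slop, inOrder) constructor shape and uses strings in place of
SpanQuery clauses, so this is an illustration of the check rather than a
patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for SpanNearQuery that rejects fewer than two clauses.
class GuardedNearQuery {

    private final List<String> clauses;
    private final int slop;
    private final boolean inOrder;

    GuardedNearQuery(String[] clauses, int slop, boolean inOrder) {
        if (clauses == null || clauses.length < 2) {
            throw new IllegalArgumentException(
                    "a near query needs at least two clauses");
        }
        // Copy the array so later changes by the caller do not affect the query.
        this.clauses = Arrays.asList(clauses.clone());
        this.slop = slop;
        this.inOrder = inOrder;
    }

    int clauseCount() {
        return clauses.size();
    }

    int getSlop() {
        return slop;
    }

    boolean isInOrder() {
        return inOrder;
    }
}
```

With a guard like this, a generated query such as
spanNear([text:interesting], 3, true) would fail at construction time
instead of deep inside the span matching code.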

Regards,
Paul Elschot




Re: Unexpected: ordered

Posted by Paul Elschot <pa...@xs4all.nl>.
On Sunday 03 July 2005 23:23, Paul Elschot wrote:

Please forget about the last message; I thought I had lost the earlier one.

Paul Elschot




Re: Unexpected: ordered

Posted by Paul Elschot <pa...@xs4all.nl>.
On Sunday 03 July 2005 17:42, Dave Kor wrote:
> Quoting Paul Elschot <pa...@xs4all.nl>:
> 
> > SpanNearQuery is not supposed to operate on a single argument, at least
> > that's what I thought when I wrote the bug fix code that throws this
> > exception. Does the exception go away when you replace the first spanNear
> > (the one with the single [text:interesting]) with a SpanTermQuery?
> I'll see what I can do about the test case. From what I can tell thus far, this
> exception is thrown when CellQueue is empty in the function
> NearSpan.firstNonOrderedNextToPartialList(). I hope it rings a bell somewhere.

I had another look at the code, and my guess now is that this is
related to the spanNear with the single argument.

It rings some bells. One of them is that I would have preferred
to split the NearSpans class into ordered/unordered after the fix,
but I gave up because it would take too much time.
The current NearSpans class is too complex for easy maintenance.

Perhaps the quick fix is to verify in the constructor of SpanNearQuery
that the number of clauses is at least 2, and to throw an
IllegalArgumentException otherwise.

Regards,
Paul Elschot




Re: Unexpected: ordered

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jul 4, 2005, at 4:51 PM, Dave Kor wrote:
> *chuckles* It seems I can post to this list without subscribing to  
> it. :)

I moderate in messages that are on topic but from unsubscribed  
addresses quite often.  Perhaps this was the case?

     Erik




Re: Unexpected: ordered

Posted by Dave Kor <s0...@sms.ed.ac.uk>.
*chuckles* It seems I can post to this list without subscribing to it. :)

> I had another look at the code, and my guess now is that this is
> related to the spanNear with the single argument.
>
> It rings some bells. One of them is that I would have preferred
> to split the SpanNear class into ordered/unordered after the fix,
> but that I gave up because it would take too much time.
> The current SpanNear class is too complex for easy maintenance.
>
> Perhaps the quick fix is to verify in the constructor of SpanNearQuery
> that the number of clauses is at least 2, and to throw an illegal arg
> exception otherwise.

Alright, I'll add code to ensure that I do not generate SpanNearQueries that
contain only a single sub-query and see what happens. I hope this solves my
problem!

Earlier, I went back to have a more in-depth look at the queries that were
throwing these exceptions. My system, an experimental query expansion module,
had generated over 900 queries, and of those, 50-60 cause the
RuntimeException (RTE).

Among these queries, I can find many repeated multi-term SpanNearQueries that
also throw the same RTE. Here are some examples where the brackets show how
the terms are grouped in a SpanNearQuery:

((the (regent hotel)) (the (regent hotel) to))
(((elton john)) ((elton john) and))
(((the who) is) ((the who) of))
((is) (the (the band nirvana) band))
(((united states)) (united states president is the))
(((academy awards) of) ((academy awards) is))



Re: Unexpected: ordered

Posted by Dave Kor <s0...@sms.ed.ac.uk>.
Quoting Paul Elschot <pa...@xs4all.nl>:

> On Sunday 03 July 2005 15:27, Dave Kor wrote:
> > I have a system that automatically generates span queries for Lucene.
> > Sometimes the system generates a query like this one, which always throws a
> > RuntimeException:
> >
> > spanNear([spanNear([text:interesting], 3, true),
> > spanNear([text:interesting, text:john, text:said], 8, true)], 2, true)
> >
> > Basically, the system is looking for a document that contains the string
> > sequence "interesting .... interesting john said". The thrown exception is
> > as follows:
> >
> > java.lang.RuntimeException: Unexpected: ordered
> >         at org.apache.lucene.search.spans.NearSpans.firstNonOrderedNextToPartialList(Unknown Source)
> >         at org.apache.lucene.search.spans.NearSpans.next(Unknown Source)
> >         at org.apache.lucene.search.spans.SpanScorer.next(Unknown Source)
> >         at org.apache.lucene.search.Scorer.score(Unknown Source)
> >         at org.apache.lucene.search.IndexSearcher.search(Unknown Source)
> >
> > My question is, what does this "Unexpected: ordered" mean? And is there
> > any way I can avoid these exceptions?
>
> It's an internal error that is not supposed to occur.
> Could you continue on the java-dev list?
>
> SpanNearQuery is not supposed to operate on a single argument, at least
> that's what I thought when I wrote the bug fix code that throws this
> exception. Does the exception go away when you replace the first spanNear
> (the one with the single [text:interesting]  with a SpanTermQuery ?
>
> It's also possible that the code cannot handle the two identical
> text:interesting arguments.
>
> It's probably good to have a test case for this. Could you extend the
> exception with the document number and maybe a position within the
> document to try and get to the original text that causes this exception,
> and use that to file a bug report?

I'll see what I can do about the test case. From what I can tell thus far, this
exception is thrown when the CellQueue is empty in the method
NearSpans.firstNonOrderedNextToPartialList(). I hope it rings a bell somewhere.




Re: Unexpected: ordered

Posted by Paul Elschot <pa...@xs4all.nl>.
On Sunday 03 July 2005 15:27, Dave Kor wrote:
> I have a system that automatically generates span queries for Lucene.
> Sometimes, the system generates a query like this one which always
> throws a RuntimeException:
>
> spanNear([spanNear([text:interesting], 3, true), spanNear([text:interesting,
> text:john, text:said], 8, true)], 2, true)
>
> Basically, the system is looking for a document that contains a string
> sequence "interesting .... interesting john said". The thrown exception
> is as follows:
>
> java.lang.RuntimeException: Unexpected: ordered
>         at org.apache.lucene.search.spans.NearSpans.firstNonOrderedNextToPartialList(Unknown Source)
>         at org.apache.lucene.search.spans.NearSpans.next(Unknown Source)
>         at org.apache.lucene.search.spans.SpanScorer.next(Unknown Source)
>         at org.apache.lucene.search.Scorer.score(Unknown Source)
>         at org.apache.lucene.search.IndexSearcher.search(Unknown Source)
>
> My question is, what does this "Unexpected: ordered" mean? And is there
> any way I can avoid these exceptions?

It's an internal error that is not supposed to occur.
Could you continue on the java-dev list?

SpanNearQuery is not supposed to operate on a single argument, at least
that's what I thought when I wrote the bug fix code that throws this
exception. Does the exception go away when you replace the first spanNear
(the one with the single [text:interesting]) with a SpanTermQuery?

It's also possible that the code cannot handle the two identical
text:interesting arguments. 

It's probably good to have a test case for this. Could you extend the
exception with the document number and maybe a position within the
document to try and get to the original text that causes this exception,
and use that to file a bug report?

Regards,
Paul Elschot


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Unexpected: ordered

Posted by Dave Kor <s0...@sms.ed.ac.uk>.
I have a system that automatically generates span queries for Lucene. Sometimes,
the system generates a query like this one which always throws a
RuntimeException:

spanNear([spanNear([text:interesting], 3, true), spanNear([text:interesting,
text:john, text:said], 8, true)], 2, true)

Basically, the system is looking for a document that contains a string sequence
"interesting .... interesting john said". The thrown exception is as follows:

java.lang.RuntimeException: Unexpected: ordered
        at org.apache.lucene.search.spans.NearSpans.firstNonOrderedNextToPartialList(Unknown Source)
        at org.apache.lucene.search.spans.NearSpans.next(Unknown Source)
        at org.apache.lucene.search.spans.SpanScorer.next(Unknown Source)
        at org.apache.lucene.search.Scorer.score(Unknown Source)
        at org.apache.lucene.search.IndexSearcher.search(Unknown Source)

My question is, what does this "Unexpected: ordered" mean? And is there
any way I can avoid these exceptions?



Re: Design question [too many fields?]

Posted by Chris Hostetter <ho...@fucit.org>.
: My head was thinking to find a generic solution to Lucene's
: limitation: The TooManyClauses problem when using RangeQuery and there
: are more than 1024 values. It should be another thread.

It's been discussed in several threads, and I can think of two good
solutions at this point:

Using a RangeFilter (or DateFilter) instead of a RangeQuery
http://nagoya.apache.org/eyebrowse/BrowseList?listName=lucene-user@jakarta.apache.org&by=thread&from=943115

Using ConstantScoreRangeQuery in place of RangeQuery...
http://issues.apache.org/bugzilla/show_bug.cgi?id=34673

-Hoss
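
To illustrate why a filter sidesteps the clause limit, here is a simplified
sketch. It is not Lucene's RangeFilter: RangeFilterSketch and its TreeMap
"index" are hypothetical stand-ins for an inverted index. The point it tries
to show is that a filter walks the terms in the range and sets one bit per
matching document, so the cost is one bit per document no matter how many
terms fall inside the range.

```java
import java.util.BitSet;
import java.util.TreeMap;

final class RangeFilterSketch {
    // term -> ids of the documents containing it (stand-in for an inverted index)
    private final TreeMap<String, int[]> postings = new TreeMap<String, int[]>();

    void add(String term, int... docIds) {
        postings.put(term, docIds);
    }

    /** Marks every document that has a term in [lower, upper], both inclusive. */
    BitSet bits(String lower, String upper, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        // Terms are kept in sorted order, so the range is a contiguous sub-map.
        for (int[] docs : postings.subMap(lower, true, upper, true).values()) {
            for (int doc : docs) {
                result.set(doc);
            }
        }
        return result;
    }
}
```

The terms are assumed to sort lexicographically in date order (for example
"20050630"), which is what makes the sub-map lookup valid.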




Re: Vedr. Re: Design question [too many fields?]

Posted by markharw00d <ma...@yahoo.co.uk>.
>>about 4900 room units which I think is OK as far as
>>Still we have optimization work to do.

Assuming your availability is a year in advance and yours is a reputable
chain of hotels that books rooms by the day (not the hour!), you only need
4900 * 365 bits of true/false info to cache all the availability data.
This is a BitSet occupying less than a megabyte of RAM.
You could index into this sort of structure very quickly (for the
appropriate date/doc positions) and get a big performance boost.
Perhaps more complex to implement but certainly a very fast solution.
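
A rough sketch of that structure, assuming rooms and days are both numbered
from zero; AvailabilityCache and its method names are made up for
illustration. 4900 * 365 is 1,788,500 bits, which is roughly 220 KB.

```java
import java.util.BitSet;

final class AvailabilityCache {
    private final int days;
    private final BitSet bits; // one bit per (room, day); set means available

    AvailabilityCache(int rooms, int days) {
        this.days = days;
        this.bits = new BitSet(rooms * days);
    }

    private int index(int room, int day) {
        return room * days + day; // each room owns a contiguous run of 'days' bits
    }

    void setAvailable(int room, int day, boolean available) {
        bits.set(index(room, day), available);
    }

    /** True if the room is free on every day in [fromDay, toDay). */
    boolean isAvailable(int room, int fromDay, int toDay) {
        if (fromDay < 0 || toDay > days || fromDay > toDay) {
            throw new IllegalArgumentException("bad day range");
        }
        // The first unavailable day at or after fromDay must lie past the range.
        return bits.nextClearBit(index(room, fromDay)) >= index(room, toDay);
    }
}
```

A two-week stay is then a single nextClearBit scan over 14 bits per room.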



	
	
		


Vedr. Re: Design question [too many fields?]

Posted by Naimdjon Takhirov <tn...@yahoo.com>.
Guys, thanks for your input.
I think the solution Mark has suggested does solve
the problem in an acceptable way. It's actually going
to be a little better than the solution the customer
has right now.
Apart from the availability, we also have to check
whether there is any price for room units saved in the
database, since the customer has a flexible pricing
system (weekend discounts, seasonal prices and so on),
so it's very important that the availability/price
information is up to date.
I implemented it with the filter (Mark: there were
about 4900 room units, which I think is OK as far as
memory is concerned), and instead of 10 seconds, searching
for a period of two weeks now takes about 3 seconds.
We still have optimization work to do.

Naimdjon



Re: Design question [too many fields?]

Posted by Chris Lu <ch...@gmail.com>.
Erik, Mark and Naimdjon, sorry, I totally misunderstood the question
about multiple dates for a Document. I have come to agree with Erik and
Mark on this problem.

I was trying to find a generic solution to Lucene's limitation: the
TooManyClauses problem when using RangeQuery with more than 1024
values. That should be another thread.

-- 
Chris Lu
---------------------
Full-Text Search on Any Database
http://www.dbsight.net




Re: Design question [too many fields?]

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jun 30, 2005, at 11:27 PM, Chris Lu wrote:
> Mark, your suggestion will incur another trip to the database. And
> if the search results are large, filtering in DB by pk is not really
> good.

Chris - I disagree with that last comment.  It can be a great
solution when the filter is cached.  Certainly building a filter for
every search would be inefficient, but filters are really best when
cached.

> Erik, your original "date" field is good when there are not many
> dates (<1024) in the database. Otherwise, RangeQuery cannot handle
> it.

Not quite correct... it would not matter how many dates were in the
index/database, only how many were within the range used by
RangeQuery.  The original requirement was a year's worth of days,
which at most would be 366 days, and I suspect someone looking for
hotel room availability would be narrowing things down to a week or
month.
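
A small sketch of that point, assuming dates are indexed as yyyyMMdd terms
and that a range query expands to one clause per indexed term inside the
range. RangeClauseCount is a made-up helper; 1024 is BooleanQuery's default
maximum clause count. Only the terms between the bounds count, not the total
number of terms in the index.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.TreeSet;

final class RangeClauseCount {
    static final int MAX_CLAUSES = 1024; // BooleanQuery's default limit

    /** Builds 'days' consecutive yyyyMMdd terms starting at 'start'. */
    static TreeSet<String> dailyTerms(LocalDate start, int days) {
        TreeSet<String> terms = new TreeSet<String>();
        for (int i = 0; i < days; i++) {
            terms.add(start.plusDays(i).format(DateTimeFormatter.BASIC_ISO_DATE));
        }
        return terms;
    }

    /** Number of indexed terms in [lower, upper], i.e. the clauses a range needs. */
    static int clausesFor(TreeSet<String> indexTerms, String lower, String upper) {
        return indexTerms.subSet(lower, true, upper, true).size();
    }
}
```

With 2000 daily terms in the index, a two-week range such as 20050701 to
20050714 needs 14 clauses, far below the limit; only a range spanning more
than 1024 indexed days would trip it.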

> My suggestion is, use "year" + "month" + "day" three fields to
> store date. And when searching, for example, any date that's
> greater than 2005-06-30, you can use this query to search: ( year >
> 2005 ) or ( year=2005 and month>6 ) or ( year=2005 and month=6 and
> day > 30 ).
> It's a combination of BooleanQuery, TermQuery, and RangeQuery.

How would you represent multiple dates for a Document using that  
scheme?  Wasn't that one of the original requirements?
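
For reference, the three-field comparison can be written out as a plain
predicate. This is only a sketch of the boolean logic, not of the Lucene
query itself; DateParts is a hypothetical name. Note that the middle clause
needs a strict month comparison, otherwise a date such as 2005-06-01 would
wrongly count as later than 2005-06-30.

```java
import java.time.LocalDate;

final class DateParts {
    /** True if (year, month, day) is strictly after the cutoff date. */
    static boolean isAfter(int year, int month, int day, LocalDate cutoff) {
        int y = cutoff.getYear();
        int m = cutoff.getMonthValue();
        int d = cutoff.getDayOfMonth();
        return year > y
            || (year == y && month > m)                // later month, same year
            || (year == y && month == m && day > d);   // later day, same month
    }
}
```

It works for one date per document; it does not by itself answer the
question of several dates on the same document, since the three fields
would lose track of which year, month and day belong together.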

> This may seem cumbersome, but it can save one trip to database, and  
> circumvent Lucene's limitation.

One trip to the DB *once* with the results cached is mighty
inexpensive in the grand scheme of things.  Mark's point is something
I agree with (and wrote about in the custom filter example in Lucene
in Action) - some information makes good sense to stay in a
relational database when it's too volatile to put in a Lucene index.
Building a filter to access a DB, with the results cached, is a good
solution to the specified problem, I think.  Certainly there are many
ways to solve it though.

     Erik
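
A sketch of the cached look-up idea discussed in this thread, with the
database call left out: PkFilterBuilder is a made-up name, and the list of
available primary keys is assumed to come from whatever query the
application runs. The resulting BitSet is the kind of per-document bit
vector a Lucene filter hands back.

```java
import java.util.BitSet;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

final class PkFilterBuilder {
    // DB primary key -> Lucene document id, built once and kept in memory
    private final Map<Long, Integer> docIdByPk = new HashMap<Long, Integer>();
    private final int maxDoc;

    PkFilterBuilder(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    void register(long pk, int docId) {
        docIdByPk.put(pk, docId);
    }

    /** Turns the primary keys returned by the DB into a bit per matching document. */
    BitSet bitsFor(Collection<Long> availablePks) {
        BitSet bits = new BitSet(maxDoc);
        for (Long pk : availablePks) {
            Integer docId = docIdByPk.get(pk);
            if (docId != null) { // rows with no indexed document are skipped
                bits.set(docId);
            }
        }
        return bits;
    }
}
```

One caveat with this approach: document ids can change when the index is
modified or merged, so the look-up table would need rebuilding whenever the
index is reopened.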






Vedr. Re: Design question [too many fields?]

Posted by Naimdjon Takhirov <tn...@yahoo.com>.
Hi Chris,

Wouldn't that still be too many fields? Days of the
year for the whole year ahead? The fromDate and toDate
can span two months, and the customer wants the data
to be available for one year.

Naimdjon
