
Posted to java-user@lucene.apache.org by Chris Lu <ch...@gmail.com> on 2005/07/01 05:27:49 UTC

Re: Design question [too many fields?]

Mark, your suggestion will incur another trip to the database. And if 
the result set is large, filtering in the DB by primary key does not perform well.
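
For readers following along, Mark's suggestion (quoted further down) keeps a
primary-key-to-doc-id map in memory and turns the database result into a
filter. A minimal sketch of that idea, assuming the filter is represented as
a java.util.BitSet indexed by Lucene doc id; the class and method names are
hypothetical and no Lucene classes are used:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AvailabilityFilterSketch {
    // Hypothetical in-memory cache: database primary key -> Lucene doc id.
    private final Map<Long, Integer> docIdByPrimaryKey = new HashMap<>();
    private final int maxDoc;

    AvailabilityFilterSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    void cache(long primaryKey, int docId) {
        docIdByPrimaryKey.put(primaryKey, docId);
    }

    /** Sets one bit per cached document whose primary key came back from the database. */
    BitSet bitsFor(List<Long> availablePrimaryKeys) {
        BitSet bits = new BitSet(maxDoc);
        for (Long primaryKey : availablePrimaryKeys) {
            Integer docId = docIdByPrimaryKey.get(primaryKey);
            if (docId != null) { // keys that are not in the index are skipped
                bits.set(docId);
            }
        }
        return bits;
    }
}
```

The cost I am pointing at is visible here: the list of primary keys has to
be fetched from the database for every search before the bits can be set.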

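Erik, your original "date" field is good when there are not many distinct
dates (fewer than 1024) in the index. Otherwise, RangeQuery cannot handle it:
it expands into one BooleanQuery clause per matching term, and BooleanQuery
allows 1024 clauses by default.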

My suggestion is to use three fields, "year", "month", and "day", to store 
the date. When searching for, say, any date later than 2005-06-30, you can 
use this query: ( year > 2005 ) or ( year = 2005 and month > 6 ) or 
( year = 2005 and month = 6 and day > 30 ).
It's a combination of BooleanQuery, TermQuery, and RangeQuery.
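
To make the logic concrete, here is a minimal sketch in plain Java of the
three OR-ed clauses, checked against java.time.LocalDate. It does not call
Lucene; the class and method names are hypothetical. In an index, each
equality would be a TermQuery and each inequality a RangeQuery, combined in
a BooleanQuery. Note that the middle clause needs a strict month > 6: a
non-strict comparison would also match every day in June 2005.

```java
import java.time.LocalDate;

final class DateFieldQuerySketch {
    /**
     * Mirrors the three OR-ed clauses: a later year, or the same year and a
     * later month, or the same year and month and a later day.
     */
    static boolean isAfter(int year, int month, int day,
                           int cutoffYear, int cutoffMonth, int cutoffDay) {
        boolean laterYear = year > cutoffYear;
        boolean laterMonth = year == cutoffYear && month > cutoffMonth;
        boolean laterDay = year == cutoffYear && month == cutoffMonth && day > cutoffDay;
        return laterYear || laterMonth || laterDay;
    }

    public static void main(String[] args) {
        LocalDate cutoff = LocalDate.of(2005, 6, 30);
        // Compare with LocalDate.isAfter for every day from 2004 through 2006.
        for (LocalDate d = LocalDate.of(2004, 1, 1); d.getYear() < 2007; d = d.plusDays(1)) {
            boolean expected = d.isAfter(cutoff);
            boolean actual = isAfter(d.getYear(), d.getMonthValue(), d.getDayOfMonth(), 2005, 6, 30);
            if (expected != actual) {
                throw new AssertionError("mismatch at " + d);
            }
        }
        System.out.println("field-wise comparison agrees with LocalDate.isAfter");
    }
}
```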

This may seem cumbersome, but it saves one trip to the database and 
works around Lucene's limitation.

Chris Lu
http://www.dbsight.net

Erik Hatcher wrote:

> I second Mark's suggestion over the alternative I posted.  My  
> alternative was merely to invert the field structure originally  
> described, but using a Filter for the volatile information is wiser.
>
>     Erik
>
> On Jun 29, 2005, at 9:58 AM, mark harwood wrote:
>
>> Presumably there is also a free-text element to the
>> search or you wouldn't be using Lucene.
>>
>> Multiple fields is not the way to go.
>> A single Lucene field could contain multiple terms (
>> the available dates) but I still don't think that's
>> the best solution.
>> The availability info is likely to be pretty volatile
>> and you always want up-to-date info so I would prefer
>> to hit a database for this. If you keep a DB primary
>> key to Lucene doc id look-up cached in memory you can
>> quickly construct a Lucene filter from the database
>> results and therefore only show Lucene results for
>> available rooms.
>>
>> Cheers
>> Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Vedr. Re: Design question [too many fields?]

Posted by Chris Lu <ch...@gmail.com>.

> It is anyway going to be too many fields then? Days of
> year for the whole year ahead? Since the fromDate and
> toDate can be across two months and the customer wants
> the data be available for one year.

It won't have too many fields.

> > My suggestion is, use "year" + "month" + "day" three
> > fields to store

The "day" field holds the day of the month, so the "month" and "day" fields
will have at most 12 and 31 distinct values respectively.
The "year" field depends on what data you have. I guess your data won't
span 1024 years.
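
As a rough illustration of the term counts, the sketch below counts the
distinct values that three years of daily dates would produce in a single
"date" field versus the three separate fields. It is plain Java with
hypothetical names; the 2005 to 2007 range is just an example.

```java
import java.time.LocalDate;
import java.util.Set;
import java.util.TreeSet;

final class DistinctTermCountSketch {
    /** Returns {distinct dates, distinct years, distinct months, distinct days} over [from, to]. */
    static int[] countDistinctTerms(LocalDate from, LocalDate to) {
        Set<String> dates = new TreeSet<>();
        Set<Integer> years = new TreeSet<>();
        Set<Integer> months = new TreeSet<>();
        Set<Integer> days = new TreeSet<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            dates.add(d.toString());      // one term per calendar day
            years.add(d.getYear());
            months.add(d.getMonthValue());
            days.add(d.getDayOfMonth());
        }
        return new int[] { dates.size(), years.size(), months.size(), days.size() };
    }

    public static void main(String[] args) {
        int[] c = countDistinctTerms(LocalDate.of(2005, 1, 1), LocalDate.of(2007, 12, 31));
        // prints: date=1095 year=3 month=12 day=31
        System.out.println("date=" + c[0] + " year=" + c[1] + " month=" + c[2] + " day=" + c[3]);
    }
}
```

A range over the single field can touch more than 1024 terms after three
years, while no range over the split fields can touch more than 31.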

-- 
Chris Lu
---------------------
Full-Text Search on Any Database
http://www.dbsight.net

On 6/30/05, Naimdjon Takhirov <tn...@yahoo.com> wrote:
> Hi Chris,
> 
> It is anyway going to be too many fields then? Days of
> year for the whole year ahead? Since the fromDate and
> toDate can be across two months and the customer wants
> the data be available for one year.
> 
> Naimdjon

Re: Unexpected: ordered

Posted by Dave Kor <s0...@sms.ed.ac.uk>.

Quoting Paul Elschot <pa...@xs4all.nl>:

> Dave,
>
> On Tuesday 05 July 2005 20:54, Paul Elschot wrote:
> > On Tuesday 05 July 2005 14:35, Dave Kor wrote:
> ...
> > >
> > > Hopefully, this explains what I am trying to achieve with Lucene and why I need
> > > to match repeated sub-queries. I would really appreciate it if anyone has a
> > > solution, a quickfix or can guide me in hacking up something workable.
> >
> > So, in an ordered SpanNearQuery, you want repeated subqueries not to match the
> > same text/tokens, which boils down to non overlapping matches.
> >
> > I had another look at NearSpans.java, and I'm afraid there is no quick fix for this
> > (but I'd like to be proven wrong).
> > Spans can match ordered/unordered and overlapping/nonoverlapping.
> > Currently for the overlap there is no parameter, and I don't know how
> > SpanNearQuery behaves with respect to overlapping matches.
> > There is no special case for equal subqueries, which is probably ok, but
> > when overlaps are allowed care should be taken not to use equal subqueries.
> >
> > On hacking up something workable: it would be good to get this
> > bug out of NearSpans.
>
> This might be a fix; it reduces the number of cases that are considered ordered
> matches. It also passes all unit tests here:
>
>   private boolean matchIsOrdered() {
>     SpansCell spansCell = (SpansCell) ordered.get(0);
>     int lastStart = spansCell.start(); // no need to compare doc nrs here.
>     int lastEnd = spansCell.end();
>     for (int i = 1; i < ordered.size(); i++) {
>       spansCell = (SpansCell) ordered.get(i);
>       int start = spansCell.start();
>       int end = spansCell.end();
>       if ((start < lastStart) || ((start == lastStart) && (end <= lastEnd))) {
>         return false; // also equal begin and end is not ordered.
>       }
>       lastStart = start;
>       lastEnd = end;
>     }
>     return true;
>   }
>
> Could you replace the matchIsOrdered() method with the above one
> and see whether you can still reproduce the "Unexpected: ordered"
> exception?
>
> There is some interplay between the matchIsOrdered() method and
> the lessThan() method in CellQueue that also uses  the SpansCell index,
> and I hope this gets it right.

Yup, your code has eliminated all the exceptions. So far I have not had time
to check in detail whether it works correctly (my deadline is this Wednesday),
so I am just assuming it does. I'll get back to you next week if everything
checks out.


Regards,
Dave.



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Unexpected: ordered

Posted by Paul Elschot <pa...@xs4all.nl>.

Dave, 

On Tuesday 05 July 2005 20:54, Paul Elschot wrote:
> On Tuesday 05 July 2005 14:35, Dave Kor wrote:
...
> > 
> > Hopefully, this explains what I am trying to achieve with Lucene and why I need
> > to match repeated sub-queries. I would really appreciate it if anyone has a
> > solution, a quickfix or can guide me in hacking up something workable.
> 
> So, in an ordered SpanNearQuery, you want repeated subqueries not to match the
> same text/tokens, which boils down to non overlapping matches.
> 
> I had another look at NearSpans.java, and I'm afraid there is no quick fix for this
> (but I'd like to be proven wrong).
> Spans can match ordered/unordered and overlapping/nonoverlapping.
> Currently for the overlap there is no parameter, and I don't know how
> SpanNearQuery behaves with respect to overlapping matches.
> There is no special case for equal subqueries, which is probably ok, but
> when overlaps are allowed care should be taken not to use equal subqueries.
> 
> On hacking up something workable: it would be good to get this
> bug out of NearSpans.

This might be a fix; it reduces the number of cases that are considered ordered
matches. It also passes all unit tests here:

  private boolean matchIsOrdered() {
    SpansCell spansCell = (SpansCell) ordered.get(0); 
    int lastStart = spansCell.start(); // no need to compare doc nrs here.
    int lastEnd = spansCell.end();
    for (int i = 1; i < ordered.size(); i++) {
      spansCell = (SpansCell) ordered.get(i);
      int start = spansCell.start();
      int end = spansCell.end();
      if ((start < lastStart) || ((start == lastStart) && (end <= lastEnd))) {
        return false; // also equal begin and end is not ordered.
      }
      lastStart = start;
      lastEnd = end;
    }
    return true;
  }

Could you replace the matchIsOrdered() method with the above one
and see whether you can still reproduce the "Unexpected: ordered"
exception?

There is some interplay between the matchIsOrdered() method and
the lessThan() method in CellQueue, which also uses the SpansCell index,
and I hope this gets it right.
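
To see which cases the check accepts, here is a self-contained sketch of the
same comparison on plain {start, end} pairs. It only models the method above;
the class name and the int[][] representation are assumptions made for
illustration and are not part of NearSpans.

```java
final class OrderedSpansSketch {
    /**
     * Each row is {start, end} for one subquery match, in clause order.
     * Returns false when a match starts before the previous one, or starts
     * at the same position and does not end later.
     */
    static boolean matchIsOrdered(int[][] spans) {
        int lastStart = spans[0][0];
        int lastEnd = spans[0][1];
        for (int i = 1; i < spans.length; i++) {
            int start = spans[i][0];
            int end = spans[i][1];
            if (start < lastStart || (start == lastStart && end <= lastEnd)) {
                return false; // equal begin and end is not ordered either
            }
            lastStart = start;
            lastEnd = end;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matchIsOrdered(new int[][] { {0, 1}, {2, 3} })); // true
        System.out.println(matchIsOrdered(new int[][] { {0, 1}, {0, 1} })); // false
    }
}
```

In this sketch two identical spans are rejected, but partially overlapping
spans such as {0, 3} followed by {1, 2} still count as ordered.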

Regards,
Paul Elschot



Re: Unexpected: ordered

Posted by Paul Elschot <pa...@xs4all.nl>.

On Tuesday 05 July 2005 14:35, Dave Kor wrote:
> Quoting Paul Elschot <pa...@xs4all.nl>:
> 
> > On Monday 04 July 2005 22:51, Dave Kor wrote:
> > > > I had another look at the code, and my guess now is that this is
> > > > related to the spanNear with the single argument.
> > > >
> > > > It rings some bells. One of them is that I would have preferred
> > > > to split the SpanNear class into ordered/unordered after the fix,
> > > > but that I gave up because it would take too much time.
> > > > The current SpanNear class is too complex for easy maintenance.
> > > >
> > > > Perhaps the quick fix is to verify in the constructor of SpanNearQuery
> > > > that the number of clauses is at least 2, and to throw an illegal arg
> > > > exception otherwise.
> > >
> > > Alright, I'll add code to ensure that I do not generate SpanNearQueries that
> > > contain only a single sub-query and see what happens, I hope this solves my
> > > problem!
> > >
> > > Earlier, I went back to have a more in-depth look at the queries that were
> > > throwing these exceptions. My system, an experimental query expansion module,
> > > had generated over 900+ queries and out of those, 50-60 queries cause the RTE.
> > >
> > > From these queries, I can find many repeated multi-term SpanNearQueries that
> > > also throws the same RTE. Here are some examples where the bracket shows how
> > > the terms are grouped in a SpanNearQuery:
> > >
> > > ((the (regent hotel)) (the (regent hotel) to))
> > > (((elton john)) ((elton john) and))
> > > (((the who) is) ((the who) of))
> > > ((is) (the (the band nirvana) band))
> > > (((united states)) (united states president is the))
> > > (((academy awards) of) ((academy awards) is))
> >
> > In all these cases overlap between two matches can occur because they have
> > an equal subquery. The conclusion is that the current span code is not capable
> > of handling such cases. It probably chokes at the moment the matches for
> > such subqueries concur.
> 
> I'm not quite sure what you mean here by "an equal subquery". I am not trying to
> get two subqueries to match the same portion of a document. Instead, I am
> looking for a repeat of the same search term(s) somewhere farther in the
> document.

I meant for example
(elton john)
occurring twice above.
 
> > The question is whether you would consider such a concurrence to be a match
> > for the query.
> > If so, the fix might be to return true instead of throwing the exception.
> 
> I have simplified the above examples by substituting the original search terms
> with more intelligible terms, which unfortunately made the above queries seem
> pointless. In reality, my system is trying to search for sentences that conform
> to certain linguistic structures.
> 
> An example of a useful search is a comma followed by another comma several words
> later, followed by the phrase "academy award winner". In other words
> 
> (, (, (academy award winner)~2)~3)~8
> 
> This search would pick up only sentences like "Dafoe , who played the role of
> Jesus in The Last Temptation of Christ , is also an Academy Award winner for
> his ... "
> 
> Hopefully, this explains what I am trying to achieve with Lucene and why I need
> to match repeated sub-queries. I would really appreciate it if anyone has a
> solution, a quickfix or can guide me in hacking up something workable.

So, in an ordered SpanNearQuery, you want repeated subqueries not to match the
same text/tokens, which boils down to non overlapping matches.

I had another look at NearSpans.java, and I'm afraid there is no quick fix for this
(but I'd like to be proven wrong).
Spans can match ordered/unordered and overlapping/nonoverlapping.
Currently for the overlap there is no parameter, and I don't know how
SpanNearQuery behaves with respect to overlapping matches.
There is no special case for equal subqueries, which is probably ok, but
when overlaps are allowed care should be taken not to use equal subqueries.

On hacking up something workable: it would be good to get this
bug out of NearSpans.

Anyway, to test this, e.g. using the examples you gave above,
TestSpans.java here has some small code examples to start from:
http://svn.apache.org/viewcvs.cgi/lucene/java/tags/lucene_1_4_3/src/test/org/apache/lucene/search/spans/

TestBasics.java there has some larger examples.

Regards,
Paul Elschot



Re: Unexpected: ordered

Posted by Dave Kor <s0...@sms.ed.ac.uk>.

Quoting Paul Elschot <pa...@xs4all.nl>:

> On Monday 04 July 2005 22:51, Dave Kor wrote:
> > > I had another look at the code, and my guess now is that this is
> > > related to the spanNear with the single argument.
> > >
> > > It rings some bells. One of them is that I would have preferred
> > > to split the SpanNear class into ordered/unordered after the fix,
> > > but that I gave up because it would take too much time.
> > > The current SpanNear class is too complex for easy maintenance.
> > >
> > > Perhaps the quick fix is to verify in the constructor of SpanNearQuery
> > > that the number of clauses is at least 2, and to throw an illegal arg
> > > exception otherwise.
> >
> > Alright, I'll add code to ensure that I do not generate SpanNearQueries that
> > contain only a single sub-query and see what happens, I hope this solves my
> > problem!
> >
> > Earlier, I went back to have a more in-depth look at the queries that were
> > throwing these exceptions. My system, an experimental query expansion module,
> > had generated over 900+ queries and out of those, 50-60 queries cause the RTE.
> >
> > From these queries, I can find many repeated multi-term SpanNearQueries that
> > also throws the same RTE. Here are some examples where the bracket shows how
> > the terms are grouped in a SpanNearQuery:
> >
> > ((the (regent hotel)) (the (regent hotel) to))
> > (((elton john)) ((elton john) and))
> > (((the who) is) ((the who) of))
> > ((is) (the (the band nirvana) band))
> > (((united states)) (united states president is the))
> > (((academy awards) of) ((academy awards) is))
>
> In all these cases overlap between two matches can occur because they have
> an equal subquery. The conclusion is that the current span code is not capable
> of handling such cases. It probably chokes at the moment the matches for
> such subqueries concur.

I'm not quite sure what you mean here by "an equal subquery". I am not trying to
get two subqueries to match the same portion of a document. Instead, I am
looking for a repeat of the same search term(s) somewhere farther in the
document.

> The question is whether you would consider such a concurrence to be a match
> for the query.
> If so, the fix might be to return true instead of throwing the exception.

I have simplified the above examples by substituting the original search terms
with more intelligible terms, which unfortunately made the above queries seem
pointless. In reality, my system is trying to search for sentences that conform
to certain linguistic structures.

An example of a useful search is a comma followed by another comma several words
later, followed by the phrase "academy award winner". In other words

(, (, (academy award winner)~2)~3)~8

This search would pick up only sentences like "Dafoe , who played the role of
Jesus in The Last Temptation of Christ , is also an Academy Award winner for
his ... "

Hopefully, this explains what I am trying to achieve with Lucene and why I need
to match repeated sub-queries. I would really appreciate it if anyone has a
solution, a quickfix or can guide me in hacking up something workable.
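
To pin down the behaviour I am after, here is a minimal sketch in plain Java
of "phrases in order, no token used twice" over a tokenized sentence. It is
not Lucene code and it deliberately ignores the slop values; the class and
method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class OrderedPhraseSketch {
    /** Index of the first occurrence of phrase in tokens at or after from, or -1. */
    static int indexOf(List<String> tokens, List<String> phrase, int from) {
        for (int i = from; i + phrase.size() <= tokens.size(); i++) {
            if (tokens.subList(i, i + phrase.size()).equals(phrase)) {
                return i;
            }
        }
        return -1;
    }

    /**
     * True when every phrase occurs in the given order and no token is used
     * by two phrases. Distances between phrases are not checked.
     */
    static boolean matchesInOrder(List<String> tokens, List<List<String>> phrases) {
        int from = 0;
        for (List<String> phrase : phrases) {
            int at = indexOf(tokens, phrase, from);
            if (at < 0) {
                return false;
            }
            from = at + phrase.size(); // the next phrase must start after this one ends
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("dafoe", ",", "who", "played", "jesus", ",",
                "is", "also", "an", "academy", "award", "winner");
        List<List<String>> query = Arrays.asList(
                Arrays.asList(","), Arrays.asList(","), Arrays.asList("academy", "award", "winner"));
        System.out.println(matchesInOrder(tokens, query));
    }
}
```

Taking the earliest occurrence of each phrase is enough here because an
earlier match never rules out a later phrase that a later match would allow.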


Re: Unexpected: ordered

Posted by Paul Elschot <pa...@xs4all.nl>.

On Monday 04 July 2005 22:51, Dave Kor wrote:
> *chuckles* It seems I can post to this list without subscribing to it. :)
> 
> > I had another look at the code, and my guess now is that this is
> > related to the spanNear with the single argument.
> >
> > It rings some bells. One of them is that I would have preferred
> > to split the SpanNear class into ordered/unordered after the fix,
> > but that I gave up because it would take too much time.
> > The current SpanNear class is too complex for easy maintenance.
> >
> > Perhaps the quick fix is to verify in the constructor of SpanNearQuery
> > that the number of clauses is at least 2, and to throw an illegal arg
> > exception otherwise.
> 
> Alright, I'll add code to ensure that I do not generate SpanNearQueries that
> contain only a single sub-query and see what happens, I hope this solves my
> problem!
> 
> Earlier, I went back to have a more in-depth look at the queries that were
> throwing these exceptions. My system, an experimental query expansion module,
> had generated over 900+ queries and out of those, 50-60 queries cause the RTE.
> 
> From these queries, I can find many repeated multi-term SpanNearQueries that
> also throws the same RTE. Here are some examples where the bracket shows how
> the terms are grouped in a SpanNearQuery:
> 
> ((the (regent hotel)) (the (regent hotel) to))
> (((elton john)) ((elton john) and))
> (((the who) is) ((the who) of))
> ((is) (the (the band nirvana) band))
> (((united states)) (united states president is the))
> (((academy awards) of) ((academy awards) is))

In all these cases overlap between two matches can occur because they have
an equal subquery. The conclusion is that the current span code is not capable
of handling such cases. It probably chokes at the moment the matches for
such subqueries concur.

The question is whether you would consider such a concurrence to be a match
for the query.
If so, the fix might be to return true instead of throwing the exception.

Regards,
Paul Elschot



Re: Unexpected: ordered

Posted by Paul Elschot <pa...@xs4all.nl>.

On Sunday 03 July 2005 17:42, Dave Kor wrote:
> Quoting Paul Elschot <pa...@xs4all.nl>:
> 
> > On Sunday 03 July 2005 15:27, Dave Kor wrote:
> > > I have a system that automatically generates span queries to Lucene. Sometimes,
> > > the system generates a query like this one, which always throws a
> > > RuntimeException:
> > >
> > > spanNear([spanNear([text:interesting], 3, true), spanNear([text:interesting,
> > > text:john, text:said], 8, true)], 2, true)
> > >
> > > Basically, the system is looking for a document that contains a string sequence
> > > "interesting .... interesting john said". The thrown exception is as follows:
> > >
> > > java.lang.RuntimeException: Unexpected: ordered
> > >         at org.apache.lucene.search.spans.NearSpans.firstNonOrderedNextToPartialList(Unknown Source)
> > >         at org.apache.lucene.search.spans.NearSpans.next(Unknown Source)
> > >         at org.apache.lucene.search.spans.SpanScorer.next(Unknown Source)
> > >         at org.apache.lucene.search.Scorer.score(Unknown Source)
> > >         at org.apache.lucene.search.IndexSearcher.search(Unknown Source)
> > >
> > > My question is, what does this "Unexpected: ordered" mean? And is there
> > > any way I can avoid these exceptions?
> >
> > It's an internal error that is not supposed to occur.
> > Could you continue on the java-dev list?
> >
> > SpanNearQuery is not supposed to operate on a single argument, at least
> > that's what I thought when I wrote the bug fix code that throws this
> > exception. Does the exception go away when you replace the first spanNear
> > (the one with the single [text:interesting]) with a SpanTermQuery?

See below.

> >
> > It's also possible that the code cannot handle the two identical
> > text:interesting arguments.
> >
> > It's probably good to have a test case for this. Could you extend the
> > exception with the document number and maybe a position within the
> > document to try and get to the original text that causes this exception,
> > and use that to file a bug report?
> 
> I'll see what I can do about the test case. From what I can tell thus far, this
> exception is thrown when CellQueue is empty in the function
> NearSpans.firstNonOrderedNextToPartialList(). I hope it rings a bell somewhere.

I had another look at the code, and my guess now is that this is related to
the spanNear with the single argument. So I'd like to know what
happens when this is replaced by a SpanTermQuery.

It does ring some bells. One of them is that I would have preferred to split
the NearSpans class into two after the bug fix, one for the ordered case,
and one for the non ordered case. I did not make that split then because it
worked for the test cases, and I did not want to spend more time on it.
Anyway, the current NearSpans code is too complex for easy maintenance.

Perhaps the quick fix is to make sure that the SpanNearQuery passed to
the NearSpans has at least two clauses, and that the SpanNearQuery constructor
throws an IllegalArgumentException otherwise.
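
A minimal sketch of such a constructor check, using a stand-in class rather
than the real SpanNearQuery; the class name, the String clauses and the
message text are all hypothetical:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class NearClausesSketch {
    private final List<String> clauses;

    /** Rejects clause lists that cannot express a "near" relation. */
    NearClausesSketch(List<String> clauses) {
        int count = (clauses == null) ? 0 : clauses.size();
        if (count < 2) {
            throw new IllegalArgumentException(
                    "a near query needs at least two clauses, got " + count);
        }
        // Defensive copy so later changes to the caller's list have no effect.
        this.clauses = Collections.unmodifiableList(new ArrayList<>(clauses));
    }

    List<String> clauses() {
        return clauses;
    }
}
```

Failing in the constructor would point at the code that built the query,
instead of surfacing later as an internal error during the search.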

Regards,
Paul Elschot



Re: Unexpected: ordered

Posted by Paul Elschot <pa...@xs4all.nl>.

On Sunday 03 July 2005 23:23, Paul Elschot wrote:

Please forget about the last message, I thought I had lost the earlier one.

Paul Elschot



Re: Unexpected: ordered

Posted by Paul Elschot <pa...@xs4all.nl>.

On Sunday 03 July 2005 17:42, Dave Kor wrote:
> Quoting Paul Elschot <pa...@xs4all.nl>:
> 
> > On Sunday 03 July 2005 15:27, Dave Kor wrote:
> > > I have a system that automatically generates span queries to Lucene. Sometimes,
> > > the system generates a query like this one, which always throws a
> > > RuntimeException:
> > >
> > > spanNear([spanNear([text:interesting], 3, true), spanNear([text:interesting,
> > > text:john, text:said], 8, true)], 2, true)
> > >
> > > Basically, the system is looking for a document that contains a string sequence
> > > "interesting .... interesting john said". The thrown exception is as follows:
> > >
> > > java.lang.RuntimeException: Unexpected: ordered
> > >         at org.apache.lucene.search.spans.NearSpans.firstNonOrderedNextToPartialList(Unknown Source)
> > >         at org.apache.lucene.search.spans.NearSpans.next(Unknown Source)
> > >         at org.apache.lucene.search.spans.SpanScorer.next(Unknown Source)
> > >         at org.apache.lucene.search.Scorer.score(Unknown Source)
> > >         at org.apache.lucene.search.IndexSearcher.search(Unknown Source)
> > >
> > > My question is, what does this "Unexpected: ordered" mean? And is there
> > > any way I can avoid these exceptions?
> >
> > It's an internal error that is not supposed to occur.
> > Could you continue on the java-dev list?
> >
> > SpanNearQuery is not supposed to operate on a single argument, at least
> > that's what I thought when I wrote the bug fix code that throws this
> > exception. Does the exception go away when you replace the first spanNear
> > (the one with the single [text:interesting]) with a SpanTermQuery?
> >
> > It's also possible that the code cannot handle the two identical
> > text:interesting arguments.
> >
> > It's probably good to have a test case for this. Could you extend the
> > exception with the document number and maybe a position within the
> > document to try and get to the original text that causes this exception,
> > and use that to file a bug report?
> 
> I'll see what I can do about the test case. From what I can tell thus far, this
> exception is thrown when CellQueue is empty in the function
> NearSpans.firstNonOrderedNextToPartialList(). I hope it rings a bell somewhere.

I had another look at the code, and my guess now is that this is
related to the spanNear with the single argument.

It rings some bells. One of them is that I would have preferred
to split the SpanNear class into ordered/unordered after the fix,
but that I gave up because it would take too much time.
The current SpanNear class is too complex for easy maintenance.

Perhaps the quick fix is to verify in the constructor of SpanNearQuery
that the number of clauses is at least 2, and to throw an illegal arg
exception otherwise.

Regards,
Paul Elschot



Re: Unexpected: ordered

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Jul 4, 2005, at 4:51 PM, Dave Kor wrote:
> *chuckles* It seems I can post to this list without subscribing to  
> it. :)

I moderate in messages that are on topic but from unsubscribed  
addresses quite often.  Perhaps this was the case?

     Erik



Re: Unexpected: ordered

Posted by Dave Kor <s0...@sms.ed.ac.uk>.

*chuckles* It seems I can post to this list without subscribing to it. :)

> I had another look at the code, and my guess now is that this is
> related to the spanNear with the single argument.
>
> It rings some bells. One of them is that I would have preferred
> to split the SpanNear class into ordered/unordered after the fix,
> but that I gave up because it would take too much time.
> The current SpanNear class is too complex for easy maintenance.
>
> Perhaps the quick fix is to verify in the constructor of SpanNearQuery
> that the number of clauses is at least 2, and to throw an illegal arg
> exception otherwise.

Alright, I'll add code to ensure that I do not generate SpanNearQueries that
contain only a single sub-query and see what happens, I hope this solves my
problem!
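
Roughly what I have in mind, as a sketch on a toy query tree rather than on
the real Lucene classes: every group that wraps exactly one child is replaced
by that child. The QueryNodeSketch class is hypothetical and ignores slop; a
group with a single clause has no distance to measure, so dropping the
wrapper should not change what matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class QueryNodeSketch {
    final String term;                 // non-null for a leaf (a single term)
    final List<QueryNodeSketch> kids;  // children of a "near" group, empty for a leaf

    private QueryNodeSketch(String term, List<QueryNodeSketch> kids) {
        this.term = term;
        this.kids = kids;
    }

    static QueryNodeSketch term(String t) {
        return new QueryNodeSketch(t, new ArrayList<>());
    }

    static QueryNodeSketch near(QueryNodeSketch... kids) {
        return new QueryNodeSketch(null, Arrays.asList(kids));
    }

    /** Replaces every near group that has exactly one child by that child, bottom-up. */
    static QueryNodeSketch collapse(QueryNodeSketch node) {
        if (node.term != null) {
            return node;
        }
        List<QueryNodeSketch> collapsed = new ArrayList<>();
        for (QueryNodeSketch kid : node.kids) {
            collapsed.add(collapse(kid));
        }
        if (collapsed.size() == 1) {
            return collapsed.get(0);
        }
        return new QueryNodeSketch(null, collapsed);
    }

    @Override
    public String toString() {
        if (term != null) {
            return term;
        }
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < kids.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(kids.get(i));
        }
        return sb.append(')').toString();
    }
}
```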

Earlier, I went back to have a more in-depth look at the queries that were
throwing these exceptions. My system, an experimental query expansion module,
had generated over 900 queries, and out of those, 50-60 queries cause the RTE.

From these queries, I can find many repeated multi-term SpanNearQueries that
also throw the same RTE. Here are some examples where the brackets show how
the terms are grouped in a SpanNearQuery:

((the (regent hotel)) (the (regent hotel) to))
(((elton john)) ((elton john) and))
(((the who) is) ((the who) of))
((is) (the (the band nirvana) band))
(((united states)) (united states president is the))
(((academy awards) of) ((academy awards) is))


Re: Unexpected: ordered

Posted by Dave Kor <s0...@sms.ed.ac.uk>.

Quoting Paul Elschot <pa...@xs4all.nl>:

> On Sunday 03 July 2005 15:27, Dave Kor wrote:
> > I have a system that automatically generates span queries for Lucene.
> > Sometimes, the system generates a query like this one, which always
> > throws a RuntimeException:
> >
> > spanNear([spanNear([text:interesting], 3, true),
> > spanNear([text:interesting, text:john, text:said], 8, true)], 2, true)
> >
> > Basically, the system is looking for a document that contains a string
> > sequence "interesting .... interesting john said". The thrown exception
> > is as follows:
> >
> > java.lang.RuntimeException: Unexpected: ordered
> >         at org.apache.lucene.search.spans.NearSpans.firstNonOrderedNextToPartialList(Unknown Source)
> >         at org.apache.lucene.search.spans.NearSpans.next(Unknown Source)
> >         at org.apache.lucene.search.spans.SpanScorer.next(Unknown Source)
> >         at org.apache.lucene.search.Scorer.score(Unknown Source)
> >         at org.apache.lucene.search.IndexSearcher.search(Unknown Source)
> >
> > My question is, what does this "Unexpected: ordered" mean, and is there
> > any way I can avoid these exceptions?
>
> It's an internal error that is not supposed to occur.
> Could you continue on the java-dev list?
>
> SpanNearQuery is not supposed to operate on a single argument, at least
> that's what I thought when I wrote the bug fix code that throws this
> exception. Does the exception go away when you replace the first spanNear
> (the one with the single [text:interesting]) with a SpanTermQuery?
>
> It's also possible that the code cannot handle the two identical
> text:interesting arguments.
>
> It's probably good to have a test case for this. Could you extend the
> exception with the document number and maybe a position within the
> document to try and get to the original text that causes this exception,
> and use that to file a bug report?

I'll see what I can do about the test case. From what I can tell thus far, this
exception is thrown when the CellQueue is empty in the method
NearSpans.firstNonOrderedNextToPartialList(). I hope it rings a bell somewhere.
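
As a rough illustration of extending the exception with a document number and position, a wrapper along these lines could be placed around whatever iterates over the spans. SpanCursor is a hypothetical stand-in interface with next(), doc(), start() and end() methods, and DiagnosticCursor is a made-up name; neither is part of Lucene. The wrapper remembers the last match that was returned successfully and adds it to the message when next() fails.

```java
/** Hypothetical stand-in for a span iterator (not Lucene's interface). */
interface SpanCursor {
    boolean next();
    int doc();
    int start();
    int end();
}

/** Delegating cursor that adds the last good match to any RuntimeException. */
final class DiagnosticCursor implements SpanCursor {
    private final SpanCursor delegate;
    private int lastDoc = -1;
    private int lastStart = -1;
    private int lastEnd = -1;

    DiagnosticCursor(SpanCursor delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean next() {
        try {
            boolean more = delegate.next();
            if (more) {
                // Remember where the iteration got to before any later failure.
                lastDoc = delegate.doc();
                lastStart = delegate.start();
                lastEnd = delegate.end();
            }
            return more;
        } catch (RuntimeException e) {
            // Keep the original exception as the cause; only the message grows.
            throw new RuntimeException(e.getMessage()
                    + " (last good match: doc=" + lastDoc
                    + ", start=" + lastStart
                    + ", end=" + lastEnd + ")", e);
        }
    }

    @Override
    public int doc() {
        return delegate.doc();
    }

    @Override
    public int start() {
        return delegate.start();
    }

    @Override
    public int end() {
        return delegate.end();
    }
}
```

The reported document is the last one that matched cleanly, so the text that triggers the failure is presumably in that document or a later one. It narrows the search rather than pinpointing the exact position.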



Re: Unexpected: ordered

Posted by Paul Elschot <pa...@xs4all.nl>.

On Sunday 03 July 2005 15:27, Dave Kor wrote:
> I have a system that automatically generates span queries for Lucene.
> Sometimes, the system generates a query like this one, which always throws a
> RuntimeException:
>
> spanNear([spanNear([text:interesting], 3, true), spanNear([text:interesting,
> text:john, text:said], 8, true)], 2, true)
>
> Basically, the system is looking for a document that contains a string
> sequence "interesting .... interesting john said". The thrown exception is
> as follows:
>
> java.lang.RuntimeException: Unexpected: ordered
>         at org.apache.lucene.search.spans.NearSpans.firstNonOrderedNextToPartialList(Unknown Source)
>         at org.apache.lucene.search.spans.NearSpans.next(Unknown Source)
>         at org.apache.lucene.search.spans.SpanScorer.next(Unknown Source)
>         at org.apache.lucene.search.Scorer.score(Unknown Source)
>         at org.apache.lucene.search.IndexSearcher.search(Unknown Source)
>
> My question is, what does this "Unexpected: ordered" mean, and is there
> any way I can avoid these exceptions?

It's an internal error that is not supposed to occur.
Could you continue on the java-dev list?

SpanNearQuery is not supposed to operate on a single argument, at least
that's what I thought when I wrote the bug fix code that throws this
exception. Does the exception go away when you replace the first spanNear
(the one with the single [text:interesting]) with a SpanTermQuery?

It's also possible that the code cannot handle the two identical
text:interesting arguments. 

It's probably good to have a test case for this. Could you extend the
exception with the document number and maybe a position within the
document to try and get to the original text that causes this exception,
and use that to file a bug report?

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Unexpected: ordered

Posted by Dave Kor <s0...@sms.ed.ac.uk>.

I have a system that automatically generates span queries for Lucene. Sometimes,
the system generates a query like this one, which always throws a
RuntimeException:

spanNear([spanNear([text:interesting], 3, true), spanNear([text:interesting,
text:john, text:said], 8, true)], 2, true)

Basically, the system is looking for a document that contains a string sequence
"interesting .... interesting john said". The thrown exception is as follows:

java.lang.RuntimeException: Unexpected: ordered
        at org.apache.lucene.search.spans.NearSpans.firstNonOrderedNextToPartialList(Unknown Source)
        at org.apache.lucene.search.spans.NearSpans.next(Unknown Source)
        at org.apache.lucene.search.spans.SpanScorer.next(Unknown Source)
        at org.apache.lucene.search.Scorer.score(Unknown Source)
        at org.apache.lucene.search.IndexSearcher.search(Unknown Source)

My question is, what does this "Unexpected: ordered" mean, and is there
any way I can avoid these exceptions?
