You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by melix <ce...@lingway.com> on 2007/09/11 16:17:10 UTC

Span queries and complex scoring

Hi,

I'm working on an application which requires a complex scoring (based on
semantics analysis). The scoring must be highly configurable, and I've found
ways to do that, but I'm facing a discrete but annoying problem. All my
queries are, basically, complex span queries. I mean for example a
SpanNearQuery which embeds a SpanOrQuery which itself may embed another
SpanNearQuery etc...

I've followed the instructions at
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/package-summary.html#changingScoring
about changing scoring. The problem is that a document score is highly
dependent on *what* matched, and that the getSpans() method on spanqueries
does not provide that kind of information.

I created my own SpanQuery subclasses which override the createWeight method
so that the scorer used is my own too. It basically replaces the SpanScorer,
and should recurse the spans tree to compose a score based on the type of
subqueries (near, and, or, not) and what matched. The problem is that the
getspans() methods that exists in Lucene are either anonymous classes which
I cannot browse, or that I have not access to the required information.

Basically, in a SpanOrQuery, I am not able to find out what matched. Have
any of you faced that kind of problem, and found out an elegant way to do it
without having to completely rewrite each getSpans() method for all types of
queries (this is basically what was done in a previous version of the
application) ?

Thanks,

Cedric

-- 
View this message in context: http://www.nabble.com/Span-queries-and-complex-scoring-tf4422915.html#a12615745
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Span queries and complex scoring

Posted by Grant Ingersoll <gs...@apache.org>.
Cedric,

On Sep 17, 2007, at 11:54 AM, Erik Hatcher wrote:

>
> On Sep 17, 2007, at 6:51 AM, melix wrote:
>> I've faced the very same problem with NearSpansOrdered and so on.  
>> I think
>> unless there is a very good reason for it, classes should be made  
>> public,
>> this would at least make the "delegate" design pattern available.
>
> Lucene has historically taken the exact opposite approach... open  
> up the API as needed.  Unless there is very good reason for it,  
> classes and data should be kept private.
>
> You bring up some points that need to be made to justify opening up  
> specific classes.  Sounds like a good discussion for the java-dev  
> list on a case by case basis.

I am doing some work on Spans (albeit slowly), perhaps you can  
describe your needs on java-dev (citing this thread) and we can see  
what comes of it?

Cheers,
Grant

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Span queries and complex scoring

Posted by Chris Hostetter <ho...@fucit.org>.
: Lucene has historically taken the exact opposite approach... open up the API
: as needed.  Unless there is very good reason for it, classes and data should
: be kept private.

Just to clarify the reasoning behind this: once something is made public, 
the API has to be supported in perpetuity -- even if no one outside of 
the core is using it, because we don't *know* if anyone is using it.  

when things are private or package protected, committers have a lot more 
leway to commit changes that alter the API if it means better performacne, 
or easier long term maintence.



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Span queries and complex scoring

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Sep 17, 2007, at 6:51 AM, melix wrote:
> I've faced the very same problem with NearSpansOrdered and so on. I  
> think
> unless there is a very good reason for it, classes should be made  
> public,
> this would at least make the "delegate" design pattern available.

Lucene has historically taken the exact opposite approach... open up  
the API as needed.  Unless there is very good reason for it, classes  
and data should be kept private.

You bring up some points that need to be made to justify opening up  
specific classes.  Sounds like a good discussion for the java-dev  
list on a case by case basis.

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Span queries and complex scoring

Posted by melix <ce...@lingway.com>.
Thanks Paul. I'm doing something very similar, but I'd like to notice that it
is very hard to "extend" Lucene without breaking compatibility. I mean I
have written classes that expand SpanQueries, but, for example, and to my
mind not understandable, classes like "BooleanWeight" are package protected,
which prevents from instanciating them. This often leads to code copies
which is not recommanded for maintainability...

I've faced the very same problem with NearSpansOrdered and so on. I think
unless there is a very good reason for it, classes should be made public,
this would at least make the "delegate" design pattern available.

Cheers,

Cedric


Paul Elschot wrote:
> 
> Cedric,
> 
> In case your requirements allow this, try and use subclass of Spans
> that has a score() method that returns a value that is used together
> with the other span info to provide a score value to your own
> SpanScorer at the top level.
> This score value can summarize the influence of the individual
> span scores of the subqueries.
> For this you will need to change the whole span package, but
> it is somewhat simpler than using a complete Scorer for each
> SpanQuery in the query tree.
> 
> With a lot of nested SpanOrQueries, merging the Spans can become
> a performance bottleneck. The current situation can be improved
> by creating a specialized PriorityQueue for Spans, much like the
> ScorerDocQueue that is used by DisjunctionSumScorer.
> With this, it is possible to avoid SpanOrQuery by using term payloads
> to compute the score value for the Spans of a SpanTermQuery, 
> but iirc the payloads are not yet in the trunk.
> 
> Regards,
> Paul Elschot
> 
> 
> On Tuesday 11 September 2007 16:17, melix wrote:
>> 
>> Hi,
>> 
>> I'm working on an application which requires a complex scoring (based on
>> semantics analysis). The scoring must be highly configurable, and I've
>> found
>> ways to do that, but I'm facing a discrete but annoying problem. All my
>> queries are, basically, complex span queries. I mean for example a
>> SpanNearQuery which embeds a SpanOrQuery which itself may embed another
>> SpanNearQuery etc...
>> 
>> I've followed the instructions at
>> 
> http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/package-summary.html#changingScoring
>> about changing scoring. The problem is that a document score is highly
>> dependent on *what* matched, and that the getSpans() method on
>> spanqueries
>> does not provide that kind of information.
>> 
>> I created my own SpanQuery subclasses which override the createWeight
>> method
>> so that the scorer used is my own too. It basically replaces the
>> SpanScorer,
>> and should recurse the spans tree to compose a score based on the type of
>> subqueries (near, and, or, not) and what matched. The problem is that the
>> getspans() methods that exists in Lucene are either anonymous classes
>> which
>> I cannot browse, or that I have not access to the required information.
>> 
>> Basically, in a SpanOrQuery, I am not able to find out what matched. Have
>> any of you faced that kind of problem, and found out an elegant way to do
>> it
>> without having to completely rewrite each getSpans() method for all types
>> of
>> queries (this is basically what was done in a previous version of the
>> application) ?
>> 
>> Thanks,
>> 
>> Cedric
>> 
>> -- 
>> View this message in context: 
> http://www.nabble.com/Span-queries-and-complex-scoring-tf4422915.html#a12615745
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> 
>> 
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Span-queries-and-complex-scoring-tf4422915.html#a12733482
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Span queries and complex scoring

Posted by Paul Elschot <pa...@xs4all.nl>.
Cedric,

In case your requirements allow this, try and use subclass of Spans
that has a score() method that returns a value that is used together
with the other span info to provide a score value to your own
SpanScorer at the top level.
This score value can summarize the influence of the individual
span scores of the subqueries.
For this you will need to change the whole span package, but
it is somewhat simpler than using a complete Scorer for each
SpanQuery in the query tree.

With a lot of nested SpanOrQueries, merging the Spans can become
a performance bottleneck. The current situation can be improved
by creating a specialized PriorityQueue for Spans, much like the
ScorerDocQueue that is used by DisjunctionSumScorer.
With this, it is possible to avoid SpanOrQuery by using term payloads
to compute the score value for the Spans of a SpanTermQuery, 
but iirc the payloads are not yet in the trunk.

Regards,
Paul Elschot


On Tuesday 11 September 2007 16:17, melix wrote:
> 
> Hi,
> 
> I'm working on an application which requires a complex scoring (based on
> semantics analysis). The scoring must be highly configurable, and I've found
> ways to do that, but I'm facing a discrete but annoying problem. All my
> queries are, basically, complex span queries. I mean for example a
> SpanNearQuery which embeds a SpanOrQuery which itself may embed another
> SpanNearQuery etc...
> 
> I've followed the instructions at
> 
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/package-summary.html#changingScoring
> about changing scoring. The problem is that a document score is highly
> dependent on *what* matched, and that the getSpans() method on spanqueries
> does not provide that kind of information.
> 
> I created my own SpanQuery subclasses which override the createWeight method
> so that the scorer used is my own too. It basically replaces the SpanScorer,
> and should recurse the spans tree to compose a score based on the type of
> subqueries (near, and, or, not) and what matched. The problem is that the
> getspans() methods that exists in Lucene are either anonymous classes which
> I cannot browse, or that I have not access to the required information.
> 
> Basically, in a SpanOrQuery, I am not able to find out what matched. Have
> any of you faced that kind of problem, and found out an elegant way to do it
> without having to completely rewrite each getSpans() method for all types of
> queries (this is basically what was done in a previous version of the
> application) ?
> 
> Thanks,
> 
> Cedric
> 
> -- 
> View this message in context: 
http://www.nabble.com/Span-queries-and-complex-scoring-tf4422915.html#a12615745
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org