Posted to dev@lucene.apache.org by "Michael Busch (JIRA)" <ji...@apache.org> on 2007/04/17 21:32:15 UTC

[jira] Commented: (LUCENE-834) Payload Queries

    [ https://issues.apache.org/jira/browse/LUCENE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489519 ] 

Michael Busch commented on LUCENE-834:
--------------------------------------

Hi Grant,

cool that you started implementing queries that make use of payloads! I have a question about this one: BoostingTermQuery only takes the payload of the first term position into account for scoring. Could you explain why you implemented it this way? Shouldn't we rather compute the average of the payload values of all positions?
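A minimal Java sketch of the averaging suggested here, for illustration only. `PayloadAverage` and its methods are hypothetical names. Treating each payload as a 4-byte big-endian float, and returning a neutral 1.0 when there are no payloads, are assumptions made for this example; the patch does not specify either.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical helper: averages the payload values of every position at which
// a term occurs in one document. Assumes each payload is a 4-byte big-endian
// float (an illustration choice, not something the patch specifies).
final class PayloadAverage {
    private PayloadAverage() {}

    // Decodes one payload; throws BufferUnderflowException if it has < 4 bytes.
    static float decode(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    // Encodes a float the way decode() expects, handy for building fixtures.
    static byte[] encode(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    // Mean payload value over all positions of the term in the current
    // document, or 1.0f (a neutral boost) if no payloads were collected.
    static float average(List<byte[]> payloadsAtPositions) {
        if (payloadsAtPositions.isEmpty()) {
            return 1.0f;
        }
        float sum = 0f;
        for (byte[] payload : payloadsAtPositions) {
            sum += decode(payload);
        }
        return sum / payloadsAtPositions.size();
    }
}
```

One plausible reason to prefer the mean over a sum is that the scorer already rewards repeated occurrences through term frequency, so summing payloads would count repetition twice.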

> Payload Queries
> ---------------
>
>                 Key: LUCENE-834
>                 URL: https://issues.apache.org/jira/browse/LUCENE-834
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>            Reporter: Grant Ingersoll
>         Assigned To: Grant Ingersoll
>            Priority: Minor
>         Attachments: boosting.term.query.patch
>
>
> Now that payloads have been implemented, it will be good to make them searchable via one or more Query mechanisms.  See http://wiki.apache.org/lucene-java/Payload_Planning for some background information and https://issues.apache.org/jira/browse/LUCENE-755 for the issue that started it all.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: [jira] Commented: (LUCENE-834) Payload Queries

Posted by Grant Ingersoll <gs...@apache.org>.
OK, I see it now.  I thought I was scoring the individual term at
each position, but if I had read the Scorer documentation more
carefully instead of assuming I knew what next() meant, I wouldn't
have had this problem: it clearly says that next() advances to the
next document.

I will try to fix it by the end of this weekend.

Thanks for the review.
-Grant

On Apr 19, 2007, at 2:27 PM, Michael Busch wrote:

> Grant Ingersoll wrote:
>> OK, I need to take a step back, Michael, b/c I thought I  
>> understood your original comment, but I went to make the change  
>> and am no longer sure.
>>
>>   By "first term position" are you referring to multiple terms per  
>> position or do you mean the same term in different positions?   
>> When I implemented the BTQ (BoostingTermQuery) I modeled it pretty  
>> much off of the SpanTermQuery (STQ) which I felt had very similar  
>> functionality, other than having to load the payload.
>>
>> Doesn't the next() method on the BoostingSpanScorer take care of  
>> moving through the various positions that the term appears at,  
>> whereupon it loads the payload at the position?  Could you write  
>> up a patch to the test to demonstrate?
>>
>> Thanks,
>> Grant
>>
>
> Grant,
>
> I mean the case when the same term has multiple positions in a  
> document. In BoostingSpanScorer.next() you call super.next() (from  
> SpanScorer), which calls SpanScorer.setFreqCurrentDoc(). This  
> method iterates through all spans for the current doc via  
> TermSpans.next(). So TermSpans.next() is the actual method which  
> calls TermPositions.nextPosition(). This means when SpanScorer.next()
> returns it has iterated through all positions of that doc
> already. Then you load the payload, which means that you only get  
> the payload of the first term position of the next (the wrong!)  
> document in the term's posting list.
>
> Your test case does not show this behavior, because the term you
> search for appears at most once in each document. And since
> all payloads of the term you search for have the same value, the
> test case doesn't fail even though it loads the payloads of the
> wrong documents for scoring.
>
> - Michael
>

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ





Re: [jira] Commented: (LUCENE-834) Payload Queries

Posted by Michael Busch <bu...@gmail.com>.
Grant Ingersoll wrote:
> OK, I need to take a step back, Michael, b/c I thought I understood 
> your original comment, but I went to make the change and am no longer 
> sure.
>
>   By "first term position" are you referring to multiple terms per 
> position or do you mean the same term in different positions?  When I 
> implemented the BTQ (BoostingTermQuery) I modeled it pretty much off 
> of the SpanTermQuery (STQ) which I felt had very similar 
> functionality, other than having to load the payload.
>
> Doesn't the next() method on the BoostingSpanScorer take care of 
> moving through the various positions that the term appears at, 
> whereupon it loads the payload at the position?  Could you write up a 
> patch to the test to demonstrate?
>
> Thanks,
> Grant
>

Grant,

I mean the case when the same term has multiple positions in a document. 
In BoostingSpanScorer.next() you call super.next() (from SpanScorer), 
which calls SpanScorer.setFreqCurrentDoc(). This method iterates through 
all spans for the current doc via TermSpans.next(). So TermSpans.next() 
is the actual method which calls TermPositions.nextPosition(). This 
means when SpanScorer.next() returns it has iterated through all 
positions of that doc already. Then you load the payload, which means 
that you only get the payload of the first term position of the next 
(the wrong!) document in the term's posting list.
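The sequence described above can be reproduced with a toy model. This is not Lucene code: `PostingsCursorDemo`, `Posting` and its fields are hypothetical, and a single cursor over a doc-ordered list stands in for TermSpans over TermPositions. `next()` consumes every position of the current document, as setFreqCurrentDoc() does, so reading the payload after it returns picks up the first position of the following document.

```java
import java.util.List;

// Toy model, not Lucene code: one cursor walks a term's doc-ordered postings,
// standing in for TermSpans over TermPositions.
final class PostingsCursorDemo {
    // One (doc, position, payload) entry; a hypothetical flattened format.
    static final class Posting {
        final int doc;
        final int position;
        final float payload;

        Posting(int doc, int position, float payload) {
            this.doc = doc;
            this.position = position;
            this.payload = payload;
        }
    }

    private final List<Posting> postings;
    private int cursor = 0;      // index of the entry the cursor sits on
    private int currentDoc = -1; // doc returned by the last successful next()

    PostingsCursorDemo(List<Posting> postings) {
        this.postings = postings;
    }

    // Document-level advance, analogous to SpanScorer.next() running
    // setFreqCurrentDoc(): consumes every position of the current doc and
    // stops on the first entry of the NEXT doc. Payloads passed over are
    // copied into payloadsSeen, where a corrected scorer would read them.
    boolean next(List<Float> payloadsSeen) {
        payloadsSeen.clear();
        if (cursor >= postings.size()) {
            return false;
        }
        currentDoc = postings.get(cursor).doc;
        while (cursor < postings.size() && postings.get(cursor).doc == currentDoc) {
            payloadsSeen.add(postings.get(cursor).payload);
            cursor++;
        }
        return true;
    }

    int doc() {
        return currentDoc;
    }

    // The buggy pattern: reading "the payload" after next() has returned sees
    // the entry the cursor stopped on, i.e. the next doc's first position,
    // or NaN here once the postings are exhausted.
    float payloadAtCursor() {
        return cursor < postings.size() ? postings.get(cursor).payload : Float.NaN;
    }
}
```

With doc 0 holding payloads 2 and 4 at two positions and doc 1 holding 8, `payloadAtCursor()` after the first `next()` returns 8, the payload of doc 1. Doc 0's own payloads are only available if they are collected inside the position loop.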

Your test case does not show this behavior, because the term you search
for appears at most once in each document. And since all payloads of the
term you search for have the same value, the test case doesn't fail even
though it loads the payloads of the wrong documents for scoring.
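A quick way to see why such a fixture cannot fail, sketched under simplifying assumptions: every document contains the term, `docs[d]` lists that document's payloads in position order, the correct value is the average over those positions, and the buggy value is modeled as the first payload of document d + 1. `FixtureSensitivity` is a hypothetical helper, not part of the patch or its test.

```java
// Hypothetical check: does a fixture distinguish the correct per-document
// payload (average over that doc's positions) from what the buggy scorer
// reads (first payload of the following doc)? Assumes every doc contains
// the term, so each docs[d] is non-empty. The last doc is skipped because
// the buggy read would run past the end of the postings.
final class FixtureSensitivity {
    private FixtureSensitivity() {}

    // True if the buggy value differs from the correct one for some doc.
    static boolean fixtureExposesBug(float[][] docs) {
        for (int d = 0; d + 1 < docs.length; d++) {
            float correct = average(docs[d]);
            float buggy = docs[d + 1][0]; // first position of the NEXT doc
            if (correct != buggy) {
                return true;
            }
        }
        return false;
    }

    private static float average(float[] payloads) {
        float sum = 0f;
        for (float p : payloads) {
            sum += p;
        }
        return sum / payloads.length;
    }
}
```

A fixture with one occurrence per document and identical payloads returns false, while one with repeated occurrences and distinct payloads, such as `{ {2, 4}, {8}, {1} }`, returns true.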

- Michael



Re: [jira] Commented: (LUCENE-834) Payload Queries

Posted by Grant Ingersoll <gs...@apache.org>.
OK, I need to take a step back, Michael, b/c I thought I understood  
your original comment, but I went to make the change and am no longer  
sure.

   By "first term position" are you referring to multiple terms per  
position or do you mean the same term in different positions?  When I  
implemented the BTQ (BoostingTermQuery) I modeled it pretty much off  
of the SpanTermQuery (STQ) which I felt had very similar  
functionality, other than having to load the payload.

Doesn't the next() method on the BoostingSpanScorer take care of  
moving through the various positions that the term appears at,  
whereupon it loads the payload at the position?  Could you write up a  
patch to the test to demonstrate?

Thanks,
Grant


On Apr 17, 2007, at 3:32 PM, Michael Busch (JIRA) wrote:

>
>     [ https://issues.apache.org/jira/browse/LUCENE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489519 ]
>
> Michael Busch commented on LUCENE-834:
> --------------------------------------
>
> Hi Grant,
>
> cool that you started implementing queries that make use of payloads! I
> have a question about this one: BoostingTermQuery only takes the  
> payload of the first term position into account for scoring. Could  
> you explain why you implemented it this way? Shouldn't we rather  
> compute the average of the payload values of all positions?
>
>> Payload Queries
>> ---------------
>>
>>                 Key: LUCENE-834
>>                 URL: https://issues.apache.org/jira/browse/LUCENE-834
>>             Project: Lucene - Java
>>          Issue Type: New Feature
>>          Components: Search
>>            Reporter: Grant Ingersoll
>>         Assigned To: Grant Ingersoll
>>            Priority: Minor
>>         Attachments: boosting.term.query.patch
>>
>>
>> Now that payloads have been implemented, it will be good to make
>> them searchable via one or more Query mechanisms.  See
>> http://wiki.apache.org/lucene-java/Payload_Planning for some background
>> information and https://issues.apache.org/jira/browse/LUCENE-755
>> for the issue that started it all.
>
