Posted to solr-user@lucene.apache.org by anand chandak <an...@oracle.com> on 2014/02/05 13:56:47 UTC

Join Scoring

Hi


Why doesn't the Solr join query return the score in the response, even
though I see a JoinScorer in the JoinQParserPlugin class?


Also, to evaluate join performance, I ran the same join query against
Solr's join (JoinQParserPlugin) and against Lucene's
JoinUtil.createJoinQuery. I always find the QTime for Solr much lower
than Lucene's. What is the reason behind this? Are some caches coming
into play, or is it just the way the join is performed? If somebody
can shed some light, I would highly appreciate it.

Thanks,

Anand

Re: Join Scoring

Posted by anand chandak <an...@oracle.com>.

Thanks Mike, that surely helps to clarify the difference.

On a related note, if we have to provide scoring support for the Solr
join instead of using the Lucene join, what would be the best way to do
that? One suggestion, which David gave below, is to build a custom
QParser and call Lucene's join (JoinUtil.createJoinQuery). Is there any
other possibility, such as modifying JoinQParserPlugin and building the
scoring support natively? Would you recommend doing that? If yes, can
you provide some high-level pointers?
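
As a rough illustration of what a scoring join adds over a constant-score
one, here is a minimal plain-Python sketch. It is not Solr or Lucene code:
the in-memory docs dictionary, the function name scoring_join and the
field names are all hypothetical, and the score modes only loosely mirror
the None/Avg/Max/Total options of the Lucene join module's ScoreMode.

```python
from collections import defaultdict


def scoring_join(from_hits, docs, from_field, to_field, score_mode="max"):
    """Toy scoring join over an in-memory index (hypothetical, not Lucene).

    from_hits maps from-side doc ids to their query scores.
    docs maps every doc id to a dict of field values.
    """
    # Pass 1: gather join terms from the from-side hits, keeping their scores.
    term_scores = defaultdict(list)
    for doc_id, score in from_hits.items():
        term = docs[doc_id].get(from_field)
        if term is not None:
            term_scores[term].append(score)

    # How the scores collected for one term are folded into a single value.
    combine = {
        "max": max,
        "total": sum,
        "avg": lambda scores: sum(scores) / len(scores),
        "none": lambda scores: 1.0,  # constant score, as with {!join}
    }[score_mode]

    # Pass 2: every document whose to_field holds a collected term matches.
    results = {}
    for doc_id, doc in docs.items():
        term = doc.get(to_field)
        if term in term_scores:
            results[doc_id] = combine(term_scores[term])
    return results
```

With score_mode="none" every match gets 1.0, which corresponds to the
constant-scoring behaviour described for {!join}. The other modes have to
keep the from-side scores per join term, which is presumably part of the
extra cost of a scoring join.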


Anand.


On 2/13/2014 5:07 PM, Michael McCandless wrote:
> I suspect (not certain) one reason for the performance difference with
> Solr vs Lucene joins is that Solr operates on a top-level reader?
>
> This results in fast joins, but it means whenever you open a new
> reader (NRT reader) there is a high cost to regenerate the top-level
> data structures.
>
> But if the app doesn't open NRT readers, or opens them rarely, perhaps
> that cost is a good tradeoff to get faster joins.
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: Join Scoring

Posted by Michael McCandless <lu...@mikemccandless.com>.

I suspect (not certain) one reason for the performance difference with
Solr vs Lucene joins is that Solr operates on a top-level reader?

This results in fast joins, but it means whenever you open a new
reader (NRT reader) there is a high cost to regenerate the top-level
data structures.

But if the app doesn't open NRT readers, or opens them rarely, perhaps
that cost is a good tradeoff to get faster joins.
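
The reopen cost described above can be modelled with a toy example. The
sketch below is plain Python, not Lucene: Segment, build_top_level,
build_per_segment and the stats counter are hypothetical names, and it
only models the rebuild work after a new reader is opened, not why a
top-level structure makes the join itself faster.

```python
class Segment:
    """An immutable chunk of the index holding its documents' join terms."""

    def __init__(self, name, terms):
        self.name = name
        self.terms = tuple(terms)


def build_top_level(segments, stats):
    # One structure spanning the whole reader: every segment is re-read
    # each time a new reader is opened.
    merged = []
    for seg in segments:
        stats["segments_read"] += 1
        merged.extend(seg.terms)
    return merged


def build_per_segment(segments, cache, stats):
    # One structure per segment: unchanged segments come from the cache,
    # so only segments added since the last reader are read.
    views = []
    for seg in segments:
        if seg.name not in cache:
            stats["segments_read"] += 1
            cache[seg.name] = list(seg.terms)
        views.append(cache[seg.name])
    return views
```

Opening a reader on two segments and then reopening it after a third
segment is added reads five segments in the top-level model but only
three in the per-segment model. If readers are reopened rarely, that
difference matters little.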

Mike McCandless

http://blog.mikemccandless.com


On Thu, Feb 13, 2014 at 12:10 AM, anand chandak
<an...@oracle.com> wrote:
> Re-posting...
>
>
>
> Thanks,
>
> Anand

Re: Join Scoring

Posted by anand chandak <an...@oracle.com>.

Re-posting...



Thanks,

Anand


On 2/12/2014 10:55 AM, anand chandak wrote:
> Thanks David, really helpful response.
>
> You mentioned that if we have to add scoring support in Solr, a
> possible approach would be to add a custom QueryParser that uses
> Lucene's join module.  I have tried this approach and it makes the
> query slow, I believe because it performs more searches.
>
> Curious whether it is possible instead to enhance Solr's existing
> JoinQParserPlugin and add the scoring support in the same class?
> Do you think it's feasible and recommended? If yes, what would it take
> (at a high level) in terms of code changes? Any pointers?
>
> Thanks,
>
> Anand

Re: Join Scoring

Posted by anand chandak <an...@oracle.com>.

Thanks David, really helpful response.


You mentioned that if we have to add scoring support in Solr, a
possible approach would be to add a custom QueryParser that uses
Lucene's join module.


Curious whether it is possible instead to enhance Solr's existing
JoinQParserPlugin and add the scoring support in the same class? Do
you think it's feasible and recommended? If yes, what would it take in
terms of code changes? Any pointers?

Thanks,

Anand


On 2/12/2014 10:31 AM, David Smiley (@MITRE.org) wrote:
> Hi Anand.
>
> Solr's JOIN query, {!join}, constant-scores.  It's simpler and faster and
> more memory efficient (particularly the worst-case memory use) to implement
> the JOIN query without scoring, so that's why.  Of course, you might want it
> to score and pay whatever penalty is involved.  For that you'll need to
> write a Solr "QueryParser" that might use Lucene's "join" module which has
> scoring variants.  I've taken this approach before.  You asked a specific
> question about the purpose of JoinScorer when it doesn't actually score.
> Lucene's "Query" produces a "Weight" which in turn produces a "Scorer" that
> is a DocIdSetIterator plus it returns a score.  So Queries have to have a
> Scorer to match any document even if the score is always 1.
>
> Solr does indeed have a lot of caching; that may be in play here when
> comparing against a quick attempt at using Lucene directly.  In particular,
> the matching documents are likely to end up in Solr's DocumentCache.
> Returning the stored fields that come back in search results is one of the
> more expensive things Lucene/Solr does.
>
> I also think you noted that the fields on documents from the "from" side of
> the query are not available to be returned in search results, just the "to"
> side.  Yup; that's true.  To remedy this, you might write a Solr
> SearchComponent that adds fields from the "from" side.  That could be tricky
> to do; it would probably need to re-run the from-side query but filtered to
> the matching top-N documents being returned.
>
> ~ David

Re: Join Scoring

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.

Hi Anand.

Solr's JOIN query, {!join}, constant-scores.  It's simpler and faster and
more memory efficient (particularly the worst-case memory use) to implement
the JOIN query without scoring, so that's why.  Of course, you might want it
to score and pay whatever penalty is involved.  For that you'll need to
write a Solr "QueryParser" that might use Lucene's "join" module which has
scoring variants.  I've taken this approach before.  You asked a specific
question about the purpose of JoinScorer when it doesn't actually score. 
Lucene's "Query" produces a "Weight" which in turn produces a "Scorer" that
is a DocIdSetIterator plus it returns a score.  So Queries have to have a
Scorer to match any document even if the score is always 1.
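
A toy model of that contract, in plain Python rather than Lucene's Java
API, may make the point concrete. ConstantScorer and collect are
hypothetical names; only the NO_MORE_DOCS sentinel value is borrowed from
Lucene's DocIdSetIterator. The scorer exists to enumerate the matching
documents, and its score is simply the same for every one of them.

```python
class ConstantScorer:
    """Iterates matching doc ids in order and reports one fixed score."""

    NO_MORE_DOCS = 2**31 - 1  # same sentinel value as Lucene's iterator

    def __init__(self, matching_doc_ids, score=1.0):
        self._docs = sorted(matching_doc_ids)
        self._pos = -1
        self._score = score

    def next_doc(self):
        # Advance to the next matching document, or signal exhaustion.
        self._pos += 1
        if self._pos >= len(self._docs):
            return self.NO_MORE_DOCS
        return self._docs[self._pos]

    def score(self):
        # Independent of the current document: the "constant" in the name.
        return self._score


def collect(scorer):
    """Drain a scorer the way a search loop would, pairing ids and scores."""
    hits = []
    doc = scorer.next_doc()
    while doc != ConstantScorer.NO_MORE_DOCS:
        hits.append((doc, scorer.score()))
        doc = scorer.next_doc()
    return hits
```

Even though score() carries no information here, the search loop in
collect could not find the matches without the iterator half of the
scorer.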

Solr does indeed have a lot of caching; that may be in play here when
comparing against a quick attempt at using Lucene directly.  In particular,
the matching documents are likely to end up in Solr's DocumentCache. 
Returning the stored fields that come back in search results is one of the
more expensive things Lucene/Solr does.

I also think you noted that the fields on documents from the "from" side of
the query are not available to be returned in search results, just the "to"
side.  Yup; that's true.  To remedy this, you might write a Solr
SearchComponent that adds fields from the "from" side.  That could be tricky
to do; it would probably need to re-run the from-side query but filtered to
the matching top-N documents being returned.
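
The idea of re-running the from side only for the returned page can be
sketched as follows. This is a plain-Python toy over an in-memory
dictionary, not a Solr SearchComponent: attach_from_fields, the "joined"
key and the field names are hypothetical, and it assumes every returned
document has a value in the to field.

```python
def attach_from_fields(top_docs, from_hits, docs, from_field, to_field, fields):
    """Add selected from-side fields to the to-side documents being returned.

    top_docs: ids of the top-N to-side documents in result order.
    from_hits: ids of the documents matching the from-side query.
    docs: maps every doc id to a dict of field values.
    """
    # Restrict the from-side work to join terms that occur in the page.
    wanted = {docs[d][to_field] for d in top_docs}

    # Group the requested from-side fields by join term.
    by_term = {}
    for doc_id in from_hits:
        term = docs[doc_id].get(from_field)
        if term in wanted:
            picked = {f: docs[doc_id][f] for f in fields if f in docs[doc_id]}
            by_term.setdefault(term, []).append(picked)

    # Return copies of the to-side documents with the joined fields attached.
    return [
        {**docs[d], "joined": by_term.get(docs[d][to_field], [])}
        for d in top_docs
    ]
```

Filtering by the terms of the top-N documents keeps the extra work
proportional to the page size rather than to the full result set.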

~ David

-----
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: http://lucene.472066.n3.nabble.com/Join-Scoring-tp4115539p4116818.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Join Scoring

Posted by anand chandak <an...@oracle.com>.

Resending, if somebody can please respond.


Thanks,

Anand


On 2/5/2014 6:26 PM, anand chandak wrote:
Hi,

I have a question on join scoring: why doesn't the Solr join query
return scores? Looking at the code, I see there's a JoinScorer defined
in the JoinQParserPlugin class. If it's not used for scoring, where is
it actually used?

Also, to evaluate the performance of the Solr join plugin vs. Lucene's
JoinUtil, I ran the same join query against the same data set and the
same schema, and in the results I always see the QTime for Solr much
lower than Lucene's. What is the reason behind this? Solr doesn't
return scores; could that cause so much difference?

My guess is that Solr has a very sophisticated caching mechanism that
might be coming into play. Is that true? Or is there a difference in
the way the join happens in the two approaches?

If I understand correctly, both implementations use a two-pass
approach: first collect all the terms from the fromField, then return
all documents that have matching terms in the toField.
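
The two-pass approach described in this paragraph can be written down as
a short plain-Python sketch. It is only a model over an in-memory
dictionary: two_pass_join and the field names are hypothetical, and
neither Solr nor Lucene is implemented this way internally.

```python
def two_pass_join(from_hits, docs, from_field, to_field):
    """Return the ids of to-side documents joined to the from-side hits.

    from_hits: ids of the documents matching the from-side query.
    docs: maps every doc id to a dict of field values.
    """
    # Pass 1: collect the distinct from_field terms over the from-side hits.
    terms = {docs[d][from_field] for d in from_hits if from_field in docs[d]}
    # Pass 2: return every document whose to_field holds one of those terms.
    return sorted(d for d, doc in docs.items() if doc.get(to_field) in terms)
```

Note that nothing in this version carries a score from the first pass to
the second, so every returned document is just a match.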

If somebody can shed some light, I would highly appreciate it.

Thanks,
Anand