Posted to solr-user@lucene.apache.org by ilayaraja <il...@gmail.com> on 2018/04/02 06:57:17 UTC

Re: Learning to Rank (LTR) with grouping

Hi Roopa & Diego,

I am facing the same issue with grouping. I am currently on Solr 7.2.1 but
still see that grouping with LTR is not working. Did you apply the fix as a
patch, or does the latest Solr version already include it?

Ilay



-----
--Ilay
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Learning to Rank (LTR) with grouping

Posted by ilayaraja <il...@gmail.com>.

Also, I would like to understand how to optimize search-time performance
with LTR. Queries whose terms match many results lead to very high latency
with the re-rank query, even for reRankDocs=24.

Are there best practices to reduce the latency?

Can the feature vector (fv) cache help?

<cache name="QUERY_DOC_FV"
       class="solr.search.LRUCache"
       size="4096"
       initialSize="2048"
       autowarmCount="4096"
       regenerator="solr.search.NoOpRegenerator" />

Should I increase the cache size?





-----
--Ilay

Re: Learning to Rank (LTR) with grouping

Posted by Diego Ceccarelli <di...@gmail.com>.

Thanks ilayaraja,

I updated the PR today, integrating your and Alan's comments. Now it also
works in distributed mode. Please let me know what you think :)

Cheers
Diego

On Wed, May 2, 2018, 17:46 ilayaraja <il...@gmail.com> wrote:

> Figured out that offset is used as part of the grouping patch which I
> applied (SOLR-8776):
> solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java
> +      if (query instanceof AbstractReRankQuery){
> +        topNGroups = cmd.getOffset() + ((AbstractReRankQuery)query).getReRankDocs();
> +      } else {
> +        topNGroups = cmd.getOffset() + cmd.getLen();

Re: Learning to Rank (LTR) with grouping

Posted by ilayaraja <il...@gmail.com>.

Figured out that offset is used as part of the grouping patch which I applied
(SOLR-8776):

solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java
+      if (query instanceof AbstractReRankQuery){
+        topNGroups = cmd.getOffset() + ((AbstractReRankQuery)query).getReRankDocs();
+      } else {
+        topNGroups = cmd.getOffset() + cmd.getLen();






-----
--Ilay

Re: Learning to Rank (LTR) with grouping

Posted by ilayaraja <il...@gmail.com>.

"Top K shouldn't start from the "start" parameter; if it does, it is a bug."

1. I clearly see that LTR does re-rank based on the start parameter.
2. When reRankDocs=24 and pageSize=24, I still get the second page of results
re-ranked by the LTR plugin when I query with start=24.


Alessandro Benedetti wrote
> Are you using SolrCloud or any distributed search?
> 
> If you are using just a single Solr instance, LTR should have no problem
> with pagination.
> The re-rank involves the top K and then you paginate.
> So if a document from page 1 of the original ranking ends up on page 3
> after re-ranking, you will see it on page 3.
> Have you verified this: "Say, if an item (Y) from second page is moved to
> first page after re-ranking, while an item (X) from first page is moved
> away from the first page."?
> Top K shouldn't start from the "start" parameter; if it does, it is a bug.
> 
> The situation changes a little with distributed search, where you can
> experience this behaviour:
> 
> *Pagination*
> Let’s explore the scenario on a single Solr node and on a sharded
> architecture.
> 
> SINGLE SOLR NODE
> 
> reRankDocs=15
> rows=10
> This means each page is composed of 10 results.
> What happens when we hit page 2?
> The first 5 documents in the search results will have been rescored and
> affected by the reranking.
> The latter 5 documents will preserve the original score and original
> ranking.
> 
> e.g.
> Doc 11 – score= 1.2
> Doc 12 – score= 1.1
> Doc 13 – score= 1.0
> Doc 14 – score= 0.9
> Doc 15 – score= 0.8
> Doc 16 – score= 5.7
> Doc 17 – score= 5.6
> Doc 18 – score= 5.5
> Doc 19 – score= 4.6
> Doc 20 – score= 2.4
> This means that score(15) could be < score(16), but documents 15 and 16
> are still in the expected order.
> The reason is that the top 15 documents are rescored and reranked and the
> rest are left unchanged.
> 
> *SHARDED ARCHITECTURE*
> 
> reRankDocs=15
> rows=10
> Shards number=2
> When looking for page 2, Solr will trigger queries to the shards to
> collect 2 pages per shard:
> Shard1: 10 ReRanked docs (page1) + 5 ReRanked docs + 5 OriginalScored docs
> (page2)
> Shard2: 10 ReRanked docs (page1) + 5 ReRanked docs + 5 OriginalScored docs
> (page2)
> 
> Then the results will be merged, and original-scored search results can
> possibly end up above reranked docs.
> A possible solution could be to normalise the scores to prevent any
> possibility that a reranked result is surpassed by original-scored ones.
> 
> Note: The problem is going to happen once rows * page > reRankDocs. When
> reRankDocs is quite high, the problem will occur only in deep paging.
> 
> 
> 
> -----
> ---------------
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io





-----
--Ilay

Re: Learning to Rank (LTR) with grouping

Posted by Alessandro Benedetti <a....@sease.io>.

Are you using SolrCloud or any distributed search?

If you are using just a single Solr instance, LTR should have no problem
with pagination.
The re-rank involves the top K and then you paginate.
So if a document from page 1 of the original ranking ends up on page 3
after re-ranking, you will see it on page 3.
Have you verified this: "Say, if an item (Y) from second page is moved to
first page after re-ranking, while an item (X) from first page is moved
away from the first page."?
Top K shouldn't start from the "start" parameter; if it does, it is a bug.

The situation changes a little with distributed search, where you can
experience this behaviour:

*Pagination*
Let’s explore the scenario on a single Solr node and on a sharded
architecture.

SINGLE SOLR NODE

reRankDocs=15
rows=10
This means each page is composed of 10 results.
What happens when we hit page 2?
The first 5 documents in the search results will have been rescored and
affected by the reranking.
The latter 5 documents will preserve the original score and original
ranking.

e.g.
Doc 11 – score= 1.2
Doc 12 – score= 1.1
Doc 13 – score= 1.0
Doc 14 – score= 0.9
Doc 15 – score= 0.8
Doc 16 – score= 5.7
Doc 17 – score= 5.6
Doc 18 – score= 5.5
Doc 19 – score= 4.6
Doc 20 – score= 2.4
This means that score(15) could be < score(16), but documents 15 and 16 are
still in the expected order.
The reason is that the top 15 documents are rescored and reranked and the
rest are left unchanged.
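
To make the single-node case concrete, here is a minimal Python sketch of
"rerank the top K, then paginate". It is only a simulation, not Solr code:
the function name rerank_then_paginate, the document ids and both sets of
scores are made up for illustration.

```python
def rerank_then_paginate(docs, rerank_docs, model_score, start, rows):
    """Rescore the first `rerank_docs` hits, then slice out one page.

    docs: list of (doc_id, original_score), already sorted by original score.
    model_score: hypothetical stand-in for the LTR model, doc_id -> score.
    """
    head = [(doc_id, model_score(doc_id)) for doc_id, _ in docs[:rerank_docs]]
    head.sort(key=lambda pair: pair[1], reverse=True)  # rerank only the top K
    tail = docs[rerank_docs:]                          # original order and scores
    return (head + tail)[start:start + rows]


# 20 hits in original order with made-up scores 29, 28, ..., 10.
docs = [(doc_id, 30 - doc_id) for doc_id in range(1, 21)]

# Made-up model whose scores (0.1 .. 1.5) live on a smaller scale.
page2 = rerank_then_paginate(docs, rerank_docs=15,
                             model_score=lambda doc_id: doc_id / 10,
                             start=10, rows=10)
for doc_id, score in page2:
    print(doc_id, score)
```

In this run the first five rows of page 2 carry model scores and the last
five carry original scores, so the scores are not monotonic across the
boundary even though the order is the intended one.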

*SHARDED ARCHITECTURE*

reRankDocs=15
rows=10
Shards number=2
When looking for page 2, Solr will trigger queries to the shards to
collect 2 pages per shard:
Shard1: 10 ReRanked docs (page1) + 5 ReRanked docs + 5 OriginalScored docs
(page2)
Shard2: 10 ReRanked docs (page1) + 5 ReRanked docs + 5 OriginalScored docs
(page2)

Then the results will be merged, and original-scored search results can
possibly end up above reranked docs.
A possible solution could be to normalise the scores to prevent any
possibility that a reranked result is surpassed by original-scored ones.

Note: The problem is going to happen once rows * page > reRankDocs. When
reRankDocs is quite high, the problem will occur only in deep paging.
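
The sharded case can be simulated in the same spirit. The sketch below
assumes the coordinating node merges the shard rows purely by score; that
merge rule, the helper names (shard_top_n, merge_by_score, fetch_page) and
all scores are assumptions made for illustration, not Solr internals.

```python
def shard_top_n(docs, rerank_docs, model_score, n):
    """One shard's answer: rerank its own top K, return its first n rows."""
    head = [(doc_id, model_score(doc_id), "reranked")
            for doc_id, _ in docs[:rerank_docs]]
    head.sort(key=lambda row: row[1], reverse=True)
    tail = [(doc_id, score, "original") for doc_id, score in docs[rerank_docs:]]
    return (head + tail)[:n]


def merge_by_score(shard_rows):
    """Assumed coordinator behaviour: merge all shard rows purely by score."""
    rows = [row for shard in shard_rows for row in shard]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def model_score(doc_id):
    return (doc_id % 100) / 10  # made-up model scores on a small scale


# Two shards with 20 hits each and made-up original scores 29 .. 10.
shards = [[(base + rank, 30 - rank) for rank in range(1, 21)]
          for base in (0, 100)]


def fetch_page(page, rows=10, rerank_docs=15):
    n = page * rows  # each shard must return start + rows documents
    merged = merge_by_score(
        [shard_top_n(docs, rerank_docs, model_score, n) for docs in shards])
    return merged[n - rows:n]


page1, page2 = fetch_page(1), fetch_page(2)
print([row[0] for row in page1])
print([row[0] for row in page2])
```

With these made-up numbers, the ten original-scored rows (scores 14 to 10)
sort above every reranked row (scores of at most 1.5) in the page 2 merge,
so page 2 ends up showing the same documents as page 1. Putting both kinds
of score on one comparable scale is what would avoid this.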



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: Learning to Rank (LTR) with grouping

Posted by ilayaraja <il...@gmail.com>.

By the way, I have applied the patch on top of Solr 7.2.1 and it worked well
for me, though the test cases were failing; I have yet to see why.

On another note, LTR with reRankDocs > page_size seems to create an issue.
For example, say my page_size=24 and reRankDocs=48.

For the first query with start=0, it returns 24 reranked results from the
top 2 result pages.
Say an item (Y) from the second page is moved to the first page after
re-ranking, while an item (X) from the first page is moved off the first
page.

For the second query with start=24 and reRankDocs=48, it returns the second
page of results drawn from the second and third pages, which does not
include item X.

So eventually, I do not see item X on the first page or the next page of
results. Is that right?

How do we solve this?
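
The scenario above can be sketched in Python as follows. This is only a
model of the two behaviours, not Solr code: the function names, the item ids
and the scoring function are made up for illustration, and the second
function merely mimics a rerank window that begins at "start".

```python
X, Y = 0, 30  # hypothetical items: X starts on page 1, Y on page 2


def model_score(doc):
    """Made-up model: pushes Y to the top, X to the bottom, keeps the rest."""
    if doc == Y:
        return 1000
    if doc == X:
        return -1000
    return -doc


def page_top_k(docs, rerank_docs, start, rows):
    """The rerank window is always the top K; pagination happens afterwards."""
    window = sorted(docs[:rerank_docs], key=model_score, reverse=True)
    return (window + docs[rerank_docs:])[start:start + rows]


def page_window_at_start(docs, rerank_docs, start, rows):
    """The rerank window begins at `start` (the behaviour described above)."""
    window = sorted(docs[start:start + rerank_docs], key=model_score, reverse=True)
    return window[:rows]


docs = list(range(72))  # ids double as positions in the original ranking
top_k_pages = [page_top_k(docs, 48, start, 24) for start in (0, 24)]
shifted_pages = [page_window_at_start(docs, 48, start, 24) for start in (0, 24)]

print(X in top_k_pages[1])                          # True: X lands on page 2
print(X in shifted_pages[0] + shifted_pages[1])     # False: X is never shown
print(Y in shifted_pages[0], Y in shifted_pages[1])  # True True: Y is repeated
```

When the window is fixed to the top K, X simply moves to page 2. When the
window shifts with start, X falls out of page 1 and is outside the window
used for page 2, while Y appears on both pages.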



-----
--Ilay

Re: Learning to Rank (LTR) with grouping

Posted by Alessandro Benedetti <a....@sease.io>.

Thanks for the response, Shawn!

In relation to this:
"I feel fairly sure that most of them are unwilling to document their
skills.  
If information like that is documented, it might saddle a committer with 
an obligation to work on issues affecting those areas when they may not 
have the free time available to cover that obligation. "

I understand your point.
I was referring to pure Lucene/Solr modules interest/expertise more than
skills but I get that "it might saddle a committer with 
an obligation to work on issues affecting those areas when they may not 
have the free time available to cover that obligation."

It shouldn't convey an obligation (as no contributor operates under any
SLA; the work is purely passion driven) but it might be a "suggestion".
I was thinking of some way to avoid such long-standing Jiras.
Let's pick this issue as an example.
In my humble opinion it is quite useful.
The last activity is from 22/May/17 15:23 and no committer commented after
that.
I would assume that committers with interest or expertise in Learning To
Rank or Grouping initially didn't have free time to evaluate the patch and
then maybe they just forgot.
Having some sort of tagging based on expertise could at least avoid the
"forget" part?
Or should the contributor publicize the issue and get as many "votes" from
the community as possible to show that the issue is worth attention?
Just thinking out loud, it was just an idea (and I am not completely sure it
could help), but I believe as a community we should manage contributions a
little bit better. Of course I am open to any idea and perspective.

Cheers




-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: Learning to Rank (LTR) with grouping

Posted by Shawn Heisey <ap...@elyograg.org>.

On 4/17/2018 5:35 AM, Alessandro Benedetti wrote:
> Apache Lucene/Solr is a big project, is there anywhere in the official
> Apache Lucene/Solr website where each committer list the modules of
> interest/expertise ?

No, there is no repository like that.  Each committer knows what their 
own expertise is of course, and sometimes may know a little bit about 
the expertise of a few others, but there is nothing documented.  I feel 
fairly sure that most of them are unwilling to document their skills.  
If information like that is documented, it might saddle a committer with 
an obligation to work on issues affecting those areas when they may not 
have the free time available to cover that obligation.

> I understand that all of us contributors ( and committers) are just
> volunteers, so no SLA is expected at all, but did the fact of the fixed
> version already assigned affect the address of that Jira issue ?

The fix version is initially assigned by the person who opens the jira.  
When an issue is opened, that field should not be populated, but we 
can't expect everybody to know that.

If one of the committers happens to notice that there is a fix version 
but nobody is actually working on the issue, that may get cleared out.  
A committer will usually only enter one or more values in the fix 
version field if they are reasonably certain that they will actually get 
a fix committed to those specific releases.  For this reason, that field 
is often left blank until the change is actually ready.  Releases are 
not scheduled in advance, so until a release manager has volunteered and 
started work on a release, we never know when it's going to happen.

Thanks,
Shawn

Re: Learning to Rank (LTR) with grouping

Posted by Alessandro Benedetti <a....@sease.io>.

Hi Erick,
I have a curiosity/suggestion regarding how to speed up pending (or
forgotten) Jiras:
is there a way to find out the most suitable committer(s) for the task and
tag them?

Apache Lucene/Solr is a big project; is there anywhere on the official
Apache Lucene/Solr website where each committer lists their modules of
interest/expertise?
That way, when a contributor creates a Jira and attaches a patch, the
committers could get a notification if the module involved in the Jira is
one of their interests.
This could be done manually (the contributor checks the committers'
interests and manually tags them in the Jira) or automatically (integrating
Jira modules with this "Interests list" in some way).
Happy to help in this direction.

I understand that all of us contributors (and committers) are just
volunteers, so no SLA is expected at all, but did the fact that a fix
version was already assigned affect how that Jira issue was handled?


Cheers
 



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: Learning to Rank (LTR) with grouping

Posted by Erick Erickson <er...@gmail.com>.

People sometimes fill in the Fix/Version field when they're creating
the JIRA; since anyone can open a JIRA, it's hard to control. I took
that out just now.

Basically if the "Resolution" field doesn't indicate it's fixed, you
should assume that it hasn't been addressed.

Patches welcome.

Best,
Erick

On Tue, Apr 3, 2018 at 9:11 AM, ilayaraja <il...@gmail.com> wrote:
> Thanks Roopa.
>
> I was expecting that the issue has been fixed in solr 7.0 as per here
> https://issues.apache.org/jira/browse/SOLR-8776.
>
> Let me see why it is still not working on solr-ltr-7.2.1
>
>
>
> -----
> --Ilay

Re: Learning to Rank (LTR) with grouping

Posted by ilayaraja <il...@gmail.com>.

Thanks Roopa.

I was expecting that the issue has been fixed in solr 7.0 as per here
https://issues.apache.org/jira/browse/SOLR-8776.

Let me see why it is still not working on solr-ltr-7.2.1



-----
--Ilay

Re: Learning to Rank (LTR) with grouping

Posted by Roopa Rao <ro...@gmail.com>.

Hi Ilay,

I am still on Solr 6.6.0 and did not patch the grouping fix.
I implemented a temporary workaround: the web application sends 2 async
requests, the 1st with grouping and the 2nd without grouping, and merges the
results.
This solution worked for my case as we were getting grouping results for
specific tiles in the page.

Roopa

On Mon, Apr 2, 2018 at 2:57 AM, ilayaraja <il...@gmail.com> wrote:

> Hi Roopa & Diego,
>
> I am facing the same issue with grouping. I am currently on Solr 7.2.1 but
> still see that grouping with LTR is not working. Did you apply the fix as a
> patch, or does the latest Solr version already include it?
>
> Ilay
>
>
>
> -----
> --Ilay
>