Posted to solr-user@lucene.apache.org by Nitesh Nandy <ni...@gmail.com> on 2012/06/11 14:29:09 UTC

Issue with field collapsing in solr 4 while performing distributed search

Version: Solr 4.0 (svn build of 30 May 2012) with SolrCloud (2 slices and
2 shards)

The setup was done as per the wiki: http://wiki.apache.org/solr/SolrCloud

We are doing distributed search. When querying, we use field collapsing
with "group.ngroups" set to true because we need the number of search results.

However, the "ngroups" value returned differs from the number of groups in
the returned result list.

Ex:
http://localhost:8983/solr/select?q=message:blah%20AND%20userid:3&&group=true&group.field=id&group.ngroups=true


The response XML looks like this:

<response>
<script/>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">46</int>
<lst name="params">
<str name="group.field">id</str>
<str name="group.ngroups">true</str>
<str name="group">true</str>
<str name="q">messagebody:monit AND usergroupid:3</str>
</lst>
</lst>
<lst name="grouped">
<lst name="id">
<int name="matches">10</int>
<int name="ngroups">9</int>
<arr name="groups">
<lst>
<str name="groupValue">320043</str>
<result name="doclist" numFound="1" start="0">
<doc>...</doc>
</result>
</lst>
<lst>
<str name="groupValue">398807</str>
<result name="doclist" numFound="5" start="0" maxScore="2.4154348">...
</result>
</lst>
<lst>
<str name="groupValue">346878</str>
<result name="doclist" numFound="2" start="0">...</result>
</lst>
<lst>
<str name="groupValue">346880</str>
<result name="doclist" numFound="2" start="0">...</result>
</lst>
</arr>
</lst>
</lst>
</response>

As you can see, the ngroups value returned is 9, but the actual number
of groups returned is 4.
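
For anyone who wants to check these numbers programmatically, here is a minimal sketch that parses a grouped response and compares "ngroups" with the groups actually present. The sample XML is a trimmed copy of the response above; the function name and the returned keys are made up for illustration. Note that the number of groups returned is also capped by "rows", so a smaller list is not by itself a discrepancy. Here, however, the numFound values of the four groups add up to "matches", which suggests no groups are missing.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the grouped response shown above (doc bodies omitted).
SAMPLE = """<response>
<lst name="grouped">
<lst name="id">
<int name="matches">10</int>
<int name="ngroups">9</int>
<arr name="groups">
<lst><str name="groupValue">320043</str><result name="doclist" numFound="1" start="0"/></lst>
<lst><str name="groupValue">398807</str><result name="doclist" numFound="5" start="0"/></lst>
<lst><str name="groupValue">346878</str><result name="doclist" numFound="2" start="0"/></lst>
<lst><str name="groupValue">346880</str><result name="doclist" numFound="2" start="0"/></lst>
</arr>
</lst>
</lst>
</response>"""


def summarize_grouped(xml_text, field):
    """Return the reported counts next to the counts found in the group list."""
    root = ET.fromstring(xml_text)
    grouped = root.find("./lst[@name='grouped']/lst[@name='%s']" % field)
    groups = grouped.findall("./arr[@name='groups']/lst")
    return {
        "matches": int(grouped.find("./int[@name='matches']").text),
        "ngroups": int(grouped.find("./int[@name='ngroups']").text),
        "groups_returned": len(groups),
        # Sum of numFound over the groups that were actually returned.
        "docs_in_returned_groups": sum(
            int(g.find("./result[@name='doclist']").get("numFound"))
            for g in groups
        ),
    }


if __name__ == "__main__":
    print(summarize_grouped(SAMPLE, "id"))
```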

Why is there a discrepancy between ngroups, matches, and the actual number
of groups? Is this an open issue?

Any help is appreciated.

-- 
Regards,

Nitesh Nandy

Re: Issue with field collapsing in solr 4 while performing distributed search

Posted by Jack Krupansky <ja...@basetechnology.com>.
Is there a Solr wiki that discusses these issues, such as "Groups can't 
cross shard boundaries"? Seems like it should be highlighted prominently, 
maybe here:
http://wiki.apache.org/solr/FieldCollapsing

Seems like it should be mentioned on the distributed/SolrCloud wiki(s) as 
well.

Is this a "distributed IDF" type of issue or something else? Is this an 
outright bug or an (insurmountable?) limitation?

I did notice SOLR-2066, but didn't see mention of the limitation. Are there 
any other limitations for distributed grouping?

-- Jack Krupansky

-----Original Message----- 
From: Martijn v Groningen
Sent: Monday, June 11, 2012 8:53 AM
To: solr-user@lucene.apache.org
Subject: Re: Issue with field collapsing in solr 4 while performing 
distributed search

ngroups returns the number of groups that matched the query. However,
if you want ngroups to be correct in a distributed environment, you need
to put all documents belonging to the same group into the same shard.
Groups can't cross shard boundaries. I guess you need to do
some manual document partitioning.

Martijn

-- 
Kind regards,

Martijn van Groningen 


Re: Issue with field collapsing in solr 4 while performing distributed search

Posted by roz dev <ro...@gmail.com>.
I think there is no way around custom logic in this case.

If the indexing process knows which documents have to be grouped, then it
had better keep them together on the same shard.
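
One possible shape for that custom logic, as a rough sketch only: hash the value of the grouping field and use it to pick a shard, so that every document of a group lands in the same place. Nothing here is a Solr API. The function names, the document dicts, and the idea of sending each bucket to its own shard are assumptions for illustration.

```python
import hashlib


def shard_for_group(group_value, num_shards):
    """Map a group value to a shard index in a stable, repeatable way.

    A cryptographic digest is used instead of Python's built-in hash(),
    which is randomized per process for strings.
    """
    digest = hashlib.md5(str(group_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(docs, group_field, num_shards):
    """Split documents into per-shard buckets, keeping each group together."""
    buckets = [[] for _ in range(num_shards)]
    for doc in docs:
        buckets[shard_for_group(doc[group_field], num_shards)].append(doc)
    return buckets


if __name__ == "__main__":
    # Hypothetical documents; "id" is the grouping field from the query above.
    docs = [
        {"id": "398807", "messagebody": "monit a"},
        {"id": "398807", "messagebody": "monit b"},
        {"id": "346878", "messagebody": "monit c"},
    ]
    for index, bucket in enumerate(partition(docs, "id", 2)):
        print(index, [d["id"] for d in bucket])
```

The indexing client would then send each bucket to the core that hosts that shard. With plain modulo hashing, changing the number of shards moves groups around, so the index would have to be rebuilt.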

-Saroj


On Mon, Jun 11, 2012 at 6:37 AM, Nitesh Nandy <ni...@gmail.com> wrote:

> Martijn,
>
> How do we add a custom algorithm for distributing documents in SolrCloud?
> According to this discussion,
>
> http://lucene.472066.n3.nabble.com/SolrCloud-how-to-index-documents-into-a-specific-core-and-how-to-search-against-that-core-td3985262.html
> Mark discourages users from using a custom distribution mechanism in
> SolrCloud.
>
> Load balancing is not an issue for us at the moment. In that case, how
> should we implement a custom partitioning algorithm?

Re: Issue with field collapsing in solr 4 while performing distributed search

Posted by Nitesh Nandy <ni...@gmail.com>.
Martijn,

How do we add a custom algorithm for distributing documents in SolrCloud?
According to this discussion,
http://lucene.472066.n3.nabble.com/SolrCloud-how-to-index-documents-into-a-specific-core-and-how-to-search-against-that-core-td3985262.html
Mark discourages users from using a custom distribution mechanism in
SolrCloud.

Load balancing is not an issue for us at the moment. In that case, how
should we implement a custom partitioning algorithm?




-- 
Regards,

Nitesh Nandy

Re: Issue with field collapsing in solr 4 while performing distributed search

Posted by Martijn v Groningen <ma...@gmail.com>.
ngroups returns the number of groups that matched the query. However,
if you want ngroups to be correct in a distributed environment, you need
to put all documents belonging to the same group into the same shard.
Groups can't cross shard boundaries. I guess you need to do
some manual document partitioning.
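
To illustrate why co-location matters, here is a simplified model, assuming the coordinating node just adds up the distinct-group counts reported by each shard. That summing behaviour is an assumption of this sketch, not something confirmed in this thread, and the documents and group values are hypothetical.

```python
def ngroups_summed(shards, group_field):
    """Group count obtained by adding each shard's distinct-group count."""
    return sum(len({doc[group_field] for doc in shard}) for shard in shards)


def ngroups_true(shards, group_field):
    """Distinct groups across the whole index, regardless of shard."""
    return len({doc[group_field] for shard in shards for doc in shard})


if __name__ == "__main__":
    # Group "A" is split across both shards, so it is counted twice.
    split = [
        [{"id": "A"}, {"id": "A"}, {"id": "B"}],
        [{"id": "A"}, {"id": "C"}],
    ]
    # Same documents, but every group lives on exactly one shard.
    colocated = [
        [{"id": "A"}, {"id": "A"}, {"id": "A"}, {"id": "B"}],
        [{"id": "C"}],
    ]
    print(ngroups_summed(split, "id"), ngroups_true(split, "id"))
    print(ngroups_summed(colocated, "id"), ngroups_true(colocated, "id"))
```

In this model the summed count exceeds the true count only when a group has documents on more than one shard, which is the situation manual partitioning is meant to avoid.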

Martijn



-- 
Kind regards,

Martijn van Groningen