Posted to solr-user@lucene.apache.org by "Young, Cody" <Co...@move.com> on 2012/03/31 01:34:53 UTC

Distributed grouping issue

Hi All,

I'm having an issue getting distributed grouping working on trunk (Mar 29, 2012).

If I send this query:

http://localhost:8086/solr/core0/select/?q=*:*&group=false&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7

I get 260,000 results. As soon as I change to using grouping:

http://localhost:8086/solr/core0/select/?q=*:*&group=true&group.field=group_field&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7

I only get 32,000 results (the number of documents in a single core).
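
For anyone reproducing this, here is a minimal sketch of building the two request URLs programmatically, which helps avoid accidental whitespace or escaping mistakes in long hand-typed query strings. The helper name build_select_url is made up for illustration; the host, port, and core names are the ones from this message.

```python
from urllib.parse import urlencode

def build_select_url(base, shards, group_field=None):
    """Build a distributed /select URL; grouping is enabled when group_field is set."""
    params = [("q", "*:*")]
    if group_field is None:
        params.append(("group", "false"))
    else:
        params.append(("group", "true"))
        params.append(("group.field", group_field))
    params.append(("shards", ",".join(shards)))
    # Leave '*', ':', ',' and '/' unescaped so the URL stays readable.
    return base + "/select/?" + urlencode(params, safe="*:,/")

shards = ["localhost:8086/solr/core%d" % i for i in range(8)]
plain_url = build_select_url("http://localhost:8086/solr/core0", shards)
grouped_url = build_select_url("http://localhost:8086/solr/core0", shards, "group_field")
print(plain_url)
print(grouped_url)
```

The only difference between the two URLs is the group parameters, so any difference in the reported counts comes from grouping itself.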

The field that I am grouping on is defined as:

<field name="group_field" type="string" indexed="true" stored="true" multiValued="false" />

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

The document id:


<field name="document_id" type="string" indexed="true" stored="true" required="true" />

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<uniqueKey>document_id</uniqueKey>

Anyone else experiencing this? Any ideas?

Thanks,
Cody

RE: Distributed grouping issue

Posted by "Young, Cody" <Co...@move.com>.
Hi Martijn,

I created a JIRA issue and attached a test that fails. It seems to exhibit the same issue that I see on my local box. (If you run it multiple times you can see that the group value of the top doc changes between runs.)

Also, I had to add fixShardCount = true; in the constructor of the TestDistributedGrouping class, which caused another test case to fail. (It's commented out in the patch with a TODO above it.)

Please let me know if you need any other information.

https://issues.apache.org/jira/browse/SOLR-3316

Thanks!!
Cody

Re: Distributed grouping issue

Posted by Martijn v Groningen <ma...@gmail.com>.
I tried to reproduce this. However, matches always returns 4 in my
case (with both rows=1 and rows=2).
In your case the 2 documents on each core do belong to the same group,
right?

I did find something else. If I use rows=0 then an error occurs. I think we
need to further investigate this.
Can you open an issue in Jira? I'm a bit busy today. We can then further
look into this in the coming days.

Martijn

-- 
Kind regards,

Martijn van Groningen

RE: Distributed grouping issue

Posted by "Young, Cody" <Co...@move.com>.
Okay, I've played with this a bit more. Found something interesting:

When the groups returned do not include results from a core, then the core is excluded from the count. (I have 1 group, 2 documents per core)

Example:

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=1

<lst name="grouped">
<lst name="group_field">
<int name="matches">2</int>

Then, just by changing to rows=2:

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=2

<lst name="grouped">
<lst name="group_field">
<int name="matches">4</int>
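
When repeating the two requests, it can help to pull the matches value out of each response programmatically. The following is a minimal sketch using Python's standard library; the function name grouped_matches is hypothetical, and the sample response is a hand-written fragment in the shape shown above, not real server output. With 2 documents on each of 2 cores, the value should be 4 for both requests.

```python
import xml.etree.ElementTree as ET

def grouped_matches(xml_text, field):
    """Return the 'matches' count reported for a grouped field, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find(
        "./lst[@name='grouped']/lst[@name='%s']/int[@name='matches']" % field
    )
    return int(node.text) if node is not None else None

# Hand-written fragment for illustration only (not real server output).
sample = """<response>
<lst name="grouped">
<lst name="group_field">
<int name="matches">2</int>
</lst>
</lst>
</response>"""
count = grouped_matches(sample, "group_field")
print(count)
```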

Let me know if you have any luck reproducing.

Thanks,
Cody 

Re: Distributed grouping issue

Posted by Martijn v Groningen <ma...@gmail.com>.
>
> All documents of a group exist on a single shard, there are no cross-shard
> groups.
>
You only have to partition documents by group when the groupCount and some
other features need to be accurate. For the "matches" this is not
necessary. The matches are summed up while merging the shard responses.

I can't reproduce the error you are describing on a small local setup I
have here. I have two Solr cores with a simple schema. Each core has 3
documents. When grouping, the matches element returns 6. I'm running on a
trunk that I updated 30 minutes ago. Can you try to isolate the
problem by testing with a small subset of your data?
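
To make the expected arithmetic concrete, here is a rough sketch of that merge step. It illustrates the summing only and is not Solr's actual merging code; the function name merge_matches and the dict layout are hypothetical.

```python
def merge_matches(shard_matches):
    """Sum per-shard 'matches' counts into the distributed total.

    shard_matches maps a shard name to the number of documents that
    matched the query on that shard (hypothetical layout for illustration).
    """
    return sum(shard_matches.values())

# Two cores with 3 matching documents each, as in the small setup described here.
total = merge_matches({"core0": 3, "core1": 3})
print(total)
```

Every shard contributes to the total, whether or not any of its documents end up in the returned groups.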

Martijn

RE: Distributed grouping issue

Posted by "Young, Cody" <Co...@move.com>.
In the case of group=false:

numFound="260000"

In the case of group=true:

<int name="matches">34000</int>

As a note, the grouped number changes when I hit refresh. It seems to display the count from any single shard. (The top match also changes).

I haven't tried this in other versions of Solr.

All documents of a group exist on a single shard, there are no cross-shard groups.

Thanks,
Cody 

Re: Distributed grouping issue

Posted by Martijn v Groningen <ma...@gmail.com>.
The "matches" element in the response should return the number of documents
that matched with the query and not the number of groups.
Did you also encounter this issue with other Solr versions (3.5 or
another nightly build)?

Martijn

-- 
Kind regards,

Martijn van Groningen

RE: Distributed grouping issue

Posted by fbrisbart <fb...@bestofmedia.com>.
Hi,

When you write "I get xxx results", does the number come from 'numFound',
or are xxx results actually displayed?
When field collapsing and sharding are used together, 'numFound' may be
wrong. In that case, consider setting the 'shards.rows' parameter to a
high value (be careful, it is bad for performance).

If the problem is really about the returned results, it may be because
of several documents having the same unique key "document_id" in
different shards.
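
One way to check for that is to export the unique keys from each shard and look for ids that appear on more than one of them. This is a minimal sketch that assumes the ids have already been collected into plain lists; the function name and the sample ids are made up for illustration.

```python
from collections import defaultdict

def find_cross_shard_duplicates(ids_by_shard):
    """Return {document_id: sorted shard names} for ids present on more than one shard."""
    shards_by_id = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for doc_id in ids:
            shards_by_id[doc_id].add(shard)
    return {doc_id: sorted(shards)
            for doc_id, shards in shards_by_id.items()
            if len(shards) > 1}

# Made-up ids purely for illustration.
sample = {
    "core0": ["a-1", "a-2"],
    "core1": ["b-1", "a-2"],
}
duplicates = find_cross_shard_duplicates(sample)
print(duplicates)
```

An empty result means every document_id lives on exactly one shard.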

Hope it helps,
Franck



On Friday, 30 March 2012 at 23:52 +0000, Young, Cody wrote:
> I forgot to mention, I can see the distributed requests happening in the logs:
> 
> Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> INFO: [core2] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core2&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=2
> Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> INFO: [core4] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
> Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> INFO: [core1] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core1&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
> Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> INFO: [core3] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core3&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
> Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> INFO: [core0] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core0&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
> Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> INFO: [core6] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=0
> Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> INFO: [core7] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core7&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=3
> Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> INFO: [core5] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core5&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
> Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> INFO: [core4] webapp=/solr path=/select params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=2
> Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> INFO: [core6] webapp=/solr path=/select params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=2
> Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> INFO: [core4] webapp=/solr path=/select params={NOW=1333151353217&shard.url=localhost:8086/solr/core4&ids=4182445488-535180165,3554417600-527549713,4608765424-526014561,3524954944-531590393,4183765296-514134497,4206259648-530219973,3465497912-534955957,4213143392-534186349,3140802904-538688961,4328299312-533482537&q=*:*&distrib=false&group.field=group_field&wt=javabin&isShard=true&version=2&rows=10} status=0 QTime=5
> Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> INFO: [core0] webapp=/solr path=/select/ params={shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7&q=*:*&group.field=group_field&group=true} status=0 QTime=106 
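
One way to read a log like the one above is to tally which cores received each phase of the distributed grouping request. In that log, the first phase (group.distributed.first) reaches all eight cores, while the second phase (group.distributed.second) is logged only for core4 and core6, and the final ids fetch only for core4. The sketch below is a hypothetical helper, not a Solr tool; the phase labels are made up for illustration, and the sample lines are shortened versions of the real ones.

```python
import re
from collections import defaultdict

LOG_LINE = re.compile(r"INFO: \[(?P<core>\w+)\].*?params=\{(?P<params>[^}]*)\}")

def classify(params):
    """Name the phase of a distributed grouping request from its parameters."""
    names = {pair.split("=", 1)[0] for pair in params.split("&")}
    if "group.distributed.first" in names:
        return "first"
    if "group.distributed.second" in names:
        return "second"
    if "ids" in names:
        return "fetch"
    return "other"

def cores_by_phase(lines):
    """Map each phase to the sorted list of cores that logged a request for it."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            seen[classify(match.group("params"))].add(match.group("core"))
    return {phase: sorted(cores) for phase, cores in seen.items()}

# Shortened lines in the same shape as the log above (parameters trimmed).
sample_log = [
    "INFO: [core2] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&q=*:*} status=0 QTime=2",
    "INFO: [core4] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&q=*:*} status=0 QTime=1",
    "INFO: [core4] webapp=/solr path=/select params={distrib=false&group.distributed.second=true&q=*:*} status=0 QTime=2",
    "INFO: [core4] webapp=/solr path=/select params={NOW=1333151353217&ids=4182445488-535180165&q=*:*} status=0 QTime=5",
]
phases = cores_by_phase(sample_log)
print(phases)
```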
> 



RE: Distributed grouping issue

Posted by "Young, Cody" <Co...@move.com>.
I forgot to mention that I can see the distributed requests happening in the logs:

Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core2] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core2&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=2
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core4] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core1] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core1&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core3] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core3&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core0] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core0&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core6] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=0
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core7] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core7&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=3
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core5] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core5&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core4] webapp=/solr path=/select params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=2
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core6] webapp=/solr path=/select params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=2
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core4] webapp=/solr path=/select params={NOW=1333151353217&shard.url=localhost:8086/solr/core4&ids=4182445488-535180165,3554417600-527549713,4608765424-526014561,3524954944-531590393,4183765296-514134497,4206259648-530219973,3465497912-534955957,4213143392-534186349,3140802904-538688961,4328299312-533482537&q=*:*&distrib=false&group.field=group_field&wt=javabin&isShard=true&version=2&rows=10} status=0 QTime=5
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core0] webapp=/solr path=/select/ params={shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7&q=*:*&group.field=group_field&group=true} status=0 QTime=106 
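
To make the excerpt above easier to read, here is a minimal Python sketch that tallies which cores logged each phase of the distributed grouping request. The function names (classify, cores_by_phase) and the phase labels are made up for illustration and are not part of Solr; the only things taken from the logs are the parameter names group.distributed.first, group.distributed.second, ids and shards. Run against the lines above, it should show the first phase on all eight cores, the second phase on core4 and core6 only, and the ids fetch on core4 only. The excerpt may not be the complete log, so treat that as an observation rather than a diagnosis.

```python
import re
from collections import defaultdict
from urllib.parse import parse_qs

# Matches Solr request log lines such as:
#   INFO: [core4] webapp=/solr path=/select params={...} status=0 QTime=2
LINE_RE = re.compile(
    r"INFO: \[(?P<core>[^\]]+)\] .*?params=\{(?P<params>.*)\} status="
)


def classify(params):
    """Label a request by the parameters seen in the log excerpt.

    The labels are illustrative only, not Solr terminology.
    """
    if "shards" in params:
        return "top-level"      # the original request sent by the client
    if params.get("group.distributed.first") == ["true"]:
        return "first"          # first grouping phase
    if params.get("group.distributed.second") == ["true"]:
        return "second"         # second grouping phase
    if "ids" in params:
        return "fetch"          # retrieval of documents by id
    return "other"


def cores_by_phase(log_lines):
    """Return {phase label: sorted list of cores that logged that phase}."""
    phases = defaultdict(set)
    for line in log_lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # timestamp lines and anything else that is not a request
        params = parse_qs(match.group("params"))
        phases[classify(params)].add(match.group("core"))
    return {phase: sorted(cores) for phase, cores in phases.items()}


if __name__ == "__main__":
    import sys

    # Usage (the log file name is whatever your container writes to):
    #   python phases.py < solr.log
    for phase, cores in sorted(cores_by_phase(sys.stdin).items()):
        print(phase, len(cores), ",".join(cores))
```

Comparing the core list for "first" with the one for "second" is a quick way to see whether every shard took part in both grouping phases.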
