Posted to solr-user@lucene.apache.org by Vadim Ivanov <va...@intourist.ru.INVALID> on 2020/02/14 07:24:33 UTC

Solr grouping with offset

Hello guys!
I need some advice. My task is to delete some documents in a collection.
The deletion algorithm is as follows:
Group docs by field1, sort by field2 within each group, and delete the third and all following occurrences in every group.
Unfortunately, I didn't find an easy way to do so.
The closest approach was to use group.offset=2, but the result set is polluted with empty groups containing no documents (groups that have fewer than 3 docs).
Maybe I'm missing something and there is a way not to receive empty groups in the results?
The next approach was to use faceting first with facet.mincount=3, then find the doc ids for every facet result, and then delete the docs by id.
That way seems too complicated to me for the task.
What's the best approach for the task?
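
To make the deletion rule above concrete, here is a minimal in-memory sketch of it in Python. It does not talk to Solr at all; the function name, the "id" key, and the sample documents are made up for illustration, and the descending sort is an assumption (the post does not state a direction).

```python
from collections import defaultdict


def ids_to_delete(docs, group_field, sort_field, keep=2, descending=True):
    """Return the ids of every doc past the first `keep` docs in each group.

    Hypothetical helper: assumes each doc is a dict with an "id" key.
    """
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc)

    doomed = []
    for members in groups.values():
        # Order the group the same way the grouped query would.
        members.sort(key=lambda d: d[sort_field], reverse=descending)
        # Everything from position `keep` onward is surplus.
        doomed.extend(d["id"] for d in members[keep:])
    return doomed


# Illustrative data only: group A has four docs, group B has one.
docs = [
    {"id": "a1", "field1": "A", "field2": 5},
    {"id": "a2", "field1": "A", "field2": 9},
    {"id": "a3", "field1": "A", "field2": 1},
    {"id": "a4", "field1": "A", "field2": 7},
    {"id": "b1", "field1": "B", "field2": 3},
]
print(sorted(ids_to_delete(docs, "field1", "field2")))
```

Groups with fewer than three docs contribute nothing to the result, which is exactly the case that shows up as an empty group when group.offset=2 is used.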

Re: Solr grouping with offset

Posted by Saurabh Sharma <sa...@gmail.com>.
Hi,

Yes. I meant facet.mincount only.


Thanks
Saurabh


RE: Solr grouping with offset

Posted by Vadim Ivanov <va...@spb.ntk-intourist.ru>.
group.mincount? I've never heard of it. Does it exist?
Maybe you have in mind facet.mincount and the second approach I mentioned earlier:

> The next approach was to use faceting first with facet.mincount=3, then
> find the doc ids for every facet result, and then delete the docs by id.
> That way seems too complicated to me for the task.



Re: Solr grouping with offset

Posted by Saurabh Sharma <sa...@gmail.com>.
Hi,

If you want to sort on your field and also apply a count restriction,
then you have to use mincount. That seems to be the best approach for your
problem.

Thanks
Saurabh
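
A rough sketch of the facet-first approach discussed in this thread, in Python. It only builds request parameters and parses a response; nothing here is a confirmed recipe from the thread. The helper names, the rows cap, and the "id" field are assumptions, and the parser assumes the default flat JSON layout for facet_fields (value, count, value, count, ...).

```python
def facet_params(group_field, min_count=3):
    """Params for a facet request that lists only groups with >= min_count docs."""
    return {
        "q": "*:*",
        "rows": 0,               # only the facet counts are needed
        "facet": "true",
        "facet.field": group_field,
        "facet.mincount": min_count,
        "facet.limit": -1,       # return every qualifying value
        "wt": "json",
    }


def big_groups(response, group_field):
    """Turn the flat [value, count, value, count, ...] list into pairs."""
    flat = response["facet_counts"]["facet_fields"][group_field]
    return list(zip(flat[0::2], flat[1::2]))


def surplus_query(group_field, value, sort_field, keep=2, rows=1000):
    """Params for fetching the ids past the first `keep` docs of one group.

    `rows=1000` is an arbitrary cap chosen for this sketch.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "q": f'{group_field}:"{escaped}"',
        "sort": f"{sort_field} desc",
        "start": keep,           # skip the docs that must be kept
        "rows": rows,
        "fl": "id",
        "wt": "json",
    }


# Hypothetical facet response, for illustration only.
sample = {"facet_counts": {"facet_fields": {"rr_group": ["g2", 4, "g7", 3]}}}
for value, count in big_groups(sample, "rr_group"):
    print(value, count, surplus_query("rr_group", value, "rr_updatedate")["q"])
```

This issues one follow-up query per qualifying group, which is the extra round-tripping the original poster found too complicated; with only several thousand such groups it may still be tolerable.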


RE: Solr grouping with offset

Posted by Vadim Ivanov <va...@spb.ntk-intourist.ru>.
Example of grouping with empty groups in the results:
field1 = rr_group, field2 = rr_updatedate
The problem is that I have tens of millions of groups in the result and only several thousand with "numFound" > 2.

"params":{
      "q":"*:* ",
      "group.sort":"rr_updatedate desc ",
      "group.limit":"-1",
      "fl":"rr_group,rr_adl,rr_createdate,rr_calctaskkey ",
      "group.offset":"2",
      "wt":"json",
      "group.field":"rr_group",
      "group":"true"}},
  "grouped":{
    "rr_group":{
      "matches":41475082,
      "groups":[{
          "groupValue":"164370:20200707:23:251",
          "doclist":{"numFound":1,"start":2,"docs":[]
          }},
        {
          "groupValue":"163942:20200708:22:251",
          "doclist":{"numFound":1,"start":2,"docs":[]
          }},
        {
          "groupValue":"163943:20200708:22:251",
          "doclist":{"numFound":1,"start":2,"docs":[]
          }},
        {
          "groupValue":"164355:20200708:22:251",
          "doclist":{"numFound":1,"start":2,"docs":[]
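
If the empty groups cannot be avoided on the server side, one option is to skip them on the client. Below is a minimal Python sketch that walks a parsed grouped response shaped like the one above, ignores groups whose "docs" list is empty, and builds a delete-by-id body for Solr's JSON update handler. It assumes an "id" field was added to fl (the request above does not include one); the helper names and the sample data are hypothetical.

```python
import json


def surplus_ids(response, group_field, id_field="id"):
    """Collect ids from every non-empty doclist in a grouped response."""
    ids = []
    for group in response["grouped"][group_field]["groups"]:
        docs = group["doclist"]["docs"]
        if not docs:
            # Fewer than 3 docs in this group: group.offset=2 left it empty.
            continue
        ids.extend(doc[id_field] for doc in docs)
    return ids


def delete_payload(ids):
    """JSON body for a delete-by-id update request."""
    return json.dumps({"delete": ids})


# Hypothetical response fragment, not taken from the thread.
sample = {
    "grouped": {
        "rr_group": {
            "matches": 5,
            "groups": [
                {"groupValue": "g1",
                 "doclist": {"numFound": 1, "start": 2, "docs": []}},
                {"groupValue": "g2",
                 "doclist": {"numFound": 4, "start": 2,
                             "docs": [{"id": "d3"}, {"id": "d4"}]}},
            ],
        }
    }
}
print(delete_payload(surplus_ids(sample, "rr_group")))
```

This does not reduce what Solr sends back; with tens of millions of groups the response would still have to be paged through, so it only addresses the filtering, not the volume.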



Re: Solr grouping with offset

Posted by Paras Lehana <pa...@indiamart.com>.
It would be better if you gave us an example.



-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,
