Posted to solr-user@lucene.apache.org by harish singh <ha...@gmail.com> on 2015/01/25 00:58:33 UTC

Facet Double Counting

Hi,

I am noticing strange behavior with Solr facet searching:

This is my facet query:


   - params:
   {
      - facet: "true",
      - sort: "startTimeISO desc",
      - debugQuery: "true",
      - facet.mincount: "1",
      - facet.sort: "count",
      - start: "0",
      - q: "requestType:(*login* or *LOGIN*) AND (user:(blabla*))",
      - facet.limit: "100",
      - facet.field: "loginUserName",
      - wt: "json",
      - fq: "startTimeISO:[2015-01-22T00:00:00.000Z TO
      2015-01-23T00:00:00.000Z]",
      - rows: "0"
      }


The result I am getting is:

facet_counts:
{

   - facet_queries: { },
   - facet_fields:
   {
      - loginUserName:
      [
         - "harry",
         - 36,
         - "larry",
         - 10,
         - "Carey"
         ]
      },
   - facet_dates: { },
   - facet_ranges: { }

}



As you can see, the facet count for loginUserName "harry" is 36.
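
For reading the result above: with wt=json, Solr by default returns each facet field as one flat list that alternates term and count. A minimal sketch of pairing them up follows. The helper name is made up for illustration, and the sample list is copied from the output as posted, where the count for "Carey" is missing.

```python
from itertools import zip_longest

def pair_facet_counts(flat):
    """Turn a flat [term, count, term, count, ...] list into a dict."""
    terms = flat[0::2]
    counts = flat[1::2]
    # zip_longest keeps a trailing term whose count is missing (value None).
    return dict(zip_longest(terms, counts))

# Copied from the facet_fields output in this message.
flat = ["harry", 36, "larry", 10, "Carey"]
facets = pair_facet_counts(flat)
print(facets)
```

Each count is the number of matching documents carrying that term, which is why it is natural to compare it with the hit count of a search restricted to the same term.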

So when I run a Solr search for those logs, I should get 36 logs.
But I am getting 18.
This is happening for all searches now.


For some reason, I see double counting.

Either faceting is double counting or search is half-counting?


This is my Solr Search Query:



   - params:
   {
      - sort: "startTimeISO desc",
      - debugQuery: "true",
      - start: "0",
      - q: "requestType:(*login* or *LOGIN*) AND (user:(blabla*)) AND (
      loginUserName:("harry"))",
      - wt: "json",
      - fq: "startTimeISO:["2015-01-22T00:00:00.000Z" TO
      "2015-01-23T00:00:00.000Z"]",
      - rows: "200"
      }



This query returns only 18 logs, but the Solr facet query gave 36.


Is there something incorrect in either (or both) of my queries?
I am trying to debug, but I think I am missing something silly.

Re: Facet Double Counting

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Weird, optimize or expungeDeletes=true should do the trick.
Can you try to optimize this time?
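
The two maintenance calls suggested here can be sketched as update-handler URLs. This is only an illustration: the host, port, and core name "core" are assumptions and need to be adjusted for a real installation.

```python
from urllib.parse import urlencode

# Assumed values: adjust host, port, and core name for your installation.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "core"

def update_url(**params):
    """Build an update-handler URL with the given query parameters."""
    return f"{SOLR_BASE}/{CORE}/update?{urlencode(params)}"

# Commit and merge away segments' deleted documents.
expunge_url = update_url(commit="true", expungeDeletes="true")
# Full optimize (merge of the index segments).
optimize_url = update_url(optimize="true")

print(expunge_url)
print(optimize_url)
```

Both calls only remove documents that are already marked as deleted. They do not help if the index holds live documents that share the same uniqueKey.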


Re: Facet Double Counting

Posted by harish singh <ha...@gmail.com>.
Still the same.

Could the reason be that if there are duplicate logs/documents, the
facet query counts them, but when I run the search query, Solr
eliminates the duplicates?



Re: Facet Double Counting

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.

Hi Harish,

What happens when you purge deleted terms with 'solr/core/update?commit=true&expungeDeletes=true'

ahmet




RE: Facet Double Counting

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
harish singh [harish.singh22@gmail.com] wrote:
> I tried the Faceting on the UUID field.

Nice debug trick. I'll remember that for next time.

> So does this mean, when I do a facet query on facet.field= loginUserName,
> Solr does not look at the UUID?

Yes. For faceting, Solr only uses the internal docIDs and the facet field data.

> And the unique field (UUID in this case) is considered only during search
> queries?

For a distributed setup, the documents are resolved from the shards using uniqueKey.

I did not think this was the case for a non-distributed setup. For such a setup, I thought that the documents were resolved using internal docIDs. If your index is single-shard, then I was wrong.
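
The difference described above can be illustrated with a toy model. This is not Solr's implementation, only a sketch of the two counting rules: faceting counts one hit per internal docID, while a result merge keyed on uniqueKey keeps one document per key. The index contents and function names are invented for the example.

```python
from collections import Counter

# Hypothetical index: each tuple is (internal docID, uniqueKey, loginUserName).
# Two documents share each UUID, as was found in this thread.
index = [
    (0, "uuid-a", "harry"),
    (1, "uuid-a", "harry"),
    (2, "uuid-b", "larry"),
    (3, "uuid-b", "larry"),
]

def facet_counts(docs):
    """Count one hit per internal docID; uniqueKey is never consulted."""
    return Counter(user for _doc_id, _key, user in docs)

def merged_results(docs, user):
    """Keep one document per uniqueKey, as a merge keyed on uniqueKey would."""
    by_key = {}
    for doc_id, key, doc_user in docs:
        if doc_user == user:
            by_key.setdefault(key, doc_id)
    return by_key

facets = facet_counts(index)
hits = merged_results(index, "harry")
print(facets["harry"], len(hits))
```

With every key present twice, the facet count is exactly double the number of merged results, which matches the 36 versus 18 reported in this thread.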

- Toke Eskildsen

Re: Facet Double Counting

Posted by harish singh <ha...@gmail.com>.
Oh yes!! :)
I tried the Faceting on the UUID field.
All the uuids have count = 2 ==> which probably explains why I am getting
Double counting in Facet result.

So does this mean, when I do a facet query on facet.field= loginUserName,
Solr does not look at the UUID?
And the unique field (UUID in this case) is considered only during search
queries?
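
The check described above (faceting on the UUID field and looking for counts above 1) can be sketched as follows. Assumptions: the uniqueKey field is literally named "UUID", the core is reachable at the URL shown, and the sample response values are made up for illustration.

```python
from urllib.parse import urlencode

# Assumptions: the uniqueKey field is named "UUID" and the core lives at
# this URL. Neither is stated in the thread.
SELECT_URL = "http://localhost:8983/solr/core/select"

params = {
    "q": "*:*",
    "rows": "0",
    "wt": "json",
    "facet": "true",
    "facet.field": "UUID",
    "facet.mincount": "2",  # only keys that occur more than once
    "facet.limit": "-1",    # no cap on returned keys; can be large
}
duplicate_check_url = f"{SELECT_URL}?{urlencode(params)}"

def duplicated_keys(flat):
    """Return {key: count} for keys seen more than once in a flat facet list."""
    return {k: c for k, c in zip(flat[0::2], flat[1::2]) if c > 1}

# Made-up sample of what facet_fields["UUID"] might contain.
sample = ["uuid-a", 2, "uuid-b", 2, "uuid-c", 1]
print(duplicate_check_url)
print(duplicated_keys(sample))
```

On a healthy index this query should return no terms at all, since every uniqueKey value is expected to occur exactly once.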


RE: Facet Double Counting

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
harish singh [harish.singh22@gmail.com] wrote:
> As you see, the result is showing Facet-Count for "loginUserName= harry" is
> 36.
> 
> So when I do a Solr Search for logs, I should get 36 logs.
> But I am getting 18.
> This is happening for all searches now.

If you have recently added or changed uniqueKey and if your index has multiple documents with the same key, that would explain the behaviour you describe. If that is so, I recommend you delete the index and rebuild it from scratch.

- Toke Eskildsen
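
One way to clear the documents before the rebuild recommended above is a delete-by-query that matches everything. The sketch below only builds the request and does not send it. The URL (host, port, core name) is an assumption, and removing the index directory while Solr is stopped is an alternative this sketch does not cover.

```python
import urllib.request

# Assumed URL: adjust host, port, and core name.
UPDATE_URL = "http://localhost:8983/solr/core/update?commit=true"

def delete_all_request(url=UPDATE_URL):
    """Build (but do not send) a delete-by-query request matching all docs."""
    body = b"<delete><query>*:*</query></delete>"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )

req = delete_all_request()
print(req.get_method(), req.full_url)
# To actually wipe the index (destructive): urllib.request.urlopen(req)
```

After the delete is committed, all documents have to be reindexed from the original source, this time with the uniqueKey in place from the start.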