Posted to solr-user@lucene.apache.org by Moen Endre <en...@imr.no> on 2015/09/28 12:47:04 UTC

PathHierarchyTokenizerFactory and facet_count

How does facet_counts work with a facet field whose type uses solr.PathHierarchyTokenizerFactory?

I have multiple records that contain a field Parameter, whose type uses PathHierarchyTokenizerFactory.
E.g.
        "Parameter": [
          "EARTH SCIENCE>OCEANS>OCEAN TEMPERATURE>WATER TEMPERATURE",
          "EARTH SCIENCE>OCEANS>OCEAN PRESSURE>WATER PRESSURE",
          "EARTH SCIENCE>OCEANS>OCEAN ACOUSTICS>ACOUSTIC VELOCITY",
          "EARTH SCIENCE>ACOUSTIC",
          "EARTH SCIENCE>VELOCITY",
          "EARTH SCIENCE>ACOBAR | ACOUSTIC TECHNOLOGY FOR OBSERVING THE INTERIOR OF THE ARCTIC OCEAN",
          "EARTH SCIENCE>GEOGRAPHIC REGION>POLAR",
          "EARTH SCIENCE>GEOGRAPHIC REGION>ARCTIC"
        ],

But when I run a query to get all facet counts for Parameter, using this query:
http://localhost:8983/solr/nmdc/query?q=*:*&facet=true&rows=0&facet.mincount=1&facet.field=Parameter

the last two entries from this record,
"EARTH SCIENCE>GEOGRAPHIC REGION>POLAR",
"EARTH SCIENCE>GEOGRAPHIC REGION>ARCTIC"

are missing from the facet_counts, which look like:

  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "Parameter":[
        "EARTH SCIENCE",228,
        "EARTH SCIENCE>OCEANS",128,
        "EARTH SCIENCE>OCEANS>OCEAN TEMPERATURE",100,
        "EARTH SCIENCE>OCEANS>SALINITY/DENSITY",90,
...

I'm running Solr 5.0.

Why does the query seem to omit some of the Parameter entries from records?
The path field type is configured with:
        <fieldType name="path" class="solr.TextField">
                <analyzer type="index">
                        <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter=">" />
                </analyzer>
                <analyzer type="query">
                        <tokenizer class="solr.KeywordTokenizerFactory" />
                </analyzer>
        </fieldType>
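
For reference, the index-time analyzer above turns each stored value into one token per ancestor path, which is why buckets such as "EARTH SCIENCE" and "EARTH SCIENCE>OCEANS" appear in the facet counts alongside the full paths. A minimal Python sketch of that expansion follows. The function name path_prefixes is made up for illustration and is not part of Solr; it only imitates the tokenizer's default behaviour with delimiter ">".

```python
def path_prefixes(value, delimiter=">"):
    """Return every ancestor path of value, shortest first.

    Hypothetical helper: it imitates the tokens that
    PathHierarchyTokenizerFactory emits at index time; it does not call Solr.
    """
    parts = value.split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


# One stored value contributes to one facet bucket per path level.
tokens = path_prefixes("EARTH SCIENCE>GEOGRAPHIC REGION>ARCTIC")
for token in tokens:
    print(token)
```

Because the query analyzer uses KeywordTokenizerFactory, a query value is matched as a whole against these prefix tokens.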

Cheers
Endre


Re: PathHierarchyTokenizerFactory and facet_count

Posted by Alessandro Benedetti <be...@gmail.com>.
From the Solr wiki, the default facet.limit should be 100!
Anyway, I find the way field facets are shown for fields tokenized with the
path hierarchy tokenizer not very user friendly.
Ideally, for those fields we should show a facet representation similar to
facet pivot.
It would be nice to come up with a way to do that.
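
One way such a pivot-like view could be derived on the client side is sketched below in Python. It assumes the flat [value, count, value, count, ...] list shown in the response in the original question and the ">" delimiter from the field type. The function names and the nested-dict layout are illustrative choices, not an existing Solr feature.

```python
def facets_to_tree(flat, delimiter=">"):
    """Nest a flat facet list into {name: {"count": n, "children": {...}}}."""
    tree = {}
    # The flat list alternates value, count, value, count.
    for path, count in zip(flat[0::2], flat[1::2]):
        level = tree
        node = None
        for part in path.split(delimiter):
            # A parent that was not returned itself keeps count None.
            node = level.setdefault(part, {"count": None, "children": {}})
            level = node["children"]
        node["count"] = count
    return tree


def print_tree(tree, indent=0):
    """Print the nested facets, indenting one step per path segment."""
    for name, node in tree.items():
        print(" " * indent + f"{name} ({node['count']})")
        print_tree(node["children"], indent + 2)


# Counts copied from the response shown in the question.
flat = [
    "EARTH SCIENCE", 228,
    "EARTH SCIENCE>OCEANS", 128,
    "EARTH SCIENCE>OCEANS>OCEAN TEMPERATURE", 100,
    "EARTH SCIENCE>OCEANS>SALINITY/DENSITY", 90,
]
tree = facets_to_tree(flat)
print_tree(tree)
```

The nesting relies only on the bucket names, so it would work on whatever subset of buckets the response contains.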


Cheers





RE: PathHierarchyTokenizerFactory and facet_count

Posted by Moen Endre <en...@imr.no>.
Yes, that solved my problem. There must be an implicit facet.limit set, because I tried the same URL query with facet.limit=10000 and got back records with "EARTH SCIENCE>GEOGRAPHIC REGION>ARCTIC".

Cheers!
Endre


Re: PathHierarchyTokenizerFactory and facet_count

Posted by Upayavira <uv...@odoko.co.uk>.
There is also facet.limit, which says how many facet entries to return.
Is that catching you?

The document either matches your query or it doesn't. If it does, then all
values of the Parameter field should be included in your faceting. But
perhaps not all facet buckets are being returned to you, so try setting
facet.limit=100 or such.
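
As a rough illustration of the suggestion above, the request can be rebuilt with an explicit facet.limit. The sketch below only constructs the URL with Python's standard library and sends nothing. The endpoint is taken from the original question, the helper name facet_url is invented here, and 10000 is just an example value. Solr documents a negative facet.limit as meaning no limit.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "http://localhost:8983/solr/nmdc/query"  # endpoint from the question


def facet_url(field, limit):
    """Build a facet-only query URL (hypothetical helper, no request is sent)."""
    params = {
        "q": "*:*",
        "rows": 0,
        "facet": "true",
        "facet.field": field,
        "facet.mincount": 1,
        "facet.limit": limit,  # example value: upper bound on returned buckets
    }
    return BASE + "?" + urlencode(params)


url = facet_url("Parameter", 10000)
print(url)
# Decode the query string again to confirm the parameter is spelled correctly.
print(parse_qs(urlsplit(url).query)["facet.limit"])
```

Building the query string with urlencode avoids stray spaces or misspelled parameter names in a hand-typed URL.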

Upayavira
