Posted to solr-user@lucene.apache.org by John Davis <jo...@gmail.com> on 2018/05/25 02:29:23 UTC

Sort by payload value

Hello,

We are trying to use payload values as described in [1] and are running
into issues when we *sort by* payload value. We would appreciate any
pointers to what we might be doing wrong. We are running Solr 6.6.0.

* Here's the payload field type definition:

   <fieldType name="delimited_payloads_float" stored="true" indexed="true" class="solr.TextField">
      <analyzer>
          <tokenizer class="solr.PatternTokenizerFactory" pattern="[A-Za-z0-9][^|]*[|][0-9.]+" group="0"/>
          <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
          <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
   </fieldType>
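A rough Python approximation of what this analysis chain does to a stored value can help when reasoning about which terms end up carrying which payloads. It is a sketch, not Solr's implementation: it assumes the default `|` delimiter of DelimitedPayloadTokenFilterFactory (the config above does not override it), models `encoder="float"` as a 32-bit float round-trip, and `analyze` / `payload_of` are hypothetical helper names. `payload_of` returns the first occurrence's payload, which only approximates Solr's `payload()` function when a term repeats.

```python
import re
import struct

# Same regex as the PatternTokenizerFactory above. With group="0" each
# whole match becomes one token, so the regex has no capture groups.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9][^|]*[|][0-9.]+")


def to_float32(x):
    """Round-trip through a 4-byte float, roughly what encoder="float" keeps."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def analyze(value):
    """Approximate the chain: tokenize, split off the payload at the first
    '|' (assumed default delimiter), then lowercase the term."""
    pairs = []
    for token in TOKEN_PATTERN.findall(value):
        term, _, payload = token.partition("|")
        pairs.append((term.lower(), to_float32(float(payload))))
    return pairs


def payload_of(value, term, default=0.0):
    """Hypothetical helper: payload of the first matching term, else default."""
    for t, p in analyze(value):
        if t == term:
            return p
    return default


doc = "Startup|4.46301369863 Internet|0.917808219178 Internet Services|0.917808219178"
print(analyze(doc))
print(payload_of(doc, "internet services"))
```

Multi-word values such as "Internet Services" stay a single token because `[^|]*` happily consumes spaces; only the leading `[A-Za-z0-9]` forces each match to start on a word character.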

* A query sorted by the payload value does not return documents in payload order:

{
  "responseHeader":{
    "status":0,
    "QTime":82,
    "params":{
      "q":"*:*",
      "indent":"on",
      "fl":"industry_value,${indexp}",
      "indexp":"payload(industry_value, 'internet services', 0)",
      "fq":["{!frange l=0.1}${indexp}",
        "industry_value:*"],
      "sort":"${indexp} asc",
      "rows":"10",
      "wt":"json"}},
  "response":{"numFound":102668,"start":0,"docs":[
      {
        "industry_value":"Startup|13.3890410959 Collaboration|12.3863013699 Document Management|12.3863013699 Chat|12.3863013699 Video Conferencing|12.3863013699 Finance|1.0 Payments|1.0 Internet|1.0 Internet Services|1.0 Top Companies|1.0",
        "payload(industry_value, 'internet services', 0)":1.0},
      {
        "industry_value":"Hardware|16.7616438356 Messaging and Telecommunications|6.71780821918 Mobility|6.71780821918 Startup|6.71780821918 Analytics|6.71780821918 Development Platforms|6.71780821918 Mobile Commerce|6.71780821918 Mobile Security|6.71780821918 Privacy and Security|6.71780821918 Information Security|6.71780821918 Cyber Security|6.71780821918 Finance|6.71780821918 Collaboration|6.71780821918 Enterprise|6.71780821918 Messaging|6.71780821918 Internet Services|6.71780821918 Information Technology|6.71780821918 Contact Management|6.71780821918 Mobile|6.71780821918 Mobile Enterprise|6.71780821918 Data Security|6.71780821918 Data and Analytics|6.71780821918 Security|6.71780821918",
        "payload(industry_value, 'internet services', 0)":6.7178082},
      {
        "industry_value":"Startup|4.46301369863 Advertising|1.24657534247 Content and Publishing|0.917808219178 Internet|0.917808219178 Social Media Platforms|0.917808219178 Content Discovery|0.917808219178 Media and Entertainment|0.917808219178 Social Media|0.917808219178 Sales and Marketing|0.917808219178 Internet Services|0.917808219178 Advertising Platforms|0.917808219178 Social Media Management|0.917808219178 Mobile|0.328767123288 Food and Beverage|0.252054794521 Real Estate|0.252054794521 Consumer Goods|0.252054794521 FMCG|0.252054794521 Home Services|0.252054794521 Consumer|0.252054794521 Enterprise|0.167123287671",
        "payload(industry_value, 'internet services', 0)":0.91780823},
      {
        "industry_value":"Startup|8.55068493151 Media and Entertainment|5.54794520548 Transportation|5.54794520548 Ticketing|5.54794520548 Travel|5.54794520548 Travel and Tourism|5.54794520548 Events|5.54794520548 Cloud Computing|2.33698630137 Collaboration|2.33698630137 Platforms|2.33698630137 Enterprise|2.33698630137 Internet Services|2.33698630137 Top Companies|2.33698630137 Developer Tools|2.33698630137 Operating Systems|2.33698630137 Search|1.83287671233 Internet|1.83287671233 Technology|1.83287671233 Portals|1.83287671233 Email|1.83287671233 Photography|1.83287671233",
        "payload(industry_value, 'internet services', 0)":2.3369863},


[1] https://lucidworks.com/2017/09/14/solr-payloads/
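The mismatch is easy to see if the payload values shown above are put through what the request asks for: keep values at or above 0.1 (frange's lower bound is inclusive by default) and sort ascending. A minimal Python check, using only the four of the ten requested rows that appear in the response; `expected_order` and `is_ascending` are illustrative helpers, not Solr APIs.

```python
# Payload values copied from the four documents in the response,
# in the order Solr returned them.
returned = [1.0, 6.7178082, 0.91780823, 2.3369863]


def expected_order(values, lower=0.1):
    """Apply the {!frange l=0.1} filter (inclusive lower bound),
    then the 'asc' sort the request asks for."""
    kept = [v for v in values if v >= lower]
    return sorted(kept)


def is_ascending(values):
    """True if each value is <= the one after it."""
    return all(a <= b for a, b in zip(values, values[1:]))


print(expected_order(returned))
print(is_ascending(returned))
```

All four values pass the filter, so the filter behaves as expected; only the ordering disagrees with `sort=${indexp} asc`.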

Re: Sort by payload value

Posted by John Davis <jo...@gmail.com>.
Hi Erick - Solr is tokenizing correctly: as you can see, the response
returns the payload value along with the full field value, and they match
for that particular term. The field does have a lowercase filter, as you
can see in the definition. Changing it to a single-word query doesn't fix
it either.

On Fri, May 25, 2018 at 8:22 AM, Erick Erickson <er...@gmail.com>
wrote:

> My first guess (and it's a total guess) is that you either have a case
> problem or you're tokenizing the string. Does your field definition
> lower-case the tokens? If it's a string type then certainly not.
>
> Quick test would be to try your query with a value that matches case
> and has no spaces, maybe "Portals". If that gives you the correct sort
> then you have a place to start....
>
> Adding &debug=query will help a bit, although it won't show you the
> guts of the payload calcs.
>
> FYI, ties are broken by the internal Lucene doc ID. If the theory that
> you are getting no matches, then your sort order is determined by this
> value which you don't really have much access to.....
>
> Best,
> Erick

Re: Sort by payload value

Posted by Erick Erickson <er...@gmail.com>.
My first guess (and it's a total guess) is that you either have a case
problem or you're tokenizing the string. Does your field definition
lower-case the tokens? If it's a string type, then certainly not.

A quick test would be to try your query with a value that matches case
and has no spaces, maybe "Portals". If that gives you the correct sort,
then you have a place to start.

Adding &debug=query will help a bit, although it won't show you the
guts of the payload calculations.
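The quick test and the debug flag can be combined into one request. The sketch below only builds the URL; the host, port, and core name `companies` are hypothetical, and `diagnostic_params` / `diagnostic_url` are illustrative helpers. The term is written as `portals`, assuming the term passed to `payload()` has to match the lowercased indexed form, as the original `'internet services'` query already does.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace host, port, and core name with your own.
BASE_URL = "http://localhost:8983/solr/companies/select"


def diagnostic_params(term):
    """Hypothetical helper: the original request with a single-word term
    and debug=query added. A list of pairs keeps both fq values."""
    return [
        ("q", "*:*"),
        ("indexp", f"payload(industry_value, '{term}', 0)"),
        ("fl", "industry_value,${indexp}"),
        ("fq", "{!frange l=0.1}${indexp}"),
        ("fq", "industry_value:*"),
        ("sort", "${indexp} asc"),
        ("rows", "10"),
        ("debug", "query"),
        ("wt", "json"),
    ]


def diagnostic_url(term):
    # urlencode percent-encodes braces, quotes, and "$", and turns spaces into "+".
    return BASE_URL + "?" + urlencode(diagnostic_params(term))


print(diagnostic_url("portals"))
```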

FYI, ties are broken by the internal Lucene doc ID. If the theory that
you are getting no matches is right, then your sort order is determined
by this value, which you don't really have much access to.
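The tie-breaking point can be illustrated with a toy model of sorting on a function value: documents with equal values fall back to ascending internal doc ID, so if every document evaluated to the same value the result would simply be doc-ID order. This is a simplified sketch; the `Hit` type and `sort_hits` function are hypothetical, not Lucene's comparator code, and the ascending-doc-ID tie-break for both directions is an assumption stated in the comments.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: int   # internal Lucene doc ID (not something clients control)
    value: float  # value of the sort function for this document


def sort_hits(hits, ascending=True):
    """Sort by function value; ties are broken by ascending doc ID
    (assumed to apply in both sort directions)."""
    sign = 1 if ascending else -1
    return sorted(hits, key=lambda h: (sign * h.value, h.doc_id))


# Every document has the same value, so the order is just doc-ID order.
same = [Hit(7, 0.0), Hit(2, 0.0), Hit(5, 0.0)]
print([h.doc_id for h in sort_hits(same)])
```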

Best,
Erick
