Posted to solr-user@lucene.apache.org by Naomi Dushay <nd...@stanford.edu> on 2012/02/23 19:56:39 UTC

autoGeneratePhraseQueries sort of silently set to false

Another thing I noticed when upgrading from Solr 1.4 to Solr 3.5 involved results for hyphenated words such as aaa-bbb. Erik Hatcher pointed me to the autoGeneratePhraseQueries attribute now available on fieldType definitions in schema.xml. This is a great feature, and everything is peachy if you start with Solr 3.4. But many of us started earlier and are upgrading, and that's a different story.

It was surprising to me that

a.  the default for this new feature caused different search results than Solr 1.4 

b.  it wasn't documented clearly, IMO

http://wiki.apache.org/solr/SchemaXml   makes no mention of it


In the schema.xml example, there is this at the top:

<!-- attribute "name" is the name of this schema and is only used for display purposes.
       Applications should change this to reflect the nature of the search collection.
       version="1.4" is Solr's version number for the schema syntax and semantics.  It should
       not normally be changed by applications.
       1.0: multiValued attribute did not exist, all fields are multiValued by nature
       1.1: multiValued attribute introduced, false by default 
       1.2: omitTermFreqAndPositions attribute introduced, true by default except for text fields.
       1.3: removed optional field compress feature
       1.4: default auto-phrase (QueryParser feature) to off
     -->

And there was this in a couple of field definitions:

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
<fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">

But that was it.


RE: autoGeneratePhraseQueries sort of silently set to false

Posted by "Burton-West, Tom" <tb...@umich.edu>.
Thanks Erik,

The 3.1 changes document the ability to set this, with the default being "true".
However, apparently between 3.4 and 3.5 the default was changed to "false".
Since this will change the behavior of any field where autoGeneratePhraseQueries is not explicitly set, it could easily surprise users who update to 3.5. 

That's why I think the change in the default behavior (i.e. when the attribute is not explicitly set) should be called out explicitly in the changes.txt for 3.5.

True, everyone should read the notes in the example schema.xml, but I think it would help if the change was also noted in changes.txt.  

Is it possible to revise the changes.txt for 3.5?

Do you by any chance know where the change in the default behavior was discussed?  I know it has been a contentious issue.

Tom
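For anyone upgrading, one way to find the field types that would be affected is to list every solr.TextField type that does not set autoGeneratePhraseQueries explicitly. The following is only a rough sketch using the JDK's DOM parser, not a Solr tool. The class name SchemaAuditSketch, the method name, and the text_general field type in the sample are made up for illustration. It checks only the literal class name "solr.TextField" and only the <fieldType> element name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Solr.
class SchemaAuditSketch {

    // Returns the names of solr.TextField types that rely on the
    // version-dependent default because the attribute is absent.
    static List<String> typesWithoutExplicitSetting(String schemaXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse schema XML", e);
        }
        List<String> names = new ArrayList<>();
        NodeList types = doc.getElementsByTagName("fieldType");
        for (int i = 0; i < types.getLength(); i++) {
            Element type = (Element) types.item(i);
            boolean isTextField = "solr.TextField".equals(type.getAttribute("class"));
            if (isTextField && !type.hasAttribute("autoGeneratePhraseQueries")) {
                names.add(type.getAttribute("name"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Tiny made-up schema; a real schema.xml has analyzers, fields, etc.
        String schema =
            "<schema name=\"example\" version=\"1.4\"><types>"
          + "<fieldType name=\"text_en_splitting\" class=\"solr.TextField\" autoGeneratePhraseQueries=\"true\"/>"
          + "<fieldType name=\"text_ja\" class=\"solr.TextField\" autoGeneratePhraseQueries=\"false\"/>"
          + "<fieldType name=\"text_general\" class=\"solr.TextField\"/>"
          + "<fieldType name=\"string\" class=\"solr.StrField\"/>"
          + "</types></schema>";
        System.out.println(typesWithoutExplicitSetting(schema));
    }
}
```

Any type this reports will get whatever default the schema version implies, so those are the ones to set explicitly before upgrading.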



Re: autoGeneratePhraseQueries sort of silently set to false

Posted by Erik Hatcher <er...@gmail.com>.
there's this (for 3.1, but in the 3.x CHANGES.txt):

* SOLR-2015: Add a boolean attribute autoGeneratePhraseQueries to TextField.
  autoGeneratePhraseQueries="true" (the default) causes the query parser to
  generate phrase queries if multiple tokens are generated from a single
  non-quoted analysis string.  For example WordDelimiterFilter splitting text:pdp-11
  will cause the parser to generate text:"pdp 11" rather than (text:PDP OR text:11).
  Note that autoGeneratePhraseQueries="true" tends to not work well for non whitespace
  delimited languages. (yonik)

with a ton of useful, though back and forth, commentary here: <https://issues.apache.org/jira/browse/SOLR-2015>
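To make the CHANGES entry concrete, here is a minimal sketch of the decision it describes. This is plain Java, not Solr or Lucene code. The class AutoPhraseSketch, its methods, and the crude regex split standing in for WordDelimiterFilter are all hypothetical. The real query parser works on analyzed token streams and builds Query objects rather than strings, and the OR between clauses here simply follows the example in the entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not how Solr is implemented.
class AutoPhraseSketch {

    // Stand-in for analysis: split one unquoted chunk on non-alphanumerics
    // and lowercase it. WordDelimiterFilter does considerably more.
    static List<String> analyze(String chunk) {
        List<String> tokens = new ArrayList<>();
        for (String part : chunk.split("[^A-Za-z0-9]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase());
            }
        }
        return tokens;
    }

    // Renders the resulting query as a string for readability.
    static String buildQuery(String field, String chunk, boolean autoGeneratePhraseQueries) {
        List<String> tokens = analyze(chunk);
        if (tokens.isEmpty()) {
            return "";
        }
        if (tokens.size() == 1) {
            return field + ":" + tokens.get(0);
        }
        if (autoGeneratePhraseQueries) {
            // Several tokens from one unquoted chunk become a phrase.
            return field + ":\"" + String.join(" ", tokens) + "\"";
        }
        // Otherwise each token becomes its own clause.
        List<String> clauses = new ArrayList<>();
        for (String token : tokens) {
            clauses.add(field + ":" + token);
        }
        return "(" + String.join(" OR ", clauses) + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("text", "pdp-11", true));   // text:"pdp 11"
        System.out.println(buildQuery("text", "pdp-11", false));  // (text:pdp OR text:11)
    }
}
```

The second form matches documents containing either token anywhere, which is why results for hyphenated terms can look so different after the default flips.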

Note that the default, as Naomi pointed out so succinctly, depends on the *schema* version setting (look at the version attribute on the <schema> line in your schema.xml).  The code is simply this:

    if (schema.getVersion() > 1.3f) {
      autoGeneratePhraseQueries = false;
    } else {
      autoGeneratePhraseQueries = true;
    }

on TextField.  Specifying autoGeneratePhraseQueries explicitly on a field type overrides whatever the default may be.
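Putting the two rules together (a version-based default, then an explicit attribute that wins), the effective value can be sketched as below. This is a standalone illustration, not the TextField source. The class PhraseDefaultSketch, the resolve method, and the use of a plain Map for the field type's attributes are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the precedence described above.
class PhraseDefaultSketch {

    static boolean resolve(float schemaVersion, Map<String, String> fieldTypeArgs) {
        // Default depends on the schema version: above 1.3 means off.
        boolean autoGeneratePhraseQueries = !(schemaVersion > 1.3f);

        // An explicit attribute on the field type overrides the default.
        String explicit = fieldTypeArgs.get("autoGeneratePhraseQueries");
        if (explicit != null) {
            autoGeneratePhraseQueries = Boolean.parseBoolean(explicit);
        }
        return autoGeneratePhraseQueries;
    }

    public static void main(String[] args) {
        Map<String, String> noAttribute = new HashMap<>();
        Map<String, String> explicitTrue = new HashMap<>();
        explicitTrue.put("autoGeneratePhraseQueries", "true");

        System.out.println(resolve(1.3f, noAttribute));   // true
        System.out.println(resolve(1.4f, noAttribute));   // false
        System.out.println(resolve(1.4f, explicitTrue));  // true
    }
}
```

So a schema carried over from 1.4-era Solr with version="1.3" or lower keeps the old behavior, while copying the newer example schema (version="1.4") without setting the attribute turns phrase generation off.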

	Erik





RE: autoGeneratePhraseQueries sort of silently set to false

Posted by "Burton-West, Tom" <tb...@umich.edu>.
Seems like a change in default behavior like this should be included in the changes.txt for Solr 3.5.
Not sure how to do that.

Tom
