Posted to solr-user@lucene.apache.org by Jean-Sebastien Vachon <je...@wantedanalytics.com> on 2014/04/09 20:04:44 UTC

Were changes made to facetting on multivalued fields recently?

Hi All,

We just discovered that the response from Solr (4.7.1) when faceting on one of our multi-valued fields has changed considerably.

In the past (4.6.1 and earlier versions), we got something like this (there are 7 possible values for this attribute):

<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="ad_job_type_id">
<int name="1">11454652</int>
<int name="4">11387070</int>
<int name="5">2095603</int>
<int name="3">809992</int>
<int name="2">567244</int>
<int name="6">139389</int>
<int name="7">4120</int>
</lst>
</lst>
<lst name="facet_dates"/>
</lst>

And now with 4.7.1 we are getting this:
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="ad_job_type_id">
<int name="1">10954552</int>
<int name="4">10884418</int>
<int name="5">2000530</int>
<int name="3">784491</int>
<int name="2">535935</int>
<int name="4,1">134826</int>
<int name="5,1">11770</int>
... there are too many values to list them all ...

I checked the 4.7.1 change log and only found an optimization made for https://issues.apache.org/jira/browse/SOLR-5512

Is there any new configuration directive that we should be aware of?

Thanks






RE: Were changes made to facetting on multivalued fields recently?

Posted by Jean-Sebastien Vachon <je...@wantedanalytics.com>.
Thanks to both of you. I finally found the issue and you were right (again) ;)

The problem did not come from the full indexing code containing the SQL REPLACE statement, but from another process whose job is to keep our index up to date. This process had no idea that commas were to be replaced by spaces for some fields (and it should not have to know about this either).

I changed the tokenizer used for the field to the following, and everything is fine now:
    <tokenizer class="solr.PatternTokenizerFactory" pattern=","/>
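For reference, here is a sketch of how the complete field type might look with a pattern tokenizer. The type name `text_comma` is a hypothetical placeholder. One caveat: `pattern=","` splits only on commas, so a value that still arrives space-separated (such as "4 5 1" from the full-indexing SQL REPLACE) would be indexed as a single token. The `[,\s]+` pattern below is an assumption meant to cover both separators; adjust it to whatever your feeds actually send.

```xml
<!-- Sketch only: "text_comma" is a hypothetical type name. -->
<fieldType name="text_comma" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split on runs of commas and/or whitespace, so both "4,1" and "4 1"
         produce the separate tokens "4" and "1". -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[,\s]+"/>
  </analyzer>
</fieldType>

<field name="ad_job_type_id" type="text_comma" indexed="true" stored="true" required="false" multiValued="true" />
```

Documents indexed before the change keep their old tokens until they are reindexed.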

Thanks for your help


Re: Were changes made to facetting on multivalued fields recently?

Posted by Erick Erickson <er...@gmail.com>.
bq: The SQL query contains a Replace statement that does this

Well, I suspect that's where the issue is. The facet values being
reported include:
<int name="4,1">134826</int>
which indicates that the incoming text to Solr still has the commas.
Solr is seeing the commas and all.

You can cure this by using PatternReplaceCharFilterFactory and doing
the substitution at index time if you want to.
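A minimal sketch of that suggestion, assuming the existing `text_ws` type from this thread is kept: the char filter rewrites every comma to a space before WhitespaceTokenizerFactory sees the text, so "4,1" and "4 1" both end up as the tokens "4" and "1" regardless of what the client sent.

```xml
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Runs before the tokenizer: "4,1" becomes "4 1". Because there is a
         single <analyzer>, this applies at both index and query time. -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="," replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

If `text_ws` is shared by other fields, a separate, dedicated type may be safer. Either way, existing documents keep their old tokens until they are reindexed.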

That doesn't explain why the behavior changed, though. My guess is
that it has nothing to do with Solr, and that something about your
SQL statement is different.

Best,
Erick


RE: Were changes made to facetting on multivalued fields recently?

Posted by Jean-Sebastien Vachon <je...@wantedanalytics.com>.
The SQL query contains a Replace statement that does this


Re: Were changes made to facetting on multivalued fields recently?

Posted by Shawn Heisey <so...@elyograg.org>.
On 4/10/2014 9:14 AM, Jean-Sebastien Vachon wrote:
> Here are the field definitions for both our old and new indexes; as you can see, they are identical. We've been using this chain and field type since Solr 1.4 and never had any problems. As for the documents, both indexes use the same data source. They can be slightly out of sync from time to time, but we tend to index them daily. Both indexes also use the same code (indexing through SolrJ) to index their content.
> 
> The source is a MySQL column containing entries such as "4,1", which are stored in a multivalued field after commas are replaced by spaces.
> 
> OLD (4.6.1):
>    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
>       <analyzer>
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       </analyzer>
>     </fieldType>
> 
>     <field name="ad_job_type_id" type="text_ws" indexed="true" stored="true" required="false" multiValued="true" />

Just so you know, there's nothing here that would require the field to
be multivalued.  WhitespaceTokenizerFactory does not create multiple
field values; it creates multiple terms.  If you are actually inserting
multiple values for the field in SolrJ, then you would need a
multivalued field.
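To make the distinction concrete, here is a sketch assuming the client sends each ID as a separate value (for example by calling `SolrInputDocument.addField("ad_job_type_id", ...)` once per ID in SolrJ). In that case a multivalued string field is enough and no tokenizer is involved. The `string` type mirrors the one in the stock example schema, but treat its presence in your schema as an assumption.

```xml
<!-- Sketch: each job type ID arrives as its own value, so no analysis
     chain is needed; every value is indexed as exactly one term. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<field name="ad_job_type_id" type="string" indexed="true" stored="true" required="false" multiValued="true" />
```

Unlike a TextField, a StrField can also have docValues enabled if that is ever wanted for faceting.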

What is replacing the commas with spaces?  I don't see anything here
that would do that.  It sounds like that part of your indexing is not
working.

Thanks,
Shawn


RE: Were changes made to facetting on multivalued fields recently?

Posted by Jean-Sebastien Vachon <je...@wantedanalytics.com>.
Here are the field definitions for both our old and new indexes; as you can see, they are identical. We've been using this chain and field type since Solr 1.4 and never had any problems. As for the documents, both indexes use the same data source. They can be slightly out of sync from time to time, but we tend to index them daily. Both indexes also use the same code (indexing through SolrJ) to index their content.

The source is a MySQL column containing entries such as "4,1", which are stored in a multivalued field after commas are replaced by spaces.

OLD (4.6.1):
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>

    <field name="ad_job_type_id" type="text_ws" indexed="true" stored="true" required="false" multiValued="true" />

NEW (4.7.1):
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>

    <field name="ad_job_type_id" type="text_ws" indexed="true" stored="true" required="false" multiValued="true" />

It looks like the /analysis/field handler is not active in our current setup. I will look into this and perform additional checks later, as we are currently doing a full reindex of our DB.
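If the handler is simply missing from solrconfig.xml, a registration like the one in the Solr 4.x example solrconfig.xml could be added (a sketch; compare it with your own config). The admin UI's Analysis screen, which Erick suggested using, depends on this handler.

```xml
<!-- Sketch: registers the field analysis handler used by the admin UI's
     Analysis screen; startup="lazy" defers loading until first use. -->
<requestHandler name="/analysis/field"
                startup="lazy"
                class="solr.FieldAnalysisRequestHandler" />
```

After reloading the core, a request such as http://10.0.5.227:8201/solr/Current/analysis/field?analysis.fieldname=ad_job_type_id&analysis.fieldvalue=4,1 (host and core taken from the sample query in this thread) should list the tokens produced at each stage of the index-time chain.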

Thanks for your time


Re: Were changes made to facetting on multivalued fields recently?

Posted by Shawn Heisey <so...@elyograg.org>.
On 4/9/2014 2:15 PM, Erick Erickson wrote:
> Right, but the response in the doc when you make a request is almost,
> but not quite totally, unrelated to how facet values are tallied. It's
> all about what tokens are actually in your index, which you can see in
> the "schema browser"...

Supplement to what Erick has told you:

SOLR-5512 seems to be related to facets using docValues. The commit for 
that issue looks like it only touches that specifically. If you do not 
have (and never have had) docValues on this field, then SOLR-5512 should 
not apply.

I am reasonably sure that for facets on fields with docValues, your 
facets would reflect the *stored* information, not the indexed information.

Finally, I don't think that docValues work on fieldtypes whose class is 
solr.TextField, which is the only class that can have an analysis chain 
that would turn "4 5 1" into three separate tokens.  The response that 
you shared where the value is "4 5 1" looks like there is only one value 
in the field -- so for that document, it is effectively the same as one 
that is single-valued.

Bottom line: It looks like either your analysis chain is working 
differently in the newer version, or you have documents in your newer 
index that are not in the older one.  Can you share the field and 
fieldType definitions from both versions?  Did your luceneMatchVersion 
change with the upgrade?  If you are using DIH to populate your index, 
can you also share your DIH config?

Thanks,
Shawn


Re: Were changes made to facetting on multivalued fields recently?

Posted by Erick Erickson <er...@gmail.com>.
Right, but the stored value returned in the doc when you make a request
is almost, but not quite totally, unrelated to how facet values are
tallied. Faceting is all about which tokens are actually in your index,
which you can see in the "schema browser"...

Let me know what the results are
Erick


RE: Were changes made to facetting on multivalued fields recently?

Posted by Jean-Sebastien Vachon <je...@wantedanalytics.com>.
Thanks Erick, I will check this as soon as I can.

In the meantime, here is a sample query and what the field looks like in our index. It looks fine to me (at least, it matches what shows up in our other, older indexes).

http://10.0.5.227:8201/solr/Current/select?q=*:*&fl=ad_job_type_id&fq=ad_job_type_id:[*%20TO%20*]&facet=on&facet.field=ad_job_type_id&rows=1

<result name="response" numFound="12204004" start="0" maxScore="1.0">
  <doc>
    <arr name="ad_job_type_id">
      <str>4 5 1</str>
    </arr>
  </doc>
</result>


Re: Were changes made to facetting on multivalued fields recently?

Posted by Erick Erickson <er...@gmail.com>.
That is...um...very strange. It looks to me like you have somehow
indexed a bunch of new values. I'm guessing here, but it's suspicious
that you have a value "4,1". Should that have been indexed as "4" and
"1", as separate tokens?

So here's what I'd do:
1> take a look at the solr/admin/schema browser output for that field
in the two versions. I suspect you'll see 7 values in 4.6 and a
bazillion in 4.7.1.
2> if <1> is true, take a look at the admin/analysis page for the
field in question and try some sample index-time inputs, especially
the theoretical "4,1" entries. I suspect that 4.6 will break these
up into two tokens and 4.7.1 won't.
3> if <2> is true, take a very careful look at the index-time analysis
chains in the two versions. I bet they're different, and that accounts
for your observations.
4> try 1-3, discover I'm totally off base, and paste the schema.xml
definitions for the field in question in both 4.6 and 4.7.1 to this
thread so we can take a look.

This should not have changed between 4.6 and 4.7.1, at least not intentionally.

Best,
Erick
