Posted to solr-user@lucene.apache.org by "Hodder, Rick" <RH...@navg.com> on 2018/01/24 20:58:30 UTC

SOLR 7.1 queries not including empty fields in results

I am converting a SOLR 4.10 db to SOLR 7.1

It is NOT schemaless - so it uses a ClassicIndexSchemaFactory.

In 4.10, I have a field that is a phone number (here's the schema information for the field):

<field name="Phone" type="string" indexed="false" stored="true"/>

When inserting documents into SOLR, there are some documents where the value of Phone is an empty string or a single blank space.

When running a query against SOLR 4.10, the returned documents that have an empty string or a single space in Phone still include the Phone field:

...
"FirstName":"Bob, No Phone",
"Phone":"",
"State":"WA"
...
"FirstName":"Sandy, No Phone",
"Phone":"",
"State":"CA"
...
"FirstName":"Donald, With Phone",
"Phone":"123-123-1234",
"State":"NY"
...

But when these same rows are inserted into SOLR 7.1, the documents returned for those rows have no Phone field:

...
"FirstName":"Bob, No Phone",
"State":"WA"
...
"FirstName":"Sandy, No Phone",
"State":"CA"
...
"FirstName":"Donald, With Phone",
"Phone":"123-123-1234",
"State":"NY"
...

Note that Donald's document still includes Phone, because its phone number is non-blank.

I also looked at the version of Java installed on the two boxes - the SOLR 4.10 box has Java 1.8.0_161, and the SOLR 7.1 box has Java 1.8.0_40. I wouldn't think the Java version difference would cause that - I believe SOLR just requires 1.8.

Is this something that has been added since 4.10?

Is there a setting in solrconfig.xml or schema.xml that can turn the 4.10 behavior back on?

Thanks,

Rick Hodder

RE: SOLR 7.1 queries not including empty fields in results

Posted by "Hodder, Rick" <RH...@navg.com>.
Hi Chris,

:Are you still using the same solrconfig.xml you had in 4.10, or did you switch to using a newer sample/default set (or in some other way modified) solrconfig.xml?

:I ask because even if you are using the ClassicIndexSchemaFactory, your update processor chain might be using TrimFieldUpdateProcessorFactory and/or RemoveBlankFieldUpdateProcessorFactory ?

Right on the money - 

I started with a 7.1 solrconfig.xml and slowly moved settings over from 4.10, so my solrconfig.xml still had RemoveBlankFieldUpdateProcessorFactory configured in its updateProcessor. I turned that off, and now everything works as it did under 4.10 (better, even).
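
For anyone checking their own config for the same problem, a quick search of solrconfig.xml for the two processors named above is one way to start. This is only a rough sketch: the function name find_blank_stripping_processors is made up for illustration and is not part of Solr, and the path you pass in depends on where your core's conf directory lives.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr): print the numbered lines of a
# solrconfig.xml that mention update processors which trim or remove
# blank field values.
# Example call: find_blank_stripping_processors /path/to/conf/solrconfig.xml
find_blank_stripping_processors() {
    config_file="$1"
    # -n prints line numbers; -E allows alternation between the class names.
    # grep exits non-zero when neither processor is mentioned.
    grep -n -E 'RemoveBlankFieldUpdateProcessorFactory|TrimFieldUpdateProcessorFactory' "$config_file"
}
```

A match only shows that the class name appears in the file. It may sit in a commented-out block or in a chain that is not the default, so the surrounding XML still needs to be read by hand.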

Thanks!
Rick

-----Original Message-----
From: Chris Hostetter [mailto:hossman_lucene@fucit.org] 
Sent: Wednesday, January 24, 2018 6:18 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR 7.1 queries not including empty fields in results



:Are you still using the same solrconfig.xml you had in 4.10, or did you switch to using a newer sample/default set (or in some other way modified) solrconfig.xml?

:I ask because even if you are using the ClassicIndexSchemaFactory, your update processor chain might be using TrimFieldUpdateProcessorFactory and/or RemoveBlankFieldUpdateProcessorFactory ?



Re: SOLR 7.1 queries not including empty fields in results

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/24/2018 4:17 PM, Chris Hostetter wrote:
> I ask because even if you are using the ClassicIndexSchemaFactory, your
> update processor chain might be using TrimFieldUpdateProcessorFactory
> and/or RemoveBlankFieldUpdateProcessorFactory ?
>
> When I use the sample techproducts configs in 7.1, I have no problem
> adding either an empty string or a blank space to a string field...

I ran into the same thing.  With the default example, versions from 5.0 
to 7.2 exhibit the "delete empty string fields" behavior, but when I 
tried sample_techproducts_configs, 7.2 behaved just like 4.10.  I 
located an issue:

https://issues.apache.org/jira/browse/SOLR-11855

I agree that there is likely an update processor chain active in 
solrconfig.xml that is deleting the field.

Thanks,
Shawn


Re: SOLR 7.1 queries not including empty fields in results

Posted by Chris Hostetter <ho...@fucit.org>.
: I am converting a SOLR 4.10 db to SOLR 7.1
: 
: It is NOT schemaless - so it uses a ClassicIndexSchemaFactory.
: 
: In 4.10, I have a field that is a phone number (here's the schema information for the field):
: 
: <field name="Phone" type="string" indexed="false" stored="true"/>
: 
: When inserting documents into SOLR, there are some documents where the 
: value of Phone is an empty string or a single blank space.
	... 
: But when these same rows are inserted into SOLR 7.1, the documents 
: returned for those rows have no Phone field

Are you still using the same solrconfig.xml you had in 4.10, or did you 
switch to using a newer sample/default set (or in some other way 
modified) solrconfig.xml?

I ask because even if you are using the ClassicIndexSchemaFactory, your 
update processor chain might be using TrimFieldUpdateProcessorFactory 
and/or RemoveBlankFieldUpdateProcessorFactory ?

When I use the sample techproducts configs in 7.1, I have no problem 
adding either an empty string or a blank space to a string field...



$ bin/solr -e techproducts
...
$ curl -H 'Content-Type: application/json' 'http://localhost:8983/solr/techproducts/update?commit=true' --data-binary '[{"id":"white","foo_s":" "},{"id":"blank","foo_s":""}]'
{
  "responseHeader":{
    "status":0,
    "QTime":40}}
$ curl 'http://localhost:8983/solr/techproducts/query?q=foo_s:*'
{
  "responseHeader":{
    "status":0,
    "QTime":12,
    "params":{
      "q":"foo_s:*"}},
  "response":{"numFound":2,"start":0,"docs":[
      {
        "id":"white",
        "foo_s":" ",
        "_version_":1590517543569719296},
      {
        "id":"blank",
        "foo_s":"",
        "_version_":1590517543570767872}]
  }}




-Hoss
http://www.lucidworks.com/