Posted to solr-user@lucene.apache.org by "Hodder, Rick" <RH...@navg.com> on 2018/01/24 20:58:30 UTC
SOLR 7.1 queries not including empty fields in results
I am converting a SOLR 4.10 db to SOLR 7.1
It is NOT schemaless - so it uses a ClassicIndexSchemaFactory.
In 4.10, I have a field that is a phone number (here's the schema information for the field):
<field name="Phone" type="string" indexed="false" stored="true"/>
When inserting documents into SOLR, there are some documents where the value of Phone is an empty string or a single blank space.
When running a query against SOLR 4.10, the returned documents that have an empty string or a single space in Phone still include the Phone field:
...
"FirstName":"Bob, No Phone",
"Phone":"",
"State":"WA"
...
"FirstName":"Sandy, No Phone",
"Phone":"",
"State":"CA"
...
"FirstName":"Donald, With Phone",
"Phone":"123-123-1234",
"State":"NY"
...
But when these same rows are inserted into SOLR 7.1, the documents returned for those rows have no Phone field:
...
"FirstName":"Bob, No Phone",
"State":"WA"
...
"FirstName":"Sandy, No Phone",
"State":"CA"
...
"FirstName":"Donald, With Phone",
"Phone":"123-123-1234",
"State":"NY"
...
Note that only Donald's document includes Phone, because its phone number is non-blank.
I also looked at the version of Java installed on the two boxes: the SOLR 4.10 box has Java 1.8.0_161, and the SOLR 7.1 box has Java 1.8.0_40. I wouldn't think the Java version difference would cause this, since I believe SOLR just requires 1.8.
Is this something that has been added since 4.10?
Is there a setting in solrconfig.xml or schema.xml that can turn the 4.10 behavior back on?
Thanks,
Rick Hodder
RE: SOLR 7.1 queries not including empty fields in results
Posted by "Hodder, Rick" <RH...@navg.com>.
Hi Chris,
:Are you still using the same solrconfig.xml you had in 4.10, or did you switch to using a newer sample/default set (or in some other way modified) solrconfig.xml?
:I ask because even if you are using the ClassicIndexSchemaFactory, your update processor chain might be using TrimFieldUpdateProcessorFactory and/or RemoveBlankFieldUpdateProcessorFactory ?
Right on the money -
I started with a 7.1 solrconfig.xml and slowly moved settings over from 4.10, so my solrconfig.xml still had RemoveBlankFieldUpdateProcessorFactory configured in its update processor chain. I turned that off and now everything works as it did under 4.10 (better, even).
Thanks!
Rick
-----Original Message-----
From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
Sent: Wednesday, January 24, 2018 6:18 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR 7.1 queries not including empty fields in results
:Are you still using the same solrconfig.xml you had in 4.10, or did you switch to using a newer sample/default set (or in some other way modified) solrconfig.xml?
:I ask because even if you are using the ClassicIndexSchemaFactory, your update processor chain might be using TrimFieldUpdateProcessorFactory and/or RemoveBlankFieldUpdateProcessorFactory ?
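To illustrate the fix described above, here is a minimal sketch of an update processor chain in solrconfig.xml with the two processors from Chris's question left out. The chain name "keep-blank-fields" and the use of default="true" are assumptions for illustration, not taken from Rick's actual config. As far as I know, RemoveBlankFieldUpdateProcessorFactory drops zero-length string values, and TrimFieldUpdateProcessorFactory strips surrounding whitespace, so with both active a single space would be trimmed to "" and then removed as well.

```xml
<!-- solrconfig.xml fragment (sketch). The chain name is hypothetical. -->
<updateRequestProcessorChain name="keep-blank-fields" default="true">
  <!-- Deliberately NOT configured here, so that "" and " " values are
       indexed and stored exactly as sent:
         solr.TrimFieldUpdateProcessorFactory
         solr.RemoveBlankFieldUpdateProcessorFactory -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Documents indexed while the old chain was active would need to be reindexed, since the blank values were dropped before they were ever stored.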
Re: SOLR 7.1 queries not including empty fields in results
Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/24/2018 4:17 PM, Chris Hostetter wrote:
> I ask because even if you are using the ClassicIndexSchemaFactory, your
> update processor chain might be using TrimFieldUpdateProcessorFactory
> and/or RemoveBlankFieldUpdateProcessorFactory ?
>
> When I use the sample techproducts configs in 7.1, I have no problem
> adding either an empty string or a blank space to a string field...
I ran into the same thing. With the default example, versions from 5.0
to 7.2 exhibit the "delete empty string fields" behavior, but when I
tried sample_techproducts_configs, 7.2 behaved just like 4.10. I
located an issue:
https://issues.apache.org/jira/browse/SOLR-11855
I agree that there is likely an update processor chain active in
solrconfig.xml that is deleting the field.
Thanks,
Shawn
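If the blank-removal behavior is wanted for most fields but not for one like Phone, a possible alternative to switching the processor off is to exclude that field. The sketch below assumes the "exclude" selector that the field-mutating update processors accept; the chain name is hypothetical, and the selector syntax should be checked against the javadocs for the Solr version in use.

```xml
<!-- solrconfig.xml fragment (sketch). The chain name is hypothetical. -->
<updateRequestProcessorChain name="remove-blank-except-phone" default="true">
  <!-- Remove zero-length values from every field except Phone -->
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory">
    <lst name="exclude">
      <str name="fieldName">Phone</str>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```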
Re: SOLR 7.1 queries not including empty fields in results
Posted by Chris Hostetter <ho...@fucit.org>.
: I am converting a SOLR 4.10 db to SOLR 7.1
:
: It is NOT schemaless - so it uses a ClassicIndexSchemaFactory.
:
: In 4.10, I have a field that is a phone number (here's the schema information for the field):
:
: <field name="Phone" type="string" indexed="false" stored="true"/>
:
: When inserting documents into SOLR, there are some documents where the
: value of Phone is an empty string or a single blank space.
...
: But when these same rows are inserted into SOLR 7.1, the documents
: returned for those rows have no Phone field
Are you still using the same solrconfig.xml you had in 4.10, or did you
switch to using a newer sample/default set (or in some other way
modified) solrconfig.xml?
I ask because even if you are using the ClassicIndexSchemaFactory, your
update processor chain might be using TrimFieldUpdateProcessorFactory
and/or RemoveBlankFieldUpdateProcessorFactory ?
When I use the sample techproducts configs in 7.1, I have no problem
adding either an empty string or a blank space to a string field...
$ bin/solr -e techproducts
...
$ curl -H 'Content-Type: application/json' 'http://localhost:8983/solr/techproducts/update?commit=true' --data-binary '[{"id":"white","foo_s":" "},{"id":"blank","foo_s":""}]'
{
  "responseHeader":{
    "status":0,
    "QTime":40}}
$ curl 'http://localhost:8983/solr/techproducts/query?q=foo_s:*'
{
  "responseHeader":{
    "status":0,
    "QTime":12,
    "params":{
      "q":"foo_s:*"}},
  "response":{"numFound":2,"start":0,"docs":[
      {
        "id":"white",
        "foo_s":" ",
        "_version_":1590517543569719296},
      {
        "id":"blank",
        "foo_s":"",
        "_version_":1590517543570767872}]
  }}
-Hoss
http://www.lucidworks.com/