Posted to solr-user@lucene.apache.org by Jie Luo <jl...@ebi.ac.uk> on 2019/05/04 10:57:34 UTC

Update documents cause multivalue fields unexpected behaviour

Dear solr user,

I have several processes. The first builds the SolrDocuments and indexes them; the others then update other fields of those documents. I noticed that after the updates, searching on previously indexed multivalued fields (not stored) returns wrong results (fewer documents). In a test with five documents, a (field:*) search returned only one document, whereas before the other processes ran it correctly returned all five. Single-valued fields, however, seem to work fine.

Best Regards

Jie

Re: Update documents cause multivalue fields unexpected behaviour

Posted by Walter Underwood <wu...@wunderwood.org>.
We gather all the data for a document, then send it as one update to Solr.

Actually, we create a JSON object for each document, then make a JSONL (one JSON object per line) feed of everything we want to send. That gets compressed and saved in Amazon S3. Then we break it into batches and send it to Solr.

Putting the entire feed in S3 allows us to analyze that feed, load it into a test cluster, load yesterday’s feed, load it into a different prod cluster for disaster recovery, etc.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)
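
For readers following along, here is a minimal sketch of the batching step described above: the lines of a JSONL feed are split into fixed-size batches, and each batch is joined into a JSON array, a form that Solr's JSON update handler accepts for multiple documents. The class and method names (FeedBatcher, split, toJsonArray) and the sample documents are hypothetical; compression, S3 storage, and the HTTP call to Solr are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
class FeedBatcher {

    // Split the lines of a JSONL feed into batches of at most batchSize documents.
    static List<List<String>> split(List<String> jsonLines, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String line : jsonLines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines in the feed
            }
            current.add(line.trim());
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current); // final, possibly smaller, batch
        }
        return batches;
    }

    // Join one batch of JSON objects into a single JSON array string.
    static String toJsonArray(List<String> batch) {
        return "[" + String.join(",", batch) + "]";
    }

    public static void main(String[] args) {
        // Assumed sample feed: one JSON object per line.
        List<String> feed = Arrays.asList(
            "{\"id\":\"1\",\"title\":\"first\"}",
            "{\"id\":\"2\",\"title\":\"second\"}",
            "{\"id\":\"3\",\"title\":\"third\"}");
        for (List<String> batch : split(feed, 2)) {
            System.out.println(toJsonArray(batch));
        }
    }
}
```

Because every line of the feed is a complete document, the same saved feed can be replayed into any cluster without touching the source systems again.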



Re: Update documents cause multivalue fields unexpected behaviour

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/7/2019 5:45 AM, Jie Luo wrote:
> For the fields that are set as stored true,  query works fine, but for fields that are set as stored false, the query does not work after the documents are updated.
> 
> SolrInputDocument solrInputDocument = new SolrInputDocument();
> solrInputDocument.addField("id", "somevalidId");
> 
> Map<String, Object> fieldModifier = new HashMap<>(1);
> fieldModifier.put("set", "some value");
> solrInputDocument.addField("aNewField", fieldModifier);

What you are doing there (using the "set" keyword in a Map object) is 
known as an Atomic Update.  That feature has some very strict 
requirements, and by setting stored=false on your field, you are 
violating those requirements.

In your schema, only copyField destinations can be stored=false.  In 
fact, those HAVE to be stored=false.  Everything else will need to have 
data retrievable in search results.

Here is a fuller description of what Atomic Updates requires:

https://lucene.apache.org/solr/guide/7_7/updating-parts-of-documents.html#field-storage

Side note (its relevance will become clear once you've read that entire 
section of the ref guide): Some field classes (TextField in particular) 
do not support docValues.

Thanks,
Shawn
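
To illustrate the requirement above, here is a toy model of what an atomic update conceptually does: the document is rebuilt from the field values that can be read back, the "set" modifiers are applied, and the result is reindexed as a whole. This is only a sketch under stated assumptions. It uses no Solr API, the class and method names (AtomicUpdateModel, applySet) and the field names "title" and "keywords" are hypothetical, and it ignores the docValues case covered in the ref guide section linked above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model for illustration only; it does not call or reproduce Solr code.
class AtomicUpdateModel {

    // Rebuild a document from its retrievable (stored) fields only,
    // then apply the "set" modifiers on top.
    static Map<String, Object> applySet(Map<String, Object> indexedDoc,
                                        Set<String> storedFields,
                                        Map<String, Object> setModifiers) {
        Map<String, Object> rebuilt = new HashMap<>();
        for (Map.Entry<String, Object> e : indexedDoc.entrySet()) {
            if (storedFields.contains(e.getKey())) {
                rebuilt.put(e.getKey(), e.getValue());
            }
            // Values of non-stored fields cannot be read back, so they are dropped.
        }
        rebuilt.putAll(setModifiers);
        return rebuilt;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "somevalidId");
        doc.put("title", "kept because stored");           // assumed stored=true
        doc.put("keywords", Arrays.asList("a", "b"));      // assumed stored=false

        Set<String> stored = new HashSet<>(Arrays.asList("id", "title"));

        Map<String, Object> modifiers = new HashMap<>();
        modifiers.put("aNewField", "some value");

        Map<String, Object> after = applySet(doc, stored, modifiers);
        System.out.println(after.containsKey("keywords")); // false: the non-stored field is gone
        System.out.println(after.get("aNewField"));
    }
}
```

In this model the "keywords" values vanish after the update, which matches the symptom reported in the thread: queries on stored=false fields stop matching once the document has been atomically updated.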

Re: Update documents cause multivalue fields unexpected behaviour

Posted by Jie Luo <jl...@ebi.ac.uk>.
Hi all,

For fields defined with stored=true, queries work fine, but for fields defined with stored=false, queries no longer work after the documents are updated.

SolrInputDocument solrInputDocument = new SolrInputDocument();
solrInputDocument.addField("id", "somevalidId");

Map<String, Object> fieldModifier = new HashMap<>(1);
fieldModifier.put("set", "some value");
solrInputDocument.addField("aNewField", fieldModifier);

Regards

Jie






Re: solr4 indexation taking too long time

Posted by Erick Erickson <er...@gmail.com>.
I suggest you contact Alfresco, as few in the Solr community know enough about what Alfresco has done to be much help.

Best,
Erick



solr4 indexation taking too long time

Posted by Theodore Ngogang <th...@afrilandfirstbank.com>.
 
 Dear solr user,

We are migrating from Alfresco 4.2 with Apache Lucene to Alfresco 5.0 with Solr 4. We have 700 GB of data, and the indexing process has still not finished after more than 14 days. Please help us identify the problem and solve it.
Thanks!

Re: Update documents cause multivalue fields unexpected behaviour

Posted by Jörn Franke <jo...@gmail.com>.
Hi,

Are you using atomic updates for your documents? If not, changing one value will overwrite the whole document.

Best regards
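
As a rough illustration of the difference, the sketch below builds the two JSON update bodies side by side: a plain document, which replaces everything previously indexed under the same id, and an atomic update that wraps the new value in a "set" modifier. The class and method names (UpdatePayloads, fullReplace, atomicSet) are hypothetical, the unique key is assumed to be "id" as in the snippets elsewhere in this thread, and sending the body to Solr is not shown.

```java
// Hypothetical helper for illustration; it only builds JSON strings.
class UpdatePayloads {

    // Minimal JSON string quoting (backslash and double quote only).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Plain field value: the whole document with this id is replaced.
    static String fullReplace(String id, String field, String value) {
        return "[{" + quote("id") + ":" + quote(id) + ","
                + quote(field) + ":" + quote(value) + "}]";
    }

    // "set" modifier: an atomic update that changes only the named field.
    static String atomicSet(String id, String field, String value) {
        return "[{" + quote("id") + ":" + quote(id) + ","
                + quote(field) + ":{" + quote("set") + ":" + quote(value) + "}}]";
    }

    public static void main(String[] args) {
        System.out.println(fullReplace("somevalidId", "aNewField", "some value"));
        System.out.println(atomicSet("somevalidId", "aNewField", "some value"));
    }
}
```

The only difference between the two bodies is the nested {"set": ...} object, which is the same structure the Map-based field modifier produces in SolrJ.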
