Posted to solr-user@lucene.apache.org by "Maciej Ł. PCSS" <la...@man.poznan.pl> on 2017/02/15 08:25:22 UTC

Indexing of documents in more than one step (SOLRJ)

Dear All,
how should I handle the following scenario using SolrJ? Index a
collection of documents (fill fields a, b, c). Then index the same
collection again, but this time fill fields d, e, f.

In pseudo-code it would be: step1(collectionX); step2(collectionX);
solrCommit();

My observations so far:
- The first step is done by calling SolrInputDocument.addField(fieldName,
value); and this works fine.
- If I do the same in the second step, all previously indexed fields in
my documents get removed.
- For that reason I call SolrInputDocument.addField(fieldName,
Collections.singletonMap("set", value)); in the second step, and then it
works.
- But for some fields, that same call results in indexed values like
"{set=value}" instead of just "value".

Can somebody explain this strange behaviour to me?

Regards
Maciej
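
A note on the "{set=value}" symptom: that text is exactly what Java produces when a Map built with Collections.singletonMap("set", value) is converted to a string, which suggests that for those fields the map was indexed as an ordinary value instead of being interpreted as a "set" modifier. One possible cause (an assumption, not confirmed anywhere in this thread) is that addField was called on a field that already held a value in the same SolrInputDocument, so the field value became a collection containing the map rather than the map itself. The sketch below uses only the Java standard library; SetValueDemo, asPlainValue and looksLikeModifier are hypothetical names, and the "value must itself be a Map" check is a simplified stand-in for whatever Solr does internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Stand-alone illustration; no Solr classes are used here.
class SetValueDemo {

    // How a modifier map looks if it is stored as an ordinary field value.
    static String asPlainValue(Object fieldValue) {
        return String.valueOf(fieldValue);
    }

    // Assumption: a field is treated as an atomic-update instruction
    // only when its value is itself a Map.
    static boolean looksLikeModifier(Object fieldValue) {
        return fieldValue instanceof Map;
    }

    public static void main(String[] args) {
        Map<String, Object> modifier = Collections.singletonMap("set", "value");

        // The map's string form matches the symptom from the question.
        System.out.println(asPlainValue(modifier));        // {set=value}

        // A single map value is recognised as a modifier.
        System.out.println(looksLikeModifier(modifier));   // true

        // A field that already had a value and then received the map
        // holds a list, which is no longer a Map.
        List<Object> twoValues = new ArrayList<>();
        twoValues.add("old");
        twoValues.add(modifier);
        System.out.println(looksLikeModifier(twoValues));  // false
    }
}
```

If this is what happens, making sure each field receives exactly one modifier map per document (for example with SolrInputDocument.setField instead of repeated addField calls) would be worth trying.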


Re: Indexing of documents in more than one step (SOLRJ)

Posted by Erick Erickson <er...@gmail.com>.
Maciej:

You really have two choices:
1> Re-index the entire document with fields a, b, c, d, e, f. In that
case, though, why bother indexing the first time ;)
2> Use Atomic Updates:
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
but note the restrictions described there.

Best,
Erick
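
The difference between the two choices can be sketched with a simplified model in plain Java. This is not Solr code: the index map stands in for the stored documents, and UpdateModel, replace and atomicSet are hypothetical names. A plain add replaces the whole document that has the same id, which matches the observation that previously indexed fields disappear. A "set" modifier is merged into the existing document instead, and that merge only works if the existing field values can be read back (in Solr this generally means the fields are stored).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of "replace whole document" versus "atomic set".
class UpdateModel {
    // id -> stored field values (stand-in for the index)
    private final Map<String, Map<String, Object>> index = new HashMap<>();

    // Plain add: the new document overwrites the old one entirely.
    void replace(String id, Map<String, Object> fields) {
        index.put(id, new LinkedHashMap<>(fields));
    }

    // Atomic "set": the given fields are merged into the existing document.
    void atomicSet(String id, Map<String, Object> fields) {
        index.computeIfAbsent(id, k -> new LinkedHashMap<>()).putAll(fields);
    }

    Map<String, Object> get(String id) {
        return index.get(id);
    }

    public static void main(String[] args) {
        UpdateModel model = new UpdateModel();

        // Two plain adds: field "a" is lost after the second one.
        model.replace("1", Collections.singletonMap("a", "A"));
        model.replace("1", Collections.singletonMap("d", "D"));
        System.out.println(model.get("1")); // {d=D}

        // Plain add followed by an atomic set: both fields survive.
        model.replace("1", Collections.singletonMap("a", "A"));
        model.atomicSet("1", Collections.singletonMap("d", "D"));
        System.out.println(model.get("1")); // {a=A, d=D}
    }
}
```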


Re: Indexing of documents in more than one step (SOLRJ)

Posted by Emir Arnautovic <em...@sematext.com>.
Which version of Solr do you use? Is it always the same field? Again,
without having checked anything: could it be that the field is not
multivalued but the value you send is?

In any case, this is an inefficient way of indexing. If possible, stream
both sources ordered by ID, merge them into one input doc per ID, and
send that to Solr.

Emir
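
The merge suggested above can be sketched in plain Java as follows. This is a minimal sketch under stated assumptions: both sources are already sorted ascending by id, each id occurs at most once per source, and the names MergeById, Row and merge are hypothetical. Each resulting map would then be turned into a single SolrInputDocument and sent once, so no second indexing pass is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Merges two id-ordered sources into one document per id.
class MergeById {

    // Hypothetical row type: an id plus the fields one source provides.
    static final class Row {
        final String id;
        final Map<String, Object> fields;

        Row(String id, Map<String, Object> fields) {
            this.id = id;
            this.fields = fields;
        }
    }

    // Both iterators must be sorted ascending by id.
    static List<Map<String, Object>> merge(Iterator<Row> first, Iterator<Row> second) {
        List<Map<String, Object>> docs = new ArrayList<>();
        Row a = first.hasNext() ? first.next() : null;
        Row b = second.hasNext() ? second.next() : null;

        while (a != null || b != null) {
            // cmp < 0: only a has the smallest id; cmp > 0: only b; 0: both.
            int cmp = (a == null) ? 1 : (b == null) ? -1 : a.id.compareTo(b.id);
            Map<String, Object> doc = new LinkedHashMap<>();
            if (cmp <= 0) {
                doc.put("id", a.id);
                doc.putAll(a.fields);
                a = first.hasNext() ? first.next() : null;
            }
            if (cmp >= 0) {
                doc.put("id", b.id);
                doc.putAll(b.fields);
                b = second.hasNext() ? second.next() : null;
            }
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Row> step1 = Arrays.asList(
                new Row("1", Collections.singletonMap("a", "a1")),
                new Row("2", Collections.singletonMap("a", "a2")));
        List<Row> step2 = Arrays.asList(
                new Row("2", Collections.singletonMap("d", "d2")),
                new Row("3", Collections.singletonMap("d", "d3")));

        for (Map<String, Object> doc : merge(step1.iterator(), step2.iterator())) {
            System.out.println(doc);
        }
    }
}
```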


-- 
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


Re: Indexing of documents in more than one step (SOLRJ)

Posted by "Maciej Ł. PCSS" <la...@man.poznan.pl>.
No, that's not the case. In both steps I'm indexing documents with the
same set of IDs (I mean the values of the 'id' field).

Maciej


Re: Indexing of documents in more than one step (SOLRJ)

Posted by Emir Arnautovic <em...@sematext.com>.
I did not have time to test it or look at the code, but can you check
whether this happens when there is no existing document with fields a,
b, c and you are trying to update it with d, e, f using the partial
update syntax?

Emir

