Posted to solr-user@lucene.apache.org by Artem OXSEED <a....@oxseed.com> on 2013/03/13 11:43:48 UTC

Version conflict during data import from another Solr instance into clean Solr

Hi,

I've configured data import handler:

<requestHandler name="/dataimport" 
class="org.apache.solr.handler.dataimport.DataImportHandler">
     <lst name="defaults">
       <str name="config">data-config.xml</str>
     </lst>
   </requestHandler>

data-config.xml:

<dataConfig>
   <document>
     <entity name="sep" processor="SolrEntityProcessor"
     url="http://host:8080/index" query="*:*" wt="javabin"/>
   </document>
</dataConfig>

Both Solr instances are of the same version - 4.1. Target Solr instance 
is empty - no documents exist there.

During data import I see constant errors in console:

WARNING: Error creating document : SolrInputDocument[..., 
internal_id=2011042103204394408D878AC717F7FB21ABF9ECD011CB7ED, ..., 
_version_=1426404770097135617, ...]
org.apache.solr.common.SolrException: version conflict for 
2011042103204394408D878AC717F7FB21ABF9ECD011CB7ED 
expected=1426404770097135617 actual=-1
         at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:543)
         at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:350)
         at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
         at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:70)
         at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:234)
         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:500)
         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
         at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
         at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
         at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)

(internal_id field is a unique key)

No documents are therefore being imported. It's interesting to note that 
data in the source Solr were imported there by the same data import 
configuration from yet another Solr instance - only from an older 
version (with wt="xml" of course). That old data did not contain 
_version_ field (Solr 1.4.1) so this problem could not appear at all.

I've tried specifying a few fields in the fl parameter of 
SolrEntityProcessor, without including the _version_ field - it works. So I 
guess the last resort would be to list all document fields there 
except _version_ - but that's not convenient at all.

Any ideas on what might be the problem here?

-- 
Warm regards,
Artem Karpenko


Re: Version conflict during data import from another Solr instance into clean Solr

Posted by deansg <de...@gmail.com>.
Hi, I ran into the same problem. Chris' first solution worked for us, however
the second solution on its own doesn't work, as the conflict error arises
before the update processors' code is even reached. However, creating an
alias for the _version_ field in the dataconfig file, together with an
update processor that removes the temporary field (and possibly other
unwanted fields) seemed to work great for us.
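
A minimal sketch of the target-side half of that combination, assuming the
alias is named old_version (the name used in Chris' example; the actual
temporary field name is not given here). The chain name "drop-old-version"
is hypothetical, and passing update.chain through the handler defaults is
an assumption that has not been verified against 4.1:

```xml
<!-- solrconfig.xml on the target instance (sketch only) -->
<updateRequestProcessorChain name="drop-old-version">
  <!-- Removes the aliased field so it never reaches the index -->
  <processor class="solr.IgnoreFieldUpdateProcessorFactory">
    <str name="fieldName">old_version</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <!-- Assumption: the import handler honours update.chain from its defaults -->
    <str name="update.chain">drop-old-version</str>
  </lst>
</requestHandler>
```

Other unwanted fields could be dropped the same way by adding further
fieldName entries to the processor.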




Re: Version conflict during data import from another Solr instance into clean Solr

Posted by Chris Hostetter <ho...@fucit.org>.
: It looks strange to me that if there is no document yet (foundVersion < 0)
: then the only case when document will be imported is when input version is
: negative. Guess I need to test specific cases using SolrJ or smth. to be sure.

you're assuming that if foundVersion < 0 that means no document *yet* ... 
it could also mean there was a document, and it's been deleted.

Either way, if the client has said "(replace|update) version X of doc D", 
the code is failing because it can't: doc D does not exist with version 
X.  Regardless of whether someone deleted doc D, or replaced it with a 
newer version, or it never existed in the first place, Solr can't do what 
you asked it to do.

: Anyway I'll also check if I can inherit from SolrEntityProcessor and override
: _version_ field there before insertion.

Easier solutions to consider (off the cuff, not tested)...

1) in your SolrEntityProcessor, configure fl with something like this 
to alias the _version_ field to something else

   fl=*,old_version:_version_

2) configure your destination solr instance with an update chain that 
ignores the _version_ field (you wouldn't want this for most normal usage, 
but it would be suitable for these kinds of from-scratch imports from 
other solr instances)...

https://lucene.apache.org/solr/4_2_0/solr-core/org/apache/solr/update/processor/IgnoreFieldUpdateProcessorFactory.html
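
As a sketch of option 1, the data-config.xml from the original post could
carry the alias directly on the entity. This assumes fl is passed through
to the source instance unchanged; old_version is just the illustrative
name from above, and the target schema would still have to accept or drop
that field:

```xml
<dataConfig>
  <document>
    <!-- fl renames _version_ to old_version in the fetched documents -->
    <entity name="sep" processor="SolrEntityProcessor"
            url="http://host:8080/index" query="*:*"
            fl="*,old_version:_version_" wt="javabin"/>
  </document>
</dataConfig>
```

Like the suggestions themselves, this is untested.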



-Hoss

Re: Version conflict during data import from another Solr instance into clean Solr

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
What about update request processors to drop the field?

Regards,
     Alex
On Mar 13, 2013 9:45 AM, "Artem OXSEED" <a....@oxseed.com> wrote:

> Hello, thank you for response!
>
> Configuration option <semanticsMode> does not help - it's probably not yet
> implemented. I found however the line of code which checks versions:
>
> Long lastVersion = vinfo.lookupVersion(cmd.getIndexedId());
> long foundVersion = lastVersion == null ? -1 : lastVersion;
> if ( versionOnUpdate == foundVersion || (versionOnUpdate < 0 &&
> foundVersion < 0) || (versionOnUpdate==1 && foundVersion > 0) ) {
> // we're ok if versions match, or if both are negative (all missing docs
> // are equal), or if cmd
> // specified it must exist (versionOnUpdate==1) and it does.
> } else {throw...}
>
> It looks strange to me that if there is no document yet (foundVersion < 0)
> then the only case when document will be imported is when input version is
> negative. Guess I need to test specific cases using SolrJ or smth. to be
> sure.
>
> Anyway I'll also check if I can inherit from SolrEntityProcessor and
> override _version_ field there before insertion.
>
> --
> Warm regards,
> Artem Karpenko
>

Re: Version conflict during data import from another Solr instance into clean Solr

Posted by Artem OXSEED <a....@oxseed.com>.
Hello, thank you for the response!

The <semanticsMode> configuration option does not help - it's probably not 
yet implemented. However, I found the code that checks versions:

Long lastVersion = vinfo.lookupVersion(cmd.getIndexedId());
long foundVersion = lastVersion == null ? -1 : lastVersion;
if ( versionOnUpdate == foundVersion || (versionOnUpdate < 0 && 
foundVersion < 0) || (versionOnUpdate==1 && foundVersion > 0) ) {
// we're ok if versions match, or if both are negative (all missing docs 
// are equal), or if cmd
// specified it must exist (versionOnUpdate==1) and it does.
} else {throw...}

It looks strange to me that if there is no document yet (foundVersion < 
0), the only case in which the document will be imported is when the input 
version is negative. I guess I need to test specific cases using SolrJ or 
something similar to be sure.

Anyway, I'll also check whether I can inherit from SolrEntityProcessor and 
override the _version_ field there before insertion.

--
Warm regards,
Artem Karpenko

On 13.03.2013 14:18, Alexandre Rafalovitch wrote:
> I believe you are running into the update semantics, new with Solr 4
> (4.1?): https://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics I
> am not sure Wiki is 100% correct (especially on default mode), but it
> should be good enough.
>
> Basically, because you are specifying some real value in _version_ field,
> Solr assumes it is an update operation and expects the already stored field
> to have the same value (to avoid concurrent update issues).
>
> You probably want to temporarily switch to override mode, which is
> 'classic' as described in the link above. Then, after indexing, you can
> reset the configuration.
>
> Regards,
>      Alex.
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>
>


Re: Version conflict during data import from another Solr instance into clean Solr

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
I believe you are running into the update semantics, new with Solr 4
(4.1?): https://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics
I am not sure the wiki is 100% correct (especially on the default mode), but
it should be good enough.

Basically, because you are specifying some real value in _version_ field,
Solr assumes it is an update operation and expects the already stored field
to have the same value (to avoid concurrent update issues).
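
To make that concrete, here is a minimal sketch of an XML update message,
assuming Solr 4's optimistic concurrency rules as reflected in the version
check quoted elsewhere in this thread. The document ids are hypothetical;
internal_id is the unique key from the original post:

```xml
<add>
  <!-- _version_ > 1: the document must already exist with exactly this version -->
  <doc>
    <field name="internal_id">doc-1</field>
    <field name="_version_">1426404770097135617</field>
  </doc>
  <!-- _version_ < 0: the document must not exist yet -->
  <doc>
    <field name="internal_id">doc-2</field>
    <field name="_version_">-1</field>
  </doc>
  <!-- no _version_ field: no version check, plain add or overwrite -->
  <doc>
    <field name="internal_id">doc-3</field>
  </doc>
</add>
```

The import hits the first case: each incoming document carries the source
instance's _version_, while the empty target reports actual=-1.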

You probably want to temporarily switch to override mode, which is
'classic' as described in the link above. Then, after indexing, you can
reset the configuration.

Regards,
    Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Wed, Mar 13, 2013 at 6:43 AM, Artem OXSEED <a....@oxseed.com> wrote:

> Hi,
>
> I've configured data import handler:
>
> <requestHandler name="/dataimport"
> class="org.apache.solr.handler.dataimport.DataImportHandler">
>     <lst name="defaults">
>       <str name="config">data-config.xml</str>
>     </lst>
>   </requestHandler>
>
> data-config.xml:
>
> <dataConfig>
>   <document>
>     <entity name="sep" processor="SolrEntityProcessor"
>     url="http://host:8080/index" query="*:*" wt="javabin"/>
>   </document>
> </dataConfig>
>
> Both Solr instances are of the same version - 4.1. Target Solr instance is
> empty - no documents exist there.
>
> During data import I see constant errors in console:
>
> WARNING: Error creating document : SolrInputDocument[...,
> internal_id=2011042103204394408D878AC717F7FB21ABF9ECD011CB7ED, ...,
> _version_=1426404770097135617, ...]
> org.apache.solr.common.SolrException: version conflict for
> 2011042103204394408D878AC717F7FB21ABF9ECD011CB7ED
> expected=1426404770097135617 actual=-1
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:543)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:350)
>         at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>         at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:70)
>         at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:234)
>         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:500)
>         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
>         at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
>         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
>         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
>         at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
>         at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
>
> (internal_id field is a unique key)
>
> No documents are therefore being imported. It's interesting to note that
> data in the source Solr were imported there by the same data import
> configuration from yet another Solr instance - only from an older version
> (with wt="xml" of course). That old data did not contain _version_ field
> (Solr 1.4.1) so this problem could not appear at all.
>
> I've tried specifying a few fields in fl parameter of SolrEntityProcessor,
> without including _version_ field - it works. So I guess the last way would
> be to specify all document fields there excluding _version_ - but it's not
> convenient at all.
>
> Any ideas on what might be a problem here?
>
> --
> Warm regards,
> Artem Karpenko
>
>