Posted to solr-user@lucene.apache.org by "Zimmermann, Thomas" <tz...@techtarget.com> on 2018/08/18 00:15:28 UTC

Copyto with DIH Interpreting string as MultiValued field on copy

Hi,

I’m trying to track down an odd issue I’m seeing when using the SolrEntityProcessor to seed some test data from a solr 4.x cluster to a solr 7.x cluster. It seems like strings are being interpreted as multivalued when passed from a string field to a text field via the copyTo directive. Any clever ideas how to resolve this?

Schema:


Fields and CopyTo


<field name="author" type="string" indexed="true" stored="true" />
<field name="authorText" type="text" indexed="true" stored="true" />
<copyField source="author" dest="authorText" />

Text fieldtype declaration:

<fieldType name="text" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" />
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>


DIH Config:

<dataConfig>
  <document>
    <entity name="sep" processor="SolrEntityProcessor"
            url="http://cluster.solr.eng.techtarget.com/solr/vignette "
            query="*:*"
            fl="*,orig_version_l:_version_">
      <field column="title" name="titleString" />
      <field column="title" name="title"/>
      <field column="id"/>
      <field column="typedef"/>
      <field column="title"/>
      <field column="url" />
    </entity>
  </document>
</dataConfig>

Error:


org.apache.solr.common.SolrException: ERROR: [doc=d751e434c69b6210VgnVCM1000000d01c80aRCRD] Error adding field 'author'='Jeff Hartley' msg=Multiple values encountered for non multiValued copy field authorText: Jeff Hartley
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:203) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
    at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:101) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
    at org.apache.solr.update.DirectUpdateHandler2.updateDocument(DirectUpdateHandler2.java:980) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
    at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:971) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
    at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:348) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:284) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:234) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:950) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1168) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:633) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
    at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:80) ~[?:?]
    at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:258) ~[?:?]
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:527) ~[?:?]
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415) ~[?:?]
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330) ~[?:?]
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233) ~[?:?]
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424) ~[?:?]
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483) ~[?:?]
    at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466) ~[?:?]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
Caused by: org.apache.solr.common.SolrException: Multiple values encountered for non multiValued copy field authorText: Jeff Hartley
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:180) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
    ... 22 more


Re: Copyto with DIH Interpreting string as MultiValued field on copy

Posted by "Zimmermann, Thomas" <tz...@techtarget.com>.
Makes total sense. Thanks to both of you for the clarification!

On 8/18/18, 8:03 AM, "Alexandre Rafalovitch" <ar...@gmail.com> wrote:

>And part of the issue is that SolrEntityProcessor does not take individual
>field definitions. So that part is ignored and instead just the 'fl' mapping
>is used, as Shawn explained.
>
>So you could also remap authorText in that definition to an ignored field.
>See
>https://github.com/apache/lucene-solr/blob/master/solr/example/example-DIH/solr/solr/conf/solr-data-config.xml
>
>Regards,
>    Alex


Re: Copyto with DIH Interpreting string as MultiValued field on copy

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
And part of the issue is that SolrEntityProcessor does not take individual
field definitions. So that part is ignored and instead just the 'fl' mapping is
used, as Shawn explained.

So you could also remap authorText in that definition to an ignored field.
See
https://github.com/apache/lucene-solr/blob/master/solr/example/example-DIH/solr/solr/conf/solr-data-config.xml
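
A rough sketch of that remapping, modeled on the linked example config: the 'fl' alias renames authorText on the way in, so it lands in a field the destination schema throws away. The name ignored_authorText and the ignored_* dynamic field are assumptions here (stock example schemas ship something similar, but check that yours does). Whether the alias also suppresses the original authorText when combined with '*' may depend on the source Solr version, so try it on a few documents first.

```xml
<!-- DIH config sketch: alias authorText to a name matched by ignored_* (hypothetical name) -->
<dataConfig>
  <document>
    <entity name="sep" processor="SolrEntityProcessor"
            url="http://cluster.solr.eng.techtarget.com/solr/vignette"
            query="*:*"
            fl="*,orig_version_l:_version_,ignored_authorText:authorText"/>
  </document>
</dataConfig>
```

The destination schema would then need something along these lines (a fragment for the schema file, assumed rather than taken from the original post):

```xml
<!-- Neither indexed nor stored, so anything sent to ignored_* is silently dropped -->
<fieldType name="ignored" class="solr.StrField" indexed="false" stored="false" multiValued="true" />
<dynamicField name="ignored_*" type="ignored" />
```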

Regards,
    Alex

On Fri, Aug 17, 2018, 11:50 PM Shawn Heisey, <ap...@elyograg.org> wrote:

> On 8/17/2018 6:15 PM, Zimmermann, Thomas wrote:
> > I’m trying to track down an odd issue I’m seeing when using the
> SolrEntityProcessor to seed some test data from a solr 4.x cluster to a
> solr 7.x cluster. It seems like strings are being interpreted as
> multivalued when passed from a string field to a text field via the copyTo
> directive. Any clever ideas how to resolve this?
>
> What's happening is deceptively simple.
>
> In the source system, you're copying from author to authorText.  Both
> fields are stored.  So if you have "Jeff Hartley" in author, you also
> have "Jeff Hartley" in authorText. So what's happening is that when the
> destination system imports from the source system, it gets "Jeff
> Hartley" in both fields, and then copyField says "put a copy of what's
> in author into authorText" ... and suddenly there are two copies of
> "Jeff Hartley" in authorText.
>
> There are two ways to deal with this:
>
> 1) In the query you're doing with SolrEntityProcessor, add an "fl"
> parameter and list all the fields *except* authorText and any other
> field where this same problem is happening.
>
> 2) Remove the copyField from the schema until after the import from the
> source server is done.
>
> Thanks,
> Shawn
>
>

Re: Copyto with DIH Interpreting string as MultiValued field on copy

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/17/2018 6:15 PM, Zimmermann, Thomas wrote:
> I’m trying to track down an odd issue I’m seeing when using the SolrEntityProcessor to seed some test data from a solr 4.x cluster to a solr 7.x cluster. It seems like strings are being interpreted as multivalued when passed from a string field to a text field via the copyTo directive. Any clever ideas how to resolve this?

What's happening is deceptively simple.

In the source system, you're copying from author to authorText.  Both 
fields are stored.  So if you have "Jeff Hartley" in author, you also 
have "Jeff Hartley" in authorText. So what's happening is that when the 
destination system imports from the source system, it gets "Jeff 
Hartley" in both fields, and then copyField says "put a copy of what's 
in author into authorText" ... and suddenly there are two copies of 
"Jeff Hartley" in authorText.

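To make that concrete, here is roughly what the destination ends up building for one document, written in Solr's XML update syntax. The id is taken from the error message and 'id' is assumed to be the unique key; the rest is an illustration, not actual output from either cluster.

```xml
<add>
  <doc>
    <field name="id">d751e434c69b6210VgnVCM1000000d01c80aRCRD</field>
    <field name="author">Jeff Hartley</field>
    <!-- value 1: the stored authorText fetched from the source because fl includes "*" -->
    <field name="authorText">Jeff Hartley</field>
    <!-- value 2: added on the destination by copyField from author -->
    <field name="authorText">Jeff Hartley</field>
  </doc>
</add>
```

Two values in a field that is not multiValued is exactly what the error message complains about.
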
There are two ways to deal with this:

1) In the query you're doing with SolrEntityProcessor, add an "fl" 
parameter and list all the fields *except* authorText and any other 
field where this same problem is happening.

2) Remove the copyField from the schema until after the import from the 
source server is done.
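
As a sketch of option 1, the entity could name its fields explicitly instead of using '*'. The field list below is only what is visible in the posted config and schema (id, typedef, title, url, author), so treat it as an assumption and extend it with whatever else the real schema needs, leaving out every copyField destination.

```xml
<dataConfig>
  <document>
    <!-- fl lists source fields explicitly and omits authorText (a copyField destination) -->
    <entity name="sep" processor="SolrEntityProcessor"
            url="http://cluster.solr.eng.techtarget.com/solr/vignette"
            query="*:*"
            fl="id,typedef,title,url,author,orig_version_l:_version_"/>
  </document>
</dataConfig>
```

For option 2, the schema change during the import could be as small as commenting the directive out and reloading, then restoring it afterwards:

```xml
<!-- Temporarily disabled for the SolrEntityProcessor import; restore after it completes -->
<!-- <copyField source="author" dest="authorText" /> -->
```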

Thanks,
Shawn