Posted to solr-user@lucene.apache.org by ufuk yılmaz <uy...@vivaldi.net.INVALID> on 2020/12/11 21:38:27 UTC

Copyfields, will there be any difference between source and dest if they are switched?

Hello all,

Documentation states “Fields are copied before analysis is done, meaning you can have two fields with identical original content, but which use different analysis chains and are stored in the index differently.”

I have field type definitions for case-insensitive strings, which I use for querying:

    <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer type="query">
          <tokenizer class="solr.KeywordTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="strings_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true" multiValued="true">
      <analyzer type="query">
          <tokenizer class="solr.KeywordTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

And a regular string without any analyzers:

    <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>

And I have 2 fields, one for searching and one for faceting:

    <field name="place.name_orig" type="string" indexed="false" stored="false" docValues="true"/>
    <field name="place.name" type="string_ci" indexed="true" stored="true" docValues="false"/>

New documents arrive at Solr with a place.name field, so I’m using a copyField to copy its value to the string field:

    <copyField source="place.name" dest="place.name_orig" maxChars="1024"/>

My question is: will there be any difference in the resulting indexed documents if I switch the source and dest fields in the copyField directive? My understanding is that copyField operates on the raw data as it arrives at Solr, and the field declarations themselves decide what to do with it, so there shouldn’t be any difference. However, I’m currently investigating an issue in which:

- The same data is indexed in two different collections; one uses a copyField directive like the one above
- The other one doesn’t use copyField; instead, the same value is sent in both the place.name and place.name_orig fields during indexing

Yet I’m seeing some slight differences in the resulting documents, mainly in casing between i and İ.

Have a nice weekend



Re: Copyfields, will there be any difference between source and dest if they are switched?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/11/2020 2:38 PM, ufuk yılmaz wrote:
> <copyField source="place.name" dest="place.name_orig" maxChars="1024"/>
> 
> My question is, will there be any difference on the resulting indexed documents if I switched source and dest fields in copyField directive? My understanding is copyField operates on raw data arriving at Solr as is, and field declarations themselves decide what to do with it, so there shouldn’t be any difference, but I’m currently investigating an issue which,

Presumably your indexing includes place.name but does not contain 
place.name_orig in the fields that are sent to Solr for indexing.  If 
that's the case, then reversing the fields in the copyField will leave 
place.name_orig empty.

If the indexed data does contain both fields, then the target field 
would contain the data twice, and if the target field is not 
multiValued, then indexing will fail.
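
As a sketch, the reversed directive would look like this, with the two 
cases above noted in comments.  The field names are taken from the 
question; nothing here was run against a live Solr instance.

```xml
<!-- Reversed: place.name_orig is now the source, place.name the destination. -->
<copyField source="place.name_orig" dest="place.name" maxChars="1024"/>

<!-- Case 1: the document is sent with only place.name.
     There is nothing in place.name_orig to copy, and nothing copies into it,
     so place.name_orig stays empty. -->

<!-- Case 2: the document is sent with both fields.
     place.name receives its own value plus the copied one, i.e. two values.
     place.name is not multiValued here, so indexing the document fails. -->
```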

> - Same data is indexed in two different collections, one uses a copyField directive like above
> - The other one doesn’t use copyField, but the same value is sent in both the place.name and place.name_orig fields during indexing
> But I’m seeing some slight differences in resulting documents, mainly in casing between i and İ.

Analysis does not affect the document data returned in results.  What 
you see in results will be exactly what was originally sent.  The only 
way Solr can change stored data is through Update Processors defined in 
solrconfig.xml; analysis defined in the schema only determines the 
terms that are indexed and searched.
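
For illustration only, an update processor chain that does change the 
data before it is stored could look roughly like the following.  The 
chain name "trim-place-name" is made up, and the choice of 
TrimFieldUpdateProcessorFactory is just an example; neither comes from 
the configuration under discussion.

```xml
<!-- solrconfig.xml: hypothetical chain; the name is an assumption for this example -->
<updateRequestProcessorChain name="trim-place-name">
  <!-- Strips leading/trailing whitespace from the incoming value,
       so the stored value can differ from what the client sent. -->
  <processor class="solr.TrimFieldUpdateProcessorFactory">
    <str name="fieldName">place.name</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- RunUpdateProcessorFactory performs the actual indexing and should come last. -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

A chain like this only takes effect if it is marked as the default chain 
or selected with the update.chain request parameter.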

Thanks,
Shawn