Posted to solr-user@lucene.apache.org by Chantal Ackermann <ch...@btelligent.de> on 2009/08/03 14:12:56 UTC
Copy Field Question
Dear all,
before I dig through the source code, maybe one of you can answer
this easily:
When, and based on what, are the tokenizer and filters applied when
copying fields? Can a field be analyzed twice (once when the first
field is created, and a second time when it is copied to another
field)?
Here is an example from my current setup.
I have the following type defined in schema.xml:
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LengthFilterFactory" min="2" max="5000" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_de.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="German" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_de.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="German" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>
It is used for these fields:
<field name="title" type="keyword" indexed="true" stored="true"
       required="true" />
<field name="title_de" type="text_de" indexed="true" stored="false"
       required="false" />
<field name="subtitle_text_de" type="text_de" indexed="true" stored="true"
       required="false" />
<field name="dtext_de" type="text_de" indexed="true" stored="false"
       required="false" />
These fields are used to populate the following field via copyField directives:
<field name="all_text_de" type="text_de" indexed="true" stored="false"
multiValued="true" />
like this (at least, this is what I do at the moment):
<copyField source="title" dest="title_de" />
<copyField source="title" dest="all_text_de" />
<copyField source="subtitle_text_de" dest="all_text_de" />
<copyField source="dtext_de" dest="all_text_de" />
I am copying fields of different types to all_text_de; e.g. title has a
different type than subtitle_text_de. Is the value copied to the
destination field the raw (input) value or the already analyzed one?
Thanks!
Chantal
--
Chantal Ackermann
Re: Copy Field Question
Posted by Chantal Ackermann <ch...@btelligent.de>.
Thanks, Mark!
Mark Miller wrote:
> It's the pre-analyzed form that's copied. The field that it's copied to will
> determine the analyzer/filters for that field.
> If you want to check out the code doing it, it's
> in org.apache.solr.update.DocumentBuilder
--
Chantal Ackermann
Re: Copy Field Question
Posted by Mark Miller <ma...@gmail.com>.
It's the pre-analyzed form that's copied, i.e. the raw input value. The field
that it's copied to determines the analyzer/filters applied to that copy, so
nothing is analyzed twice.
If you want to check out the code doing it, it's
in org.apache.solr.update.DocumentBuilder
--
- Mark
http://www.lucidimagination.com