Posted to solr-user@lucene.apache.org by John Thorhauer <jt...@yakabod.com> on 2014/04/25 13:10:08 UTC

dynamic field assignments

I have a scenario where I would like to dynamically assign incoming
document fields to two different Solr schema fieldTypes.  One
fieldType will be an exact match fieldType while the other will be a
full text fieldType.  I know that I can use the dynamicField to assign
fields using the asterisk in a naming pattern.  However, I have a set
of incoming data in which the field names can change at run time.  The
fields will follow a predictable pattern, but the pattern cannot be
recognized using the dynamicField type.

So here is an example of the new types of field names that I need to
be able to process:

FOO_BAR_TEXT_1
FOO_BAR_TEXT_2
FOO_BAR_TEXT_3
FOO_BAR_TEXT_4

FOO_BAR_SELECT_1
FOO_BAR_SELECT_2
FOO_BAR_SELECT_3

So the above fields will not be defined in advance.  I need to map all
fields with the name FOO_BAR_SELECT_* to a fieldType of 'exactMatch'
and I need to map all of the fields with name FOO_BAR_TEXT_* to a
fieldType of 'fullText'.  I was hoping there might be a way of doing
this dynamically.  Does anyone have any ideas how to approach this?

Thanks,
John Thorhauer

Re: dynamic field assignments

Posted by Jack Krupansky <ja...@basetechnology.com>.
Solr only supports mapping of values to field names, not mapping to field 
types. Field names are then mapped to field types.

DynamicField only supports prefix OR suffix wildcard, not both in the same 
pattern.
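
As a rough illustration of that limitation, a schema.xml fragment might look
like the sketch below. The fieldType names 'fullText' and 'exactMatch' are
taken from the question and are assumed to be defined elsewhere in the
schema; the indexed/stored attributes and the '*_txt' pattern are
illustrative assumptions only.

```xml
<!-- schema.xml fragment (sketch). "fullText" and "exactMatch" are assumed
     to be fieldTypes defined elsewhere in the same schema. -->

<!-- Trailing wildcard: matches FOO_BAR_TEXT_1, FOO_BAR_TEXT_2, ... -->
<dynamicField name="FOO_BAR_TEXT_*" type="fullText" indexed="true" stored="true"/>

<!-- Trailing wildcard: matches FOO_BAR_SELECT_1, FOO_BAR_SELECT_2, ... -->
<dynamicField name="FOO_BAR_SELECT_*" type="exactMatch" indexed="true" stored="true"/>

<!-- Leading wildcard (illustrative): matches names such as title_txt -->
<dynamicField name="*_txt" type="fullText" indexed="true" stored="true"/>

<!-- Not expressible as a dynamicField: a wildcard at both ends,
     such as *_TEXT_* -->
```

A name such as WIDGET_BAR_TEXT_2 matches none of these patterns, which is
the case that a single-wildcard dynamicField cannot cover.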

In the future, please take care to design your data model with the features 
and limitations of Solr in mind, so that you won't find yourself boxed into a 
corner like this.

In short, it sounds like you need to go back to the drawing board and 
redesign your data model.

Also, take a look at the schemaless and dynamic schema modes that have 
recently been added to Solr - these provide the tools to dynamically add 
fields to a schema.

See:
https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode
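
For orientation, switching a core to a managed, mutable schema is normally a
solrconfig.xml setting along the lines of the sketch below. It follows the
Solr 4.x example configuration as I understand it; the resource name
"managed-schema" is the conventional default and is an assumption here.

```xml
<!-- solrconfig.xml fragment (sketch): replace the classic schema.xml with a
     managed schema that can be modified at index time.
     "managed-schema" is the conventional resource name, assumed here. -->
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
```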

Dynamic fields are a very powerful feature of Solr, but please don't treat 
them as a panacea for weak data modeling. Use them only in moderation.

-- Jack Krupansky

-----Original Message----- 
From: John Thorhauer
Sent: Friday, April 25, 2014 7:49 AM
To: solr-user@lucene.apache.org
Subject: Re: dynamic field assignments

Jack,

Thanks for your help.

> Reading your last paragraph, how is that any different than exactly what
> DynamicField actually does?

My understanding is that DynamicField can do something like
FOO_BAR_TEXT_* but what I really need is *_TEXT_* as I might have
FOO_BAR_TEXT_1 but I also might have WIDGET_BAR_TEXT_2.  Both of those
field names need to map to a field type of 'fullText'.

> You say you want to change fields at "run time" - what is "run time"? When
> exactly do your field names change?

What I mean is the point at which the document is fed to Solr, that is,
during the update process when the document is being indexed.  The
document that is being indexed may have fields that are unknown until the
time of indexing.  However, some of those fields will follow a predictable
naming pattern as mentioned above.

> You can always write an update request processor to do any manipulation of
> field values at index time.

I see that Solr has this capability.  However, I don't think I need to
manipulate field values.  I need to map a field/value to a particular
fieldType for indexing. 


Re: dynamic field assignments

Posted by John Thorhauer <jt...@yakabod.com>.
Chris,

Thanks so much for the suggestion.  I will look into this approach.  It
looks very promising!

John


On Mon, May 5, 2014 at 9:50 PM, Chris Hostetter <ho...@fucit.org> wrote:

>
> : My understanding is that DynamicField can do something like
> : FOO_BAR_TEXT_* but what I really need is *_TEXT_* as I might have
> : FOO_BAR_TEXT_1 but I also might have WIDGET_BAR_TEXT_2.  Both of those
> : field names need to map to a field type of 'fullText'.
>
> I'm pretty sure you can get what you are after with the new Managed Schema
> functionality...
>
> https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode
>
> https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig
>
> Assuming you have managed schema enabled in solrconfig.xml, and you define
> both of your fieldTypes using names like "text" and "select" then
> something like this should work in your processor chain...
>
>  <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
>    <str name="fieldRegex">.*_TEXT_.*</str>
>    <str name="defaultFieldType">text</str>
>  </processor>
>  <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
>    <str name="fieldRegex">.*_SELECT_.*</str>
>    <str name="defaultFieldType">select</str>
>  </processor>
>
>
> (Normally that processor is used once with multiple value->type mappings,
> but in your case you don't care about the run-time value, just the
> run-time field name regex, which should also be configurable according
> to the various FieldNameSelector rules.)
>
>
> https://lucene.apache.org/solr/4_8_0/solr-core/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.html
>
> https://lucene.apache.org/solr/4_8_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html
>
>
> -Hoss
> http://www.lucidworks.com/
>



-- 
John Thorhauer
Director/Remote Interfaces
Yakabod, Inc.
301-662-4554 x2105

Re: dynamic field assignments

Posted by Chris Hostetter <ho...@fucit.org>.
: My understanding is that DynamicField can do something like
: FOO_BAR_TEXT_* but what I really need is *_TEXT_* as I might have
: FOO_BAR_TEXT_1 but I also might have WIDGET_BAR_TEXT_2.  Both of those
: field names need to map to a field type of 'fullText'.

I'm pretty sure you can get what you are after with the new Managed Schema 
functionality...

https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode
https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig

Assuming you have managed schema enabled in solrconfig.xml, and you define 
both of your fieldTypes using names like "text" and "select" then 
something like this should work in your processor chain... 

 <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
   <str name="fieldRegex">.*_TEXT_.*</str>
   <str name="defaultFieldType">text</str>
 </processor>
 <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
   <str name="fieldRegex">.*_SELECT_.*</str>
   <str name="defaultFieldType">select</str>
 </processor>


(Normally that processor is used once with multiple value->type mappings,
but in your case you don't care about the run-time value, just the
run-time field name regex, which should also be configurable according
to the various FieldNameSelector rules.)

https://lucene.apache.org/solr/4_8_0/solr-core/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.html
https://lucene.apache.org/solr/4_8_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html
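
For context, the two processors above would need to sit inside an update
chain that the update handler actually uses. The following is an untested
sketch of how that might be wired up in solrconfig.xml, not a verified
configuration: the chain name "assign-types-by-field-name" is hypothetical,
and "text" and "select" are the fieldType names assumed above, which must
already exist in the schema.

```xml
<!-- solrconfig.xml fragment (sketch, untested). The chain name is
     hypothetical; "text" and "select" are assumed fieldType names that
     must already be defined in the managed schema. -->
<updateRequestProcessorChain name="assign-types-by-field-name">
  <!-- Unknown fields whose names contain _TEXT_ are added as type "text" -->
  <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
    <str name="fieldRegex">.*_TEXT_.*</str>
    <str name="defaultFieldType">text</str>
  </processor>
  <!-- Unknown fields whose names contain _SELECT_ are added as type "select" -->
  <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
    <str name="fieldRegex">.*_SELECT_.*</str>
    <str name="defaultFieldType">select</str>
  </processor>
  <!-- Usual tail of a chain: log the update, then execute it -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- Route /update requests through the chain by default -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">assign-types-by-field-name</str>
  </lst>
</requestHandler>
```

Fields that match neither regex are left alone by both processors, so they
would still need an explicit field or dynamicField definition to be indexed.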


-Hoss
http://www.lucidworks.com/

Re: dynamic field assignments

Posted by John Thorhauer <jt...@yakabod.com>.
Jack,

Thanks for your help.

> Reading your last paragraph, how is that any different than exactly what
> DynamicField actually does?

My understanding is that DynamicField can do something like
FOO_BAR_TEXT_* but what I really need is *_TEXT_* as I might have
FOO_BAR_TEXT_1 but I also might have WIDGET_BAR_TEXT_2.  Both of those
field names need to map to a field type of 'fullText'.

> You say you want to change fields at "run time" - what is "run time"? When
> exactly do your field names change?

What I mean is the point at which the document is fed to Solr, that is,
during the update process when the document is being indexed.  The
document that is being indexed may have fields that are unknown until the
time of indexing.  However, some of those fields will follow a predictable
naming pattern as mentioned above.

> You can always write an update request processor to do any manipulation of
> field values at index time.

I see that Solr has this capability.  However, I don't think I need to
manipulate field values.  I need to map a field/value to a particular
fieldType for indexing.

Re: dynamic field assignments

Posted by Jack Krupansky <ja...@basetechnology.com>.
Reading your last paragraph, how is that any different than exactly what 
DynamicField actually does?

You say you want to change fields at "run time" - what is "run time"? When 
exactly do your field names change? To be clear, field names do not change 
in Solr once the data is written to the index.

You can always write an update request processor to do any manipulation of 
field values at index time.

Have you had your data model reviewed by a professional Solr consultant to 
verify that it is indeed a reasonable approach? We can answer direct 
questions on this list, but it is not a substitute for professional review.

-- Jack Krupansky
