Posted to solr-user@lucene.apache.org by Christopher Gross <co...@gmail.com> on 2010/11/09 16:39:57 UTC

dynamically create unique key

I'm trying to use Solr to store information from a few different sources in
one large index.  I need to create a unique key for the Solr index that will
be unique per document.  If I have 3 systems, and they all have a document
with id=1, then I need to create a "uniqueId" field in my schema that
contains both the system name and that id, along the lines of: "sysa1",
"sysb1", and "sysc1".  That way, each document will have a unique id.

I added this to my schema.xml:

  <copyField source="source" dest="uniqueId"/>
  <copyField source="id" dest="uniqueId"/>


However, after trying to insert, I got this:
java.lang.Exception: ERROR: multiple values encountered for non multiValued
copy field uniqueId: sysa

So instead of appending to the uniqueId field, Solr tried to store a second
value in it.  Does anyone have an idea on how I can make this work?

Thanks!

-- Chris

Re: dynamically create unique key

Posted by Lance Norskog <go...@gmail.com>.
Here is an exhausting and exhaustive discursion about picking a unique key:

http://wiki.apache.org/solr/UniqueKey




On Tue, Nov 9, 2010 at 4:20 PM, Dennis Gearon <ge...@sbcglobal.net> wrote:

-- 
Lance Norskog
goksron@gmail.com

Re: dynamically create unique key

Posted by Dennis Gearon <ge...@sbcglobal.net>.
It seems to me that it would be a good idea to build into the Solr code a unique
ID per instance, per installation, or both, accessible either from Java or via a
query, somewhat like browsers do for their SSL connections.

Then it would be easy to implement what Chris describes.

Perhaps the ID could be written to the config file on first run, if it does not
already exist; any later updates or reinstalls would then reuse the same
installation/instance ID.
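A minimal sketch of the "generate on first run, reuse afterwards" idea, assuming the ID lives in a small file of its own rather than in Solr's config. Solr has no such feature as far as this thread shows; the class InstanceId and the file name instance-id.txt are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical helper (not part of Solr): sketches a per-installation ID
// that is generated once and reused on every later run.
final class InstanceId {
    private InstanceId() {}

    /** Returns the ID stored in idFile, creating and persisting a new one if the file is absent. */
    static String loadOrCreate(Path idFile) throws IOException {
        if (Files.exists(idFile)) {
            // Reuse the ID written on an earlier run.
            return Files.readString(idFile, StandardCharsets.UTF_8).trim();
        }
        // First run: generate a random ID and persist it for later runs.
        String id = UUID.randomUUID().toString();
        Path parent = idFile.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.writeString(idFile, id, StandardCharsets.UTF_8);
        return id;
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args.length > 0 ? args[0] : "instance-id.txt");
        System.out.println(loadOrCreate(file));
    }
}
```

Two processes starting at the same moment could both create an ID, so a real implementation would need a lock or an atomic create; and a reinstall reuses the ID only if the file survives it.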



From: Christopher Gross <co...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Tue, November 9, 2010 11:37:03 AM
Subject: Re: dynamically create unique key


Re: dynamically create unique key

Posted by Christopher Gross <co...@gmail.com>.
Thanks Hoss, I'll look into that!

-- Chris


On Tue, Nov 9, 2010 at 1:43 PM, Chris Hostetter <ho...@fucit.org> wrote:

Re: dynamically create unique key

Posted by Chris Hostetter <ho...@fucit.org>.
: one large index.  I need to create a unique key for the Solr index that will
: be unique per document.  If I have 3 systems, and they all have a document
: with id=1, then I need to create a "uniqueId" field in my schema that
: contains both the system name and that id, along the lines of: "sysa1",
: "sysb1", and "sysc1".  That way, each document will have a unique id.

take a look at the SignatureUpdateProcessorFactory...

http://wiki.apache.org/solr/Deduplication
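For context, the Deduplication page describes SignatureUpdateProcessorFactory as computing a signature from a configured list of fields and storing it in a signature field. The Java sketch below only illustrates that idea with a plain MD5 hash of the chosen field values; it is not Solr's code, and the class name SignatureSketch and the zero-byte separator are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;

// Illustration only: NOT Solr's SignatureUpdateProcessorFactory implementation.
// Derives a key by hashing the values of selected fields, in a fixed order.
final class SignatureSketch {
    private SignatureSketch() {}

    static String signature(Map<String, String> doc, List<String> fields) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String field : fields) {
                String value = doc.getOrDefault(field, "");
                md5.update(value.getBytes(StandardCharsets.UTF_8));
                // Separator byte so ("ab", "c") and ("a", "bc") hash differently.
                md5.update((byte) 0);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> keyFields = List.of("source", "id");
        System.out.println(signature(Map.of("source", "sysa", "id", "1"), keyFields));
        System.out.println(signature(Map.of("source", "sysb", "id", "1"), keyFields));
    }
}
```

A signature is an opaque hash rather than a readable key like "sysa1"; documents with identical values in the chosen fields get the same signature, which is what deduplication relies on.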

:   <copyField source="source" dest="uniqueId"/>
:   <copyField source="id" dest="uniqueId"/>
	...
: So instead of just appending to the uniqueId field, it tried to do a
: multiValued.  Does anyone have an idea on how I can make this work?

copyField doesn't "append"; it copies Field (value) instances from the 
"source" field to the "dest" field, so you get multiple values for 
the dest field. 
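A simplified model of the behavior described above, assuming a document is a map from field names to value lists. Each copyField rule adds the source values to the destination as separate values rather than concatenating them, so a single-valued uniqueId that receives both "sysa" and "1" is rejected. CopyFieldModel is hypothetical and ignores everything else Solr actually does (analysis, field types, and so on).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of copyField semantics; not Solr's actual implementation.
final class CopyFieldModel {
    private CopyFieldModel() {}

    /**
     * Applies copyField rules ({source, dest}) by adding each source value to
     * dest as a separate value, then rejects any single-valued field that
     * ends up holding more than one value.
     */
    static Map<String, List<String>> apply(Map<String, List<String>> doc,
                                           List<String[]> copyRules,
                                           Set<String> multiValuedFields) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        doc.forEach((k, v) -> out.put(k, new ArrayList<>(v)));
        for (String[] rule : copyRules) {
            String source = rule[0];
            String dest = rule[1];
            // Values are added one by one; nothing is concatenated.
            for (String value : doc.getOrDefault(source, List.of())) {
                out.computeIfAbsent(dest, k -> new ArrayList<>()).add(value);
            }
        }
        for (Map.Entry<String, List<String>> e : out.entrySet()) {
            if (e.getValue().size() > 1 && !multiValuedFields.contains(e.getKey())) {
                throw new IllegalArgumentException("field " + e.getKey()
                        + " is single-valued but received " + e.getValue().size()
                        + " values: " + e.getValue());
            }
        }
        return out;
    }
}
```

With the two rules from Chris's schema, apply(...) throws when uniqueId is single-valued and yields ["sysa", "1"] when it is declared multiValued; neither result is the single concatenated key he wanted.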


-Hoss

Re: dynamically create unique key

Posted by solr_noob <di...@gmail.com>.
Hello Christopher,
I ran into the same problem. When I disable dedupe in the update handler,
things work fine; it is only when I enable dedupe that I run into the
multiValued error. I'm also using SolrJ to add documents.

Were you able to resolve this?

If so, would you kindly post your solution?

thanks for your input/help



--
View this message in context: http://lucene.472066.n3.nabble.com/dynamically-create-unique-key-tp1869924p3951857.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: dynamically create unique key

Posted by Ken Stanley <do...@gmail.com>.
On Tue, Nov 9, 2010 at 10:53 AM, Christopher Gross <co...@gmail.com> wrote:

Chris,

I definitely understand your sentiment. The thing to keep in mind with
SOLR is that it has very limited logic mechanisms at index time; unless
you're willing to use the DataImportHandler (DIH) and its
ScriptTransformer, you essentially have none.

The copyField directive in schema.xml is mainly used to copy the
contents of one field into another so that the same value can be
indexed in multiple ways; for example, you can index a string so that
it is stored literally (e.g., "Hello World"), split by a whitespace
tokenizer (e.g., "Hello", "World"), or split by an edge n-gram
tokenizer (e.g., "H", "He", "Hel", ...). The benefit is that your data
stream doesn't have to send the same value once for every field: you
define the source field once, and SOLR copies it wherever it needs to go.
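The three representations in that example can be approximated in plain Java, assuming simple stand-ins for the tokenizers; FanOutSketch is hypothetical, and in Solr the real analysis chains are declared per field type in schema.xml. The point is only that one incoming value feeds several fields that are each processed differently. Here the edge n-grams are taken per whitespace token, so "W", "Wo", "Wor" appear as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Rough stand-ins for three differently analyzed destination fields;
// not Solr's analyzers.
final class FanOutSketch {
    private FanOutSketch() {}

    /** Kept as a single literal value, like an unanalyzed string field. */
    static List<String> literal(String text) {
        return List.of(text);
    }

    /** Split on runs of whitespace, like a whitespace tokenizer. */
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? List.of() : Arrays.asList(trimmed.split("\\s+"));
    }

    /** Leading prefixes of each token with lengths minGram..maxGram (edge n-grams). */
    static List<String> edgeNGrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (String token : whitespaceTokens(text)) {
            for (int n = minGram; n <= Math.min(maxGram, token.length()); n++) {
                grams.add(token.substring(0, n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        String value = "Hello World"; // sent once, used by three "fields"
        System.out.println(literal(value));           // [Hello World]
        System.out.println(whitespaceTokens(value));  // [Hello, World]
        System.out.println(edgeNGrams(value, 1, 3));  // [H, He, Hel, W, Wo, Wor]
    }
}
```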

Glad to have helped. :)

- Ken

Re: dynamically create unique key

Posted by Christopher Gross <co...@gmail.com>.
Thanks Ken.

I'm using a script with Java/SolrJ to copy documents from their original
locations into the Solr index.

I wasn't sure whether copyField would help me, but from your answers it seems
that I'll have to handle it on my own.  That's fine; it is definitely not
hard to pass a new field myself.  I was just thinking that there should be
an "easy" way to have Solr build the unique field, since it was receiving
all the values anyway.

I was just confused as to why I was getting a multiValued error, since I was
only trying to append to a field.  I wasn't sure if I was missing something.

Thanks again!

-- Chris


On Tue, Nov 9, 2010 at 10:47 AM, Ken Stanley <do...@gmail.com> wrote:


Re: dynamically create unique key

Posted by Ken Stanley <do...@gmail.com>.
On Tue, Nov 9, 2010 at 10:39 AM, Christopher Gross <co...@gmail.com> wrote:

Chris,

How you create your unique field depends on how you insert your
documents into SOLR. If you are POSTing the data via HTTP, then you are
responsible for building your unique id yourself (i.e., your
program/language would use string concatenation to add the unique id
to the output before it reaches the update handler in SOLR). If
you're using the DataImportHandler, then you can use the
TemplateTransformer
(http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer) to
build your unique id dynamically at document insertion time.
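A minimal client-side sketch of the string-concatenation approach, assuming documents are held in a map before being sent to Solr; CompositeKey, withKey, and the ":" separator are hypothetical choices, not part of SolrJ or Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side helper: builds the unique key before the
// document is sent to Solr.
final class CompositeKey {
    private CompositeKey() {}

    // Assumed separator; it must never occur in the leading key parts.
    static final String SEPARATOR = ":";

    static String build(String... parts) {
        if (parts.length == 0) {
            throw new IllegalArgumentException("at least one key part is required");
        }
        for (int i = 0; i < parts.length; i++) {
            String part = parts[i];
            if (part == null || part.isEmpty()) {
                throw new IllegalArgumentException("key part " + i + " is missing");
            }
            // Only the last part may contain the separator; otherwise
            // ("a:b", "c") and ("a", "b:c") would yield the same key.
            if (i < parts.length - 1 && part.contains(SEPARATOR)) {
                throw new IllegalArgumentException("key part " + i + " contains " + SEPARATOR);
            }
        }
        return String.join(SEPARATOR, parts);
    }

    /** Returns a copy of doc with the key built from partFields stored under keyField. */
    static Map<String, Object> withKey(Map<String, Object> doc, String keyField,
                                       String... partFields) {
        String[] parts = new String[partFields.length];
        for (int i = 0; i < partFields.length; i++) {
            Object value = doc.get(partFields[i]);
            parts[i] = value == null ? null : value.toString();
        }
        Map<String, Object> out = new LinkedHashMap<>(doc);
        out.put(keyField, build(parts));
        return out;
    }
}
```

The separator matters: with plain concatenation, source "sysa" with id "11" and source "sysa1" with id "1" both become "sysa11". With SolrJ, the resulting string would simply be added as the uniqueId field of the document before it is sent.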

For example, we here at bizjournals use SOLR and the DataImportHandler
to index our documents. Like you, we run the risk of two or more ids
clashing, and thus of one type of document overwriting another. So we
combine two or three different fields using the TemplateTransformer to
generate an id that is unique across all the documents we index.

As for the multiValued option, it is meant for array-like fields.
For example, if you have a blog entry with multiple tag keywords, you
would probably want a field in SOLR that can hold all the tag keywords
for each blog entry; this is where multiValued comes in handy.

I hope that this helps to clarify things for you.

- Ken Stanley