Posted to solr-user@lucene.apache.org by "S.L" <si...@gmail.com> on 2014/11/12 01:18:54 UTC

Different ids for the same document in different replicas.

Hi All,

I am seeing interesting behavior on the replicas. I have a single
shard with 6 replicas on SolrCloud 4.10.1, and only a small number
of documents (~375) replicated across the six replicas.

The interesting thing is that the same document has a different id in
each of those replicas.

This causes fq(id:xyz) type queries to fail, depending on
which replica the query goes to.

I have specified the id field in the following manner in schema.xml.
Is this the right way to specify an auto-generated id in SolrCloud?

        <field name="id" type="uuid" indexed="true" stored="true"
            required="true" multiValued="false" />


Thanks.

Re: Different ids for the same document in different replicas.

Posted by Erick Erickson <er...@gmail.com>.
bq:  can this be used as an unique value instead of generating the
hashcode for the urlField

Don't do this. The _version_ field is used internally for optimistic
locking etc. I'd be _very_ cautious about co-opting this for anything
else.

Best,
Erick

On Thu, Nov 13, 2014 at 8:14 AM, Meraj A. Khan <me...@gmail.com> wrote:
> Thanks. I also noticed that the mandatory _version_ field is
> uniquely generated for every document in the collection. Can this be
> used as a unique value instead of generating the hashcode for the
> urlField?
>
> I want to avoid creating a custom unique field if the _version_ field,
> which is mandated for schema.xml, already does that for me.

Re: Different ids for the same document in different replicas.

Posted by "Meraj A. Khan" <me...@gmail.com>.
Thanks. I also noticed that the mandatory _version_ field is
uniquely generated for every document in the collection. Can this be
used as a unique value instead of generating the hashcode for the
urlField?

I want to avoid creating a custom unique field if the _version_ field,
which is mandated for schema.xml, already does that for me.



On Thu, Nov 13, 2014 at 8:07 AM, Garth Grimm
<Ga...@averyranchconsulting.com> wrote:
> OK.  So it sounds like doctorURL is a good key, but you don’t like the special characters.  I’ve used MD5 hashes of URLs before as a way to convert unique URLs into unique alphanumeric strings in a repeatable way.  I think most programming languages contain libraries for doing that as you feed the data to Solr (Java certainly does).  Other hashing or encoding mechanisms could be used if you wanted to be able to programmatically convert from the doctorURL to the string you want to use and back again.
>
> Anyway, the point there being that you have a repeatable unique key that is derived directly from the data you’re storing.  Not a random ID value that will be different every time you feed the same thing in.
>
> BTW, you can certainly use a custom field type to do the hashing work, but I’d suggest you do that before feeding the data to SolrCloud.  If you do it outside of SolrCloud, then SolrCloud can use it for routing to the correct shard.  If you try to do it solely in a field type, the field type output won’t be available until the indexing is actually occurring, which is too late for routing purposes.  And that means you can’t ensure that subsequent re-feeds of the same thing will overwrite the old values since you can’t make sure they get routed to the same shard.

Re: Different ids for the same document in different replicas.

Posted by Garth Grimm <Ga...@averyranchconsulting.com>.
OK.  So it sounds like doctorURL is a good key, but you don’t like the special characters.  I’ve used MD5 hashes of URLs before as a way to convert unique URLs into unique alphanumeric strings in a repeatable way.  I think most programming languages contain libraries for doing that as you feed the data to Solr (Java certainly does).  Other hashing or encoding mechanisms could be used if you wanted to be able to programmatically convert from the doctorURL to the string you want to use and back again.

Anyway, the point there being that you have a repeatable unique key that is derived directly from the data you’re storing.  Not a random ID value that will be different every time you feed the same thing in.

BTW, you can certainly use a custom field type to do the hashing work, but I’d suggest you do that before feeding the data to SolrCloud.  If you do it outside of SolrCloud, then SolrCloud can use it for routing to the correct shard.  If you try to do it solely in a field type, the field type output won’t be available until the indexing is actually occurring, which is too late for routing purposes.  And that means you can’t ensure that subsequent re-feeds of the same thing will overwrite the old values since you can’t make sure they get routed to the same shard.
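
A minimal sketch of the hashing approach described above, assuming the documents are prepared in Python before being sent to SolrCloud. The helper name url_to_id and the example URL are hypothetical; only the doctorUrl and id field names come from this thread. MD5 is used purely as a repeatable fingerprint here, not for security.

```python
import hashlib


def url_to_id(url: str) -> str:
    """Return a repeatable 32-character hex key derived from a URL."""
    # The same URL always yields the same digest, so re-feeding a
    # document produces the same key instead of a fresh random one.
    return hashlib.md5(url.encode("utf-8")).hexdigest()


# Hypothetical document; the URL contains characters that would be
# awkward to use directly in a lookup.
doc = {"doctorUrl": "http://example.com/doctors?name=Jane Doe&id=42"}

# Compute the key on the client side, before the document is sent,
# so it is already present when routing decisions are made.
doc["id"] = url_to_id(doc["doctorUrl"])
print(doc["id"])
```

Note that an MD5 digest is one-way. If the key has to be converted back into the URL, a reversible encoding would be needed instead of a hash.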

> On Nov 12, 2014, at 7:50 PM, Meraj A. Khan <me...@gmail.com> wrote:
> 
> Sorry, it's actually doctorUrl. I don't want to use doctorUrl as a lookup
> mechanism because URLs can have special characters that can cause issues
> with Solr lookup.
> 
> I guess I should rephrase my question: how do I auto-generate the unique
> keys in the id field when using SolrCloud?


Re: Different ids for the same document in different replicas.

Posted by "Meraj A. Khan" <me...@gmail.com>.
Sorry, it's actually doctorUrl. I don't want to use doctorUrl as a lookup
mechanism because URLs can have special characters that can cause issues
with Solr lookup.

I guess I should rephrase my question: how do I auto-generate the unique
keys in the id field when using SolrCloud?
 On Nov 12, 2014 7:28 PM, "Garth Grimm" <Ga...@averyranchconsulting.com>
wrote:

> You mention you already have a unique Key identified for the data you’re
> storing in Solr:
>
> > <uniqueKey>doctorId</uniqueKey>
>
> If that’s the field you’re using to uniquely identify each thing you’re
> storing in the solr index, why do you want to have an id field that is
> populated with some random value?  You’ll be using the doctorId field as
> the key, and the id field will have no real meaning in your Data Model.
>
> If doctorId actually isn’t unique to each item you plan on storing in
> Solr, is there any other field that is?  If so, use that field as your
> unique key.
>
> Remember, uniqueKeys are usually used for routing documents to shards
> in SolrCloud, and are used to ensure that later updates of the same “thing”
> overwrite the old one, rather than generating multiple copies.  So the keys
> really should be something derived from the data you’re storing.  I’m not
> sure if I understand why you would want to have the key randomly generated.
>
> > On Nov 12, 2014, at 6:39 PM, S.L <si...@gmail.com> wrote:
> >
> > I just tried adding <uniqueKey>id</uniqueKey> while keeping id as
> > type="string", and only blank ids are being generated. It looks like the
> > id is auto-generated only if the id is set to type uuid, but in the case
> > of SolrCloud that id will be different on each replica.
> >
> > Is there a way to generate a unique id in SolrCloud without using the
> > uuid type, or without ending up with a different id per replica?
> >
> > The uuid type in question is defined as:
> >
> > <fieldType name="uuid" class="solr.UUIDField" indexed="true" />
> >
> >
> > On Wed, Nov 12, 2014 at 6:20 PM, S.L <si...@gmail.com> wrote:
> >
> >> Thanks.
> >>
> >> So the issue here is I already have a <uniqueKey>doctorId</uniqueKey>
> >> defined in my schema.xml.
> >>
> >> If, along with that, I also want the id field to be automatically
> >> generated for each document, do I have to declare it as a <uniqueKey> as
> >> well? I just tried the following settings without the uniqueKey for id,
> >> and it is only generating blank ids for me.
> >>
> >> *schema.xml*
> >>
> >>        <field name="id" type="string" indexed="true" stored="true"
> >>            required="true" multiValued="false" />
> >>
> >> *solrconfig.xml*
> >>
> >>      <updateRequestProcessorChain name="uuid">
> >>        <processor class="solr.UUIDUpdateProcessorFactory">
> >>            <str name="fieldName">id</str>
> >>        </processor>
> >>        <processor class="solr.RunUpdateProcessorFactory" />
> >>      </updateRequestProcessorChain>
> >>
> >>
> >> On Tue, Nov 11, 2014 at 7:47 PM, Garth Grimm <
> >> GarthGrimm@averyranchconsulting.com> wrote:
> >>
> >>> Looking a little deeper, I did find this about UUIDField:
> >>>
> >>> http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/schema/UUIDField.html
> >>>
> >>> "NOTE: Configuring a UUIDField instance with a default value of "NEW" is
> >>> not advisable for most users when using SolrCloud (and not possible if the
> >>> UUID value is configured as the unique key field) since the result will be
> >>> that each replica of each document will get a unique UUID value. Using
> >>> UUIDUpdateProcessorFactory
> >>> <http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html>
> >>> to generate UUID values when documents are added is recommended instead."
> >>>
> >>> That might describe the behavior you saw.  And the use of
> >>> UUIDUpdateProcessorFactory to auto generate IDs seems to be covered well
> >>> here:
> >>>
> >>> http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/
> >>>
> >>> Though I’ve not actually tried that process before.
> >>>
> >>> On Nov 11, 2014, at 7:39 PM, Garth Grimm <
> >>> GarthGrimm@averyranchconsulting.com<mailto:
> >>> GarthGrimm@averyranchconsulting.com>> wrote:
> >>>
> >>> “uuid” isn’t an out of the box field type that I’m familiar with.
> >>>
> >>> Generally, I’d stick with the out of the box advice of the schema.xml
> >>> file, which includes things like….
> >>>
> >>> <!-- Only remove the "id" field if you have a very good reason to.
> >>>      While not strictly required, it is highly recommended. A <uniqueKey>
> >>>      is present in almost all Solr installations. See the <uniqueKey>
> >>>      declaration below where <uniqueKey> is set to "id".
> >>> -->
> >>> <field name="id" type="string" indexed="true" stored="true"
> >>>        required="true" multiValued="false" />
> >>>
> >>> and…
> >>>
> >>> <!-- Field to use to determine and enforce document uniqueness.
> >>>      Unless this field is marked with required="false", it will be a
> >>>      required field
> >>> -->
> >>> <uniqueKey>id</uniqueKey>
> >>>
> >>> If you’re creating some key/value pair with uuid as the key as you feed
> >>> documents in, and you know that the uuid values you’re creating are unique,
> >>> just change the field name and unique key name from ‘id’ to ‘uuid’.  Or
> >>> change the key name you send in from ‘uuid’ to ‘id’.

Re: Different ids for the same document in different replicas.

Posted by Garth Grimm <Ga...@averyranchconsulting.com>.
You mention you already have a unique Key identified for the data you’re storing in Solr:

> <uniqueKey>doctorId</uniqueKey>

If that’s the field you’re using to uniquely identify each thing you’re storing in the Solr index, why do you want an id field that is populated with some random value?  You’ll be using the doctorId field as the key, and the id field will have no real meaning in your data model.

If doctorId actually isn’t unique to each item you plan on storing in Solr, is there any other field that is?  If so, use that field as your unique key.

Remember, these unique keys are usually used for routing documents to shards in SolrCloud, and they ensure that later updates of the same “thing” overwrite the old one rather than generating multiple copies.  So the keys really should be something derived from the data you’re storing.  I’m not sure I understand why you would want the key to be randomly generated.
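
As a minimal sketch of that suggestion, assuming doctorId really is unique per document and is sent as a plain string (the field type is an assumption; the thread only shows the <uniqueKey> line, not the doctorId field definition), the schema.xml entries could look like this:

```xml
<!-- Assumed definition: only the <uniqueKey> line appears in the thread. -->
<field name="doctorId" type="string" indexed="true" stored="true"
       required="true" multiValued="false" />

<!-- The key comes from the data itself, so every replica sees the same value. -->
<uniqueKey>doctorId</uniqueKey>
```

With a key taken from the data, re-sending the same doctor record should overwrite the earlier copy instead of adding a second one.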



Re: Different ids for the same document in different replicas.

Posted by "S.L" <si...@gmail.com>.
I just tried adding <uniqueKey>id</uniqueKey> while keeping the id field
as type="string", and only blank ids are being generated. It looks like
the id is auto-generated only if the id field is of type uuid, but in
SolrCloud that id will then be unique per replica.

Is there a way to generate a unique id in SolrCloud without using the
uuid type, or without ending up with a per-replica unique id?

The uuid type in question is defined as:

<fieldType name="uuid" class="solr.UUIDField" indexed="true" />



Re: Different ids for the same document in different replicas.

Posted by "S.L" <si...@gmail.com>.
Thanks.

So the issue here is that I already have <uniqueKey>doctorId</uniqueKey>
defined in my schema.xml.

If, along with that, I also want the id field to be automatically
generated for each document, do I have to declare it as a <uniqueKey> as
well? I just tried the following settings without a uniqueKey for id, and
it's only generating blank ids for me.

*schema.xml*

        <field name="id" type="string" indexed="true" stored="true"
            required="true" multiValued="false" />

*solrconfig.xml*

    <updateRequestProcessorChain name="uuid">
        <processor class="solr.UUIDUpdateProcessorFactory">
            <str name="fieldName">id</str>
        </processor>
        <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>



Re: Different ids for the same document in different replicas.

Posted by Garth Grimm <Ga...@averyranchconsulting.com>.
Looking a little deeper, I did find this about UUIDField:

http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/schema/UUIDField.html

“NOTE: Configuring a UUIDField instance with a default value of "NEW" is not advisable for most users when using SolrCloud (and not possible if the UUID value is configured as the unique key field) since the result will be that each replica of each document will get a unique UUID value. Using UUIDUpdateProcessorFactory<http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html> to generate UUID values when documents are added is recomended instead.”

That might describe the behavior you saw.  And the use of UUIDUpdateProcessorFactory to auto-generate IDs seems to be covered well here:

http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/

Though I’ve not actually tried that process myself.
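
A rough, untested sketch of what that setup might look like in solrconfig.xml follows. The chain name "uuid" and the choice to attach it to the /update handler are assumptions for illustration. A named chain only runs when it is selected, either per request via the update.chain parameter or through a handler default as shown here:

```xml
<!-- Sketch only: fills in "id" with a UUID when a document arrives without one.
     The chain name "uuid" is arbitrary. -->
<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- Assumed wiring: make /update use the chain above by default. -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uuid</str>
  </lst>
</requestHandler>
```

As I understand it, the value is then assigned once, before the document is forwarded to the replicas, rather than separately on each replica as with a field default of "NEW".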




Re: Different ids for the same document in different replicas.

Posted by Garth Grimm <Ga...@averyranchconsulting.com>.
“uuid” isn’t an out-of-the-box field type that I’m familiar with.

Generally, I’d stick with the out-of-the-box advice of the schema.xml file, which includes things like:

   <!-- Only remove the "id" field if you have a very good reason to. While not strictly
     required, it is highly recommended. A <uniqueKey> is present in almost all Solr
     installations. See the <uniqueKey> declaration below where <uniqueKey> is set to "id".
   -->
   <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

and:

   <!-- Field to use to determine and enforce document uniqueness.
     Unless this field is marked with required="false", it will be a required field
   -->
   <uniqueKey>id</uniqueKey>

If you’re creating some key/value pair with uuid as the key as you feed documents in, and you know that the uuid values you’re creating are unique, just change the field name and unique key name from ‘id’ to ‘uuid’.  Or change the key name you send in from ‘uuid’ to ‘id’.
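
For illustration, the first option might look like this in schema.xml, assuming the uuid values are generated by the indexing client and sent as plain strings (type="string" is an assumption, not something confirmed in this thread):

```xml
<!-- Sketch: the client-supplied "uuid" field becomes the unique key. -->
<field name="uuid" type="string" indexed="true" stored="true"
       required="true" multiValued="false" />
<uniqueKey>uuid</uniqueKey>
```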
