
Posted to solr-user@lucene.apache.org by Derek Poh <dp...@globalsources.com> on 2017/05/22 06:25:30 UTC

different length/size of unique 'id' field value in a collection.

Due to the source data structure, I need to concatenate the values of 2
fields ('supplier_id' and 'product_id') to form the unique 'id' of each
document.
However, there are cases where some documents have only the 'supplier_id' field.
This will result in some documents with a longer 'id' value (those with
both 'supplier_id' and 'product_id') and some with a shorter 'id' value
(those with only 'supplier_id').

Please refer to the simplified representation of the records below.
The 3rd record has only a supplier id.
ts1 sup1 pdt1
ts1 sup1 pdt2
ts1 sup2
ts1 sup3 pdt3
ts1 sup4 pdt5
ts1 sup4 pdt6
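
As a rough illustration of the records above, the following Python sketch builds the 'id' by plain concatenation, as described in the post. The function name build_id and the tuple layout are assumptions made for this example. The first column ('ts1') is not explained in the post and is ignored here.

```python
from typing import Optional


def build_id(supplier_id: str, product_id: Optional[str] = None) -> str:
    """Concatenate supplier_id and product_id; product_id may be absent."""
    return supplier_id + (product_id or "")


# Sample records from the post: (first column, supplier_id, product_id or None).
records = [
    ("ts1", "sup1", "pdt1"),
    ("ts1", "sup1", "pdt2"),
    ("ts1", "sup2", None),
    ("ts1", "sup3", "pdt3"),
    ("ts1", "sup4", "pdt5"),
    ("ts1", "sup4", "pdt6"),
]

ids = [build_id(sup, pdt) for _, sup, pdt in records]

# Print each id with its length; the supplier-only record yields a shorter id.
for doc_id in ids:
    print(doc_id, len(doc_id))
```

For this sample the ids differ in length but are all distinct, which is the only property the unique key needs.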

I understand the unique 'id' is used during indexing to check whether a
document already exists: the document is created if it does not exist and
updated if it does.
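
A toy model of that behaviour, assuming nothing about Solr internals: a Python dict keyed by 'id' stands in for the index, so adding a document whose id already exists replaces the earlier one. The names index and add_document are made up for this sketch.

```python
# A dict keyed by 'id' stands in for the index (an assumption for illustration).
index = {}


def add_document(doc):
    """Store doc under its 'id' and report whether it was created or updated."""
    action = "updated" if doc["id"] in index else "created"
    index[doc["id"]] = doc
    return action


first = add_document({"id": "sup2", "name": "old"})
second = add_document({"id": "sup2", "name": "new"})      # same id: replaces the first
third = add_document({"id": "sup1pdt1", "name": "other"})  # longer id: separate entry
```

The length of the key plays no role in the lookup; only equality of the whole value matters.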

Are there any implications if the unique 'id' field value is of
different size/length among documents of a collection?
Is it advisable to have such a design?

Derek

----------------------
CONFIDENTIALITY NOTICE

This e-mail (including any attachments) may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail (including any attachments) from your computer, and you must not use, disclose to anyone else or copy this e-mail (including any attachments), whether in whole or in part.

This e-mail and any reply to it may be monitored for security, legal, regulatory compliance and/or other appropriate reasons.

Re: different length/size of unique 'id' field value in a collection.

Posted by Rick Leir <rl...@leirtech.com>.

Derek,
If your algorithm is guaranteed to always provide unique IDs, then fine. I say "incorrectly" because, after a few years in software development, I have seen bugs in the most careful code. A bug causing ID collisions could be hard to track down. Solr can generate unique IDs for you, and you can index your product IDs in normal fields, so that is my preference. Just a preference.
Cheers -- Rick
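
To make the collision risk concrete, here is a small Python sketch, assuming purely numeric IDs of varying length (the specific values are invented). Without a separator, different supplier/product pairs can produce the same string; a separator that cannot occur in either value avoids that. The helper names and the '_' separator are assumptions for illustration, not anything provided by Solr.

```python
def concat_plain(supplier_id, product_id=None):
    """Join the two values with no separator; product_id may be missing."""
    return str(supplier_id) + (str(product_id) if product_id is not None else "")


def concat_delimited(supplier_id, product_id=None, sep="_"):
    """Join the values with a separator that never appears in a numeric id."""
    parts = [str(supplier_id)]
    if product_id is not None:
        parts.append(str(product_id))
    return sep.join(parts)


# Three different records (hypothetical numbers); the last has no product id.
pairs = [(12, 3), (1, 23), (123, None)]

plain = [concat_plain(s, p) for s, p in pairs]
delimited = [concat_delimited(s, p) for s, p in pairs]

print(plain)      # all three records collapse to the same id
print(delimited)  # three distinct ids
```

With a colliding id, the later document would silently replace the earlier one at index time, which is why such a bug could be hard to track down.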

On May 22, 2017 10:07:36 PM EDT, Derek Poh <dp...@globalsources.com> wrote:
>Hi Rick
>
>My apologies, I did not make myself clear on the values of the fields.
>They are numbers.
>I used 'ts1', 'sup1' and 'pdt1' for simplicity and ease of
>understanding instead of the actual numbers.
>
>You mentioned this design has the potential for (in error cases)
>concatenating IDs incorrectly. Could you explain more on this?
>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com

Re: different length/size of unique 'id' field value in a collection.

Posted by Derek Poh <dp...@globalsources.com>.

Hi Rick

My apologies, I did not make myself clear on the values of the fields.
They are numbers.
I used 'ts1', 'sup1' and 'pdt1' for simplicity and ease of
understanding instead of the actual numbers.

You mentioned this design has the potential for (in error cases)
concatenating IDs incorrectly. Could you explain more on this?


Re: different length/size of unique 'id' field value in a collection.

Posted by Rick Leir <rl...@leirtech.com>.

On 2017-05-22 02:25 AM, Derek Poh wrote:
> Hi
>
> Due to the source data structure, I need to concatenate the values of 
> 2 fields ('supplier_id' and 'product_id') to form the unique 'id' of 
> each document.
> However, there are cases where some documents have only the
> 'supplier_id' field.
> This will result in some documents with a longer/larger 'id' field 
> (have both 'supplier_id' and 'product_id') and some with a 
> shorter/smaller 'id' field value (has only 'supplier_id').
>
> Please refer to the simplified representation of the records below.
> The 3rd record has only a supplier id.
> ts1 sup1 pdt1
> ts1 sup1 pdt2
> ts1 sup2
> ts1 sup3 pdt3
> ts1 sup4 pdt5
> ts1 sup4 pdt6
>
> I understand the unique 'id' is used during indexing to check whether a
> document already exists: the document is created if it does not exist
> and updated if it does.
>
> Are there any implications if the unique 'id' field value is of 
> different size/length among documents of a collection?
No
> Is it advisable to have such a design?
Derek
You need unique IDs. This design has the potential, in error cases, for
concatenating IDs incorrectly. It might be better to have IDs which
are just a number. That said, my current project has IDs which are not
just a number, YMMV.
cheers -- Rick
>
> Derek