Posted to user@couchdb.apache.org by Alexander Harm <co...@aharm.de> on 2016/01/05 10:09:48 UTC
Re: Document ID naming: random UUIDs or structured?

Hello Ronan,

my two cents:

I tend to incorporate the type and, where there is one, the parent into my ID, so in your case that would look like:

case_1234
finding_1234_f2ac2351
finding_1234_aa928399
note_1234_22933cf5
measure_1234_928dca87

However, I tend to normalise the type and all "ids" to a fixed length, e.g.
case_1234
fndg_1234_f2ac2351
fndg_1234_aa928399
note_1234_22933cf5
msre_1234_928dca87
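
A minimal sketch of how such IDs could be built, in Python. The `make_id` helper and the `TYPE_CODES` table are hypothetical names for illustration only; the four-character codes are the ones from the list above. Padding the case number to a fixed width is left out, since the examples do not show it.

```python
# Hypothetical mapping from document type to a fixed-length (4-char) code,
# using the abbreviations from the examples above.
TYPE_CODES = {"case": "case", "finding": "fndg", "note": "note", "measure": "msre"}


def make_id(doc_type, case_id, content_id=None):
    """Build '<code>_<case>' for a case, or '<code>_<case>_<content>' for a child doc."""
    parts = [TYPE_CODES[doc_type], str(case_id)]
    if content_id is not None:
        parts.append(content_id)
    return "_".join(parts)


print(make_id("case", 1234))
print(make_id("finding", 1234, "f2ac2351"))
```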

That enables me to pull an overview of all cases with all docs
startkey case_
endkey case_\uffff
and then access all details by type
startkey fndg_1234_
endkey fndg_1234_\uffff
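
To illustrate why these ranges pick out the right docs, here is a small in-memory simulation in Python. It is only a sketch: `id_range` is a hypothetical helper, the `5678` IDs are made up for the example, and plain Python string comparison stands in for CouchDB's own collation, which differs in detail but behaves the same way for prefix ranges like these.

```python
def id_range(ids, startkey, endkey):
    """Return the IDs that sort between startkey and endkey (both inclusive)."""
    return [doc_id for doc_id in sorted(ids) if startkey <= doc_id <= endkey]


# Sample IDs; the 5678 entries are invented to show that other cases are excluded.
ids = [
    "case_1234", "case_5678",
    "fndg_1234_f2ac2351", "fndg_1234_aa928399", "fndg_5678_01234567",
    "note_1234_22933cf5", "msre_1234_928dca87",
]

# Overview of all cases: every ID starting with "case_".
all_cases = id_range(ids, "case_", "case_\uffff")

# Details by type for one case: every finding of case 1234.
findings_1234 = id_range(ids, "fndg_1234_", "fndg_1234_\uffff")

print(all_cases)
print(findings_1234)
```

The trailing \uffff works because it sorts after any character that would normally appear in an ID, so the end key sits just past the last ID sharing the prefix.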

That works pretty well for my use case (querying all cases, and details only when needed). By adding the type to the start I make sure the docs are stored in order (your 3.1 c). Whether or not to use a UUID depends on the use case. In the example of a people directory, each person has a unique incremental UUID:
person_<person-uuid>
The IDs of the telephone numbers could then be shortened to the person's UUID plus the type of number:
telphn_<person-uuid>_home
telphn_<person-uuid>_work
telphn_<person-uuid>_fax
telphn_<person-uuid>_mobile

If there is a chance of conflicts I would always go for a UUID.
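
A rough sketch of both variants in Python, under some assumptions: the helper names are hypothetical, and `uuid.uuid4()` yields a random UUID rather than the incremental kind mentioned above, so it only stands in for whatever generator is actually used.

```python
import uuid


def person_id(person_uuid):
    """ID of the person document."""
    return f"person_{person_uuid}"


def phone_id(person_uuid, kind):
    """Child ID with a fixed suffix; fine while each person has one number per kind."""
    return f"telphn_{person_uuid}_{kind}"


def phone_id_unique(person_uuid):
    """Child ID with a UUID suffix, for when two docs could otherwise collide."""
    return f"telphn_{person_uuid}_{uuid.uuid4().hex}"


pid = uuid.uuid4().hex  # stand-in for <person-uuid>
print(person_id(pid))
print(phone_id(pid, "home"))
print(phone_id_unique(pid))
```

With the fixed suffix, a second "work" number for the same person would map to the same ID as the first, which is the kind of conflict a UUID suffix avoids.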

Regards,

Alexander





> On 24. Dec. 2015, at 20:05, Ronan Jouchet <ro...@cadensimaging.com> wrote:
> 
> Hi.
> 
> I'm coming back to an already much debated subject, with a few questions I couldn't find answers for.
> 
> I started working on a new system backed by CouchDB, and am questioning our choice to use "meaningful"/structured IDs (as opposed to UUIDs). Our data revolves around documents called "cases", which can relate to various documents, like notes, findings, measures. So we build IDs looking like:
> - 1234_case
> - 1234_finding_f2ac2351
> - 1234_finding_aa928399
> - 1234_note_22933cf5
> - 1234_measure_928dca87
> 
> Colleagues say they initially went for UUIDs, then moved on to a meaningful scheme for guess-ability, which enabled easier replication, as well as a few views referencing IDs (thanks to knowledge of the naming structure), which expand to full documents with include_docs=true.
> 
> On my side, as a NoSQL freshman and without the project history, I can't help wanting to move back to UUIDs, because:
> 
> 1. As we're leaning heavily on the *naming* of our documents, I have the feeling we're hiding from ourselves that we're not properly structuring our data in a way that is view-friendly. Feels like it's going to come back and bite us later on.
> 
> 2. As we are adding logic, we're starting to see unwieldy IDs (hash1_thing1_hash2_thing2_hash3_thing3_hash4)
> 
> 3. Currently, the information contained in the ID (in the above example: caseId, type, hash) lives *only* there. So to "extract" this information we have repetitive-but-slightly-different "splitId" functions that extract and type these IDs (for example: "1234_finding_f2ac2351" -> {"caseId": 1234, "type": "finding", "contentId": "f2ac2351"}), which is painful.
> 
>   3.1. The obvious solution would be to repeat {caseId, type, hash} as document properties. Then I can use them without having to call splitId(doc._id). But then there's duplicated data, which will have to be updated jointly. Is it a problem, or is it just time for me to learn to stop worrying and not care about this kind of minor duplication in NoSQL land?
> 
> Then, looking at what the internet says (see references below),
> 
> a. Both [PDB] and [DC] say non-UUID IDs are convenient for bare-bones _all_docs querying (e.g. for "all of Bob Dylan's albums released between 1964 and 1965", just {startkey: 'album_dylan_1964_', endkey: 'album_dylan_1965_\uffff'}).
> True, but how often will I be able to use such simple queries? I feel like I'm going to need views anyway.
> 
> b. Both [PDB] and [DC] say that a structured ID naming means usable indexes "for free", taking no additional space compared to a solution with random UUIDs complemented with views.
>  - Also, both note that using UUIDs (thus, needing views) means failing to use the built-anyway index on _id. True.
>  - [DC] goes as far as saying that "getting rid of as many views (relying on _all_docs instead) as you can is a worthwhile goal". Is this a shared opinion?
> 
> c. [INOI] and [GUIDE] note that incremental IDs will yield better performance on bulk document inserts. Okay.
> 
> d. [SO] proposes to "use UUIDs unless you have a good reason not to", and recommends basing your choice on "Cost of changing ID vs. How likely the ID is to change" (if the ID is likely to change a lot, use a UUID to force yourself not to rely on it).
> 
> What do you think? What do you use in your own projects?
> 
> Thanks for your help, thanks for CouchDB, and happy end-of-year :)
> 
> References ----
> 
> [PDB] (section "Use and abuse your doc IDs") http://pouchdb.com/2014/05/01/secondary-indexes-have-landed-in-pouchdb.html
> 
> [DC] http://davidcaylor.com/2012/05/26/can-i-see-your-id-please-the-importance-of-couchdb-record-ids/
> 
> [GUIDE] http://guide.couchdb.org/draft/performance.html#bulk
> 
> [INOI] http://blog.inoi.fi/2010/11/impact-of-document-ids-on-performance.html
> 
> [SO] http://stackoverflow.com/questions/1963632/what-is-best-practice-when-creating-document-ids-in-couchdb/1964947#1964947
> 
> -- 
> Ronan