Posted to user@couchdb.apache.org by Bernd Lutz <ib...@googlemail.com> on 2009/12/30 20:01:08 UTC

Handling unique URL friendly identifiers

Hello everyone,

I'm currently exploring the possibility of porting an app from MySQL to CouchDB. The app uses friendly URLs, e.g. http://myapp.com/content/friendly-identifier. Every piece of content has a unique identifier derived from its title.

If only a few users were creating content, a simple check for the identifier's uniqueness would be enough. But what if there are multiple database nodes and, let's say, a thousand users creating identifiers?
Then we cannot get a consistent view result on a single node, nor can we be sure that another user isn't adding the same identifier on another node in parallel.

I hope you got the problem. How would you handle it?

My approach:
Set up a dedicated node and database for handling identifiers, with a view keyed on the identifier.
1. Add a new document with the identifier.
2. Check the uniqueness by querying the view.
3. If it is unique, done. Otherwise, repeatedly append a postfix "-n" until the result is unique.

Of course the identifier won't be used as document _id.
I think this is a common problem, not only for URL-friendly identifiers. Unique user names would be another common case.
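
To make the three steps concrete, here is a rough Python sketch. The names slugify, unique_slug and is_taken are made up for illustration; is_taken stands in for the view query, and a plain set plays the role of the identifier database. I start the postfix at 2, which is just one possible choice. Note that this is still check-then-write, so it has exactly the race between nodes described above.

```python
import re

def slugify(title):
    """Lowercase the title and join runs of letters/digits with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))

def unique_slug(title, is_taken):
    """Return the slug for title, appending -2, -3, ... while is_taken(slug) is true.

    is_taken is a hypothetical callback standing in for the view query.
    """
    base = slugify(title)
    candidate = base
    n = 2
    while is_taken(candidate):
        candidate = f"{base}-{n}"
        n += 1
    return candidate

# A set stands in for the identifiers already stored.
taken = {"hello-world", "hello-world-2"}
print(unique_slug("Hello, World!", taken.__contains__))  # hello-world-3
```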

Best regards and a happy new year,
Bernd

Re: Handling unique URL friendly identifiers

Posted by Chris Anderson <jc...@apache.org>.

On Thu, Dec 31, 2009 at 12:28 AM, Bernd Lutz <ib...@googlemail.com> wrote:
>>
>> I think it's actually good practice to use docids for friendly identifiers. Then collisions between URL slugs will appear as couchdb conflicts (eventually, in the case of distributed app, and immediately, in the case of a single node or partitioned cluster).
>>
>
> At first I thought this, too. But this would mean documents have to be deleted and then re added for updates. _ids used to realize a sort of relation have to be changed, too - like comments in a blog. So I thought it might be better to avoid further problems by not using url friendly identifiers as _ids.

I've been happy enough to live without changeable ids. I've even done
embarrassing stuff like misspelling a word in a blog post title, which
I then fixed in doc.title, but it is still wrong in doc._id (and thus
the URL). Details like this, I suppose, are a matter of how much the
business case can be bent around the technology, and vice versa.

Also, bear in mind that 99% of the time (in apps which don't do a
whole lot of disconnected operation) when there is an _id collision it
will manifest itself before the document is saved anywhere.
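
As a rough illustration of what that looks like on a single node: CouchDB answers an attempt to create a document under an existing _id with a 409 Conflict, so the application can simply try the next candidate id. The sketch below is Python with an in-memory stand-in instead of real HTTP calls; FakeDb, Conflict and save_with_slug_id are hypothetical names, and the "-n" suffix scheme is borrowed from the original post.

```python
class Conflict(Exception):
    """Stands in for an HTTP 409 response from the database."""

class FakeDb:
    """In-memory stand-in that rejects a create when the _id already exists."""
    def __init__(self):
        self.docs = {}

    def create(self, doc_id, doc):
        if doc_id in self.docs:
            raise Conflict(doc_id)
        self.docs[doc_id] = doc

def save_with_slug_id(db, slug, doc, max_tries=100):
    """Try slug, slug-2, slug-3, ... as the _id until a create succeeds."""
    for n in range(1, max_tries + 1):
        doc_id = slug if n == 1 else f"{slug}-{n}"
        try:
            db.create(doc_id, doc)
            return doc_id
        except Conflict:
            continue  # id already used, try the next suffix
    raise RuntimeError(f"no free id for {slug!r} after {max_tries} tries")

db = FakeDb()
print(save_with_slug_id(db, "my-first-post", {"title": "My first post"}))  # my-first-post
print(save_with_slug_id(db, "my-first-post", {"title": "My first post"}))  # my-first-post-2
```

There is no separate uniqueness check here; the write itself is the check, which is why the collision shows up before anything is saved.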

If you are building a truly p2p app where no single entity will be in
control of the "real" state of the data, then you might be better off
using uuids. (But then you'd better resign yourself to not having
uniqueness constraints of any sort...)

>
> But I see the point: Better error handling. Additionally it might be useful to add a prefix to distinguish between content types and being able to have the same identifiers for different content types.
>

Yes, a prefix is sensible in most cases.
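
Something along these lines, as a small Python sketch. The ":" separator and the function names are only assumptions for illustration; any separator that cannot occur in a slug would do.

```python
def make_doc_id(doc_type, slug):
    """Build an _id such as 'post:hello-world' from a content type and a slug."""
    return f"{doc_type}:{slug}"

def split_doc_id(doc_id):
    """Split an _id back into (content type, slug) at the first ':'."""
    doc_type, _, slug = doc_id.partition(":")
    return doc_type, slug

# The same slug can now exist once per content type.
print(make_doc_id("post", "hello-world"))   # post:hello-world
print(make_doc_id("user", "hello-world"))   # user:hello-world
print(split_doc_id("post:hello-world"))     # ('post', 'hello-world')
```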

Chris

-- 
Chris Anderson
http://jchrisa.net
http://couch.io

Re: Handling unique URL friendly identifiers

Posted by Bernd Lutz <ib...@googlemail.com>.

> 
> I think it's actually good practice to use docids for friendly identifiers. Then collisions between URL slugs will appear as couchdb conflicts (eventually, in the case of distributed app, and immediately, in the case of a single node or partitioned cluster).
> 

At first I thought so, too. But this would mean documents have to be deleted and then re-added for updates. _ids that are used to express a relation, like comments in a blog, would have to be changed as well. So I thought it might be better to avoid further problems by not using URL-friendly identifiers as _ids.

But I see the point: better error handling. Additionally, it might be useful to add a prefix to distinguish between content types, so that the same identifier can be used for different content types.

Re: Handling unique URL friendly identifiers

Posted by Chris Anderson <jc...@gmail.com>.



On Dec 30, 2009, at 11:01 AM, Bernd Lutz <ib...@googlemail.com> wrote:

> Hello everyone,
>
> I'm currently exploring the possibility to port an app from MySQL to
> CouchDB. The app uses friendly URLs, e.g.
> http://myapp.com/content/friendly-identifier. Every content has an
> unique identifier which derives from the content title.
>
> If there would be only a few users creating content a simple check  
> for the identifier's uniqueness would be enough. But what if there  
> are multiple database nodes and let's say thousand users creating  
> identifiers.
> So we cannot get a consistent view result on a single node nor being  
> sure that another user doesn't add a same identifier on another node  
> in parallel.
>
> I hope you got the problem. How would you handle it?
>
> My approach:
> Setting up a special node and database handling idenifiers with a  
> view having the identifier as key.
> 1. Adding a new document with the identifier.
> 2. Check the uniqueness by queriing the view.
> 3. If ok, all right. Otherwise: Recursively add a postfix "-n" until  
> the result is unique.
>
> Of course the identifier won't be used as document _id.
> I think this is a common problem not only for url friendly  
> identifiers. A common usage could be also the unique user name.
>

I think it's actually good practice to use docids for friendly  
identifiers. Then collisions between URL slugs will appear as couchdb  
conflicts (eventually, in the case of distributed app, and  
immediately, in the case of a single node or partitioned cluster).

> Best regards and a happy new year,
> Bernd