You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@couchdb.apache.org by Bernhard Gschwantner <be...@unserwein.at> on 2012/07/12 13:07:12 UTC

Re: Data modelling

Hi,

That's more a question for the couchdb users list - I posted it also there.
(not sure if that's the right way to do it, though)

In the case of bank accounts and transactions, I would go for the second
approach. CouchDB can happily handle millions of documents, and incremental
map/reduce for views makes sure that unchanged transaction documents need
not be calculated again.

Account transactions are what the name says: transactions, and as such,
they should be atomic. So if a transaction involves two accounts, then I
would put both account numbers into the transaction document:

{"_id": "transaction-key-1", ..., "bank-account-from":
"bank-account-1", "bank-account-to": "bank-account-2"}

With that construction, you can be sure that it will never happen that you
deduct an amount from one account and never add it to the other. If you
make this two transaction documents, It could be that you try to create
both documents, but only one succeeds. Or if you replicate your database,
the first one is replicated, and then the network connection goes down.

Try to design your documents for eventual consistency,  so that you can
avoid situations that could bring your database into an inconsistent state
if some documents are missing.


If you want to archive old transactions (they should never change anyway),
you could do some processing and create account balances directly in the
bank account documents (but keep in mind that this _could_ make your data
inconsistent). One solution could be
{"_id": "bank-account-1", ..., "cached-balance": 1234.56,
"cached-balance-until": "2012-02-01T00:00:00.000Z"}

Then you could set an "archived" flag on all transactions older than this
date (if you have two accounts as proposed, you have to set two archive
flags). You could then do a partial replication to a new database where you
filter out any transactions that have the archived flag on both
transactions, Tip: NEVER delete those transactions, not even if they are
archived in another database. This WILL eventually lead to disaster ;-)

Hope I could help!

Cheers,
Bernhard

2012/7/12 Samuel Williams <sp...@gmail.com>

> Hi,
>
> I need to model some relationships.
>
> (1) With a one-to-many relationship, should I keep the keys of the many
> objects in the one, or for each many object have the key of the one, or
> both?
>
> e.g.
>
> {"_id": "bank-account-1", ..., transactions: ["transaction-key-1",
> "transaction-key-2", "transaction-key-3"]}
> {"_id": "transaction-key-1", ...}
> {"_id": "transaction-key-2", ...}
> {"_id": "transaction-key-3", ...}
>
> OR
>
> {"_id": "bank-account-1", ...}
> {"_id": "transaction-key-1", ..., "bank-account": "bank-account-1"}
> {"_id": "transaction-key-2", ..., "bank-account": "bank-account-1"}
> {"_id": "transaction-key-3", ..., "bank-account": "bank-account-1"}
>
> Both are a possibility. However, with the first approach, the document may
> get very large. This leads into my next question:
>
> (2) W.r.t. denormalizing data - sometimes it naturally makes sense, e.g.
> invoice with multiple line items stored in an array vs invoice with
> multiple line item documents.
>
> With an invoice, once you make the invoice it doesn't usually change that
> much and you also want the data together when you access it, e.g. to
> display, however with a bank account the transactions are changing
> frequently (e.g. new transactions every day).
>
> I was wondering if I should consider batching together transactions, e.g. a
> document for each month's worth of transactions, or perhaps one per year,
> etc. That way when modifying transactions, you don't need to specify the
> entire huge document with 1000s of individual items.
>
> I would appreciate any feedback and suggestions/advice.
>
> Thanks
> Samuel
>



-- 

Bernhard Gschwantner
Unser Wein G&U OG
Kirchengasse 13/7, 1070 Wien

mobil: +43 (6991) 971 32 96
tel: +43 (1) 971 32 95
e-mail: bernhard@unserwein.at
twitter: @bernharduw <http://twitter.com/bernharduw>
web: www.unserwein.at

Re: Data modelling

Posted by Robert Newson <rn...@apache.org>.
It's not a transaction in any useful sense.

B.

On 12 Jul 2012, at 14:47, Matthieu Rakotojaona wrote:

> Just a precision :
> 
> On Thu, Jul 12, 2012 at 1:07 PM, Bernhard Gschwantner
> <be...@unserwein.at> wrote:
>> With that construction, you can be sure that it will never happen that you
>> deduct an amount from one account and never add it to the other. If you
>> make this two transaction documents, It could be that you try to create
>> both documents, but only one succeeds. Or if you replicate your database,
>> the first one is replicated, and then the network connection goes down.
> 
> You can use the _bulk_docs endpoint with the all_or_nothing option, to
> use some kind of transaction :
> http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API#Transactional_Semantics_with_Bulk_Updates
> 
> I know that replication uses the bulk API for putting the docs on the
> target, but I don't know if it includes this option.
> 
> -- 
> Matthieu RAKOTOJAONA


Re: Data modelling

Posted by Dave Cottlehuber <da...@muse.net.nz>.
On 12 July 2012 15:47, Matthieu Rakotojaona
<ma...@gmail.com> wrote:
> Just a precision :
>
> On Thu, Jul 12, 2012 at 1:07 PM, Bernhard Gschwantner
> <be...@unserwein.at> wrote:
>> With that construction, you can be sure that it will never happen that you
>> deduct an amount from one account and never add it to the other. If you
>> make this two transaction documents, It could be that you try to create
>> both documents, but only one succeeds. Or if you replicate your database,
>> the first one is replicated, and then the network connection goes down.
>
> You can use the _bulk_docs endpoint with the all_or_nothing option, to
> use some kind of transaction :
> http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API#Transactional_Semantics_with_Bulk_Updates
>
> I know that replication uses the bulk API for putting the docs on the
> target, but I don't know if it includes this option.
>
> --
> Matthieu RAKOTOJAONA

Matthieu,

Yes this is possible on a single instance but you can't guarantee
anything greater than a single document transaction as soon as you
subsequently use _changes, replication or views on top of that DB.

A+
Dave

Re: Data modelling

Posted by Matthieu Rakotojaona <ma...@gmail.com>.
Just a precision :

On Thu, Jul 12, 2012 at 1:07 PM, Bernhard Gschwantner
<be...@unserwein.at> wrote:
> With that construction, you can be sure that it will never happen that you
> deduct an amount from one account and never add it to the other. If you
> make this two transaction documents, It could be that you try to create
> both documents, but only one succeeds. Or if you replicate your database,
> the first one is replicated, and then the network connection goes down.

You can use the _bulk_docs endpoint with the all_or_nothing option, to
use some kind of transaction :
http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API#Transactional_Semantics_with_Bulk_Updates

I know that replication uses the bulk API for putting the docs on the
target, but I don't know if it includes this option.

-- 
Matthieu RAKOTOJAONA

Re: Data modelling

Posted by Samuel Williams <sp...@gmail.com>.
Thanks, what you've said makes a lot of sense.

So, w.r.t. invoices, I get the feeling that it would be best to store all
items in a single document as that avoids inconsistency related issues..?
It also feels right since you'd be highly unlikely to have more than, say,
50 items on an invoice. Finally, when you display an invoice you almost
always want to display all items, so requesting individual documents seems
less efficient.

But, with bank accounts, transactions are sort of documents in their own
right.. meaning that it is ideally best to keep them separate, and you
might not want to view them all, or you might keep a running total. I was
thinking to use statement objects which exist and keep track of credits,
debits and balance for a given period - along with individual transaction
documents.

I guess I'm just trying to figure out where and when to use a traditional
normalised approach ala typical SQL tables vs denormalized approach ala
typical document storage.

Kind regards,
Samuel

On 12 July 2012 23:07, Bernhard Gschwantner <be...@unserwein.at> wrote:

> Hi,
>
> That's more a question for the couchdb users list - I posted it also there.
> (not sure if that's the right way to do it, though)
>
> In the case of bank accounts and transactions, I would go for the second
> approach. CouchDB can happily handle millions of documents, and incremental
> map/reduce for views makes sure that unchanged transaction documents need
> not be calculated again.
>
> Account transactions are what the name says: transactions, and as such,
> they should be atomic. So if a transaction involves two accounts, then I
> would put both account numbers into the transaction document:
>
> {"_id": "transaction-key-1", ..., "bank-account-from":
> "bank-account-1", "bank-account-to": "bank-account-2"}
>
> With that construction, you can be sure that it will never happen that you
> deduct an amount from one account and never add it to the other. If you
> make this two transaction documents, It could be that you try to create
> both documents, but only one succeeds. Or if you replicate your database,
> the first one is replicated, and then the network connection goes down.
>
> Try to design your documents for eventual consistency,  so that you can
> avoid situations that could bring your database into an inconsistent state
> if some documents are missing.
>
>
> If you want to archive old transactions (they should never change anyway),
> you could do some processing and create account balances directly in the
> bank account documents (but keep in mind that this _could_ make your data
> inconsistent). One solution could be
> {"_id": "bank-account-1", ..., "cached-balance": 1234.56,
> "cached-balance-until": "2012-02-01T00:00:00.000Z"}
>
> Then you could set an "archived" flag on all transactions older than this
> date (if you have two accounts as proposed, you have to set two archive
> flags). You could then do a partial replication to a new database where you
> filter out any transactions that have the archived flag on both
> transactions, Tip: NEVER delete those transactions, not even if they are
> archived in another database. This WILL eventually lead to disaster ;-)
>
> Hope I could help!
>
> Cheers,
> Bernhard
>
> 2012/7/12 Samuel Williams <sp...@gmail.com>
>
> > Hi,
> >
> > I need to model some relationships.
> >
> > (1) With a one-to-many relationship, should I keep the keys of the many
> > objects in the one, or for each many object have the key of the one, or
> > both?
> >
> > e.g.
> >
> > {"_id": "bank-account-1", ..., transactions: ["transaction-key-1",
> > "transaction-key-2", "transaction-key-3"]}
> > {"_id": "transaction-key-1", ...}
> > {"_id": "transaction-key-2", ...}
> > {"_id": "transaction-key-3", ...}
> >
> > OR
> >
> > {"_id": "bank-account-1", ...}
> > {"_id": "transaction-key-1", ..., "bank-account": "bank-account-1"}
> > {"_id": "transaction-key-2", ..., "bank-account": "bank-account-1"}
> > {"_id": "transaction-key-3", ..., "bank-account": "bank-account-1"}
> >
> > Both are a possibility. However, with the first approach, the document
> may
> > get very large. This leads into my next question:
> >
> > (2) W.r.t. denormalizing data - sometimes it naturally makes sense, e.g.
> > invoice with multiple line items stored in an array vs invoice with
> > multiple line item documents.
> >
> > With an invoice, once you make the invoice it doesn't usually change that
> > much and you also want the data together when you access it, e.g. to
> > display, however with a bank account the transactions are changing
> > frequently (e.g. new transactions every day).
> >
> > I was wondering if I should consider batching together transactions,
> e.g. a
> > document for each month's worth of transactions, or perhaps one per year,
> > etc. That way when modifying transactions, you don't need to specify the
> > entire huge document with 1000s of individual items.
> >
> > I would appreciate any feedback and suggestions/advice.
> >
> > Thanks
> > Samuel
> >
>
>
>
> --
>
> Bernhard Gschwantner
> Unser Wein G&U OG
> Kirchengasse 13/7, 1070 Wien
>
> mobil: +43 (6991) 971 32 96
> tel: +43 (1) 971 32 95
> e-mail: bernhard@unserwein.at
> twitter: @bernharduw <http://twitter.com/bernharduw>
> web: www.unserwein.at
>