Posted to user@couchdb.apache.org by Simon Woodhead <si...@simwood.com> on 2010/08/13 16:55:59 UTC

Map/reduce and transactions

Hi folks

As we explore more ways to migrate to CouchDB, we're looking at alternatives to transactions. The book shows a bank where transfers are stored as a single document, with the balance being the result of a map/reduce function. That makes sense.

For us, the scale is tipped differently: we have hundreds of millions of tiny transactions affecting relatively few balances. In MySQL we (ironically) denormalise this and hold balances in their own table, but we can then insert transactions and update balances within a single transaction.

Moving this to CouchDB would mean getting rid of the balances table and just using a map/reduce function. I recognise that a given document will only be processed once, and that this is therefore more efficient than it may seem to a SQL jock like me, but I wanted to ask whether it truly scales at volume.

I guess I'm asking whether the index the view creates contains a reference to every document (and thus gets bigger with more documents) or just contains the output and the _id of the last document processed. I can see the first running into issues quickly, whilst the second would seem to scale indefinitely.

FWIW, it takes over a day to compute the sum over the table in MySQL because the table is constantly being appended to, hence why we denormalised it! It therefore feels really strange to normalise further when moving to a document store, so any advice is welcome!
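For readers unfamiliar with the book's banking example, here is a minimal Python sketch of the approach described above. It is not taken from the book. The design document `_design/ledger`, the view name `balance`, and the document fields `type`, `from`, `to` and `amount` are all hypothetical. The map function is JavaScript stored as a string, which is how CouchDB view functions are usually written, and `_sum` is CouchDB's built-in summing reduce. The `view_rows` and `balances` helpers only simulate locally what the view would compute.

```python
import json
from collections import defaultdict

# Hypothetical design document. The map function is JavaScript (CouchDB's
# default view language) stored as a string; "_sum" is the built-in reduce.
DESIGN_DOC = {
    "_id": "_design/ledger",
    "views": {
        "balance": {
            "map": (
                "function(doc) {"
                " if (doc.type === 'transfer') {"
                " emit(doc.from, -doc.amount);"
                " emit(doc.to, doc.amount);"
                " }"
                "}"
            ),
            "reduce": "_sum",
        }
    },
}


def view_rows(docs):
    """Local stand-in for the map step: one (key, value) row per emit()."""
    rows = []
    for doc in docs:
        if doc.get("type") == "transfer":
            rows.append((doc["from"], -doc["amount"]))  # debit the sender
            rows.append((doc["to"], doc["amount"]))     # credit the recipient
    # The real index keeps rows sorted by key; a plain sort is close enough here.
    return sorted(rows, key=lambda row: row[0])


def balances(rows):
    """Local stand-in for querying the view with group=true."""
    totals = defaultdict(int)
    for account, amount in rows:
        totals[account] += amount
    return dict(totals)


transfers = [
    {"type": "transfer", "from": "alice", "to": "bob", "amount": 30},
    {"type": "transfer", "from": "bob", "to": "carol", "amount": 10},
    {"type": "transfer", "from": "alice", "to": "carol", "amount": 5},
]

rows = view_rows(transfers)
print(json.dumps(DESIGN_DOC, indent=2))
print(len(rows), balances(rows))  # 6 {'alice': -35, 'bob': 20, 'carol': 15}
```

In this model, each transfer contributes two rows to the view. The index therefore grows with the number of transactions rather than holding only the latest total.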

Thanks
Simon
--
***** Email confidentiality notice *****

This message is private and confidential. If you have received this message in error, please notify us and remove it from your system.


Simwood eSMS Limited is a limited company registered in England and Wales. Registered number: 03379831. Registered office: c/o HW Chartered Accountants, Keepers Lane, The Wergs, Wolverhampton, WV6 8UA. Trading address: Falcon Drive, Cardiff Bay, Cardiff, CF10 4RU.



Re: Map/reduce and transactions

Posted by Noah Slater <ns...@apache.org>.
On 13 Aug 2010, at 18:26, Tyler Gillies wrote:

> On 8/13/2010 10:05 AM, Simon Woodhead wrote:
>> ***** Email confidentiality notice *****
>> 
>> This message is private and confidential. If you have received this message in error, please notify us and remove it from your system.
> 
> this is a public mailing list...

Please send comments like this to the poster in private. Thank you.

Re: Map/reduce and transactions

Posted by Tyler Gillies <tj...@gmail.com>.
On 8/13/2010 10:05 AM, Simon Woodhead wrote:
> ***** Email confidentiality notice *****
> 
> This message is private and confidential. If you have received this message in error, please notify us and remove it from your system.

this is a public mailing list...

-- 
http://pdxbrain.com/key.txt


Re: Map/reduce and transactions

Posted by Simon Woodhead <si...@simwood.com>.
>
> You could keep the summing in Couch as long as you have a balance-forward
> transaction to start the new month with some data, as Chris suggested.
>  Also, keep in mind that Couch can do this sum more efficiently than SQL
> because the reduce function to sum the view will store intermediate values
> in the index. Incremental updates are relatively cheap and you never have
> to compute the entire sum except the first time you insert your data.
>

Yeah that makes perfect sense, thanks.

Right now it is so laborious in MySQL because the index is invalidated by
constant reinserts, so there is massive value in computing the balance only
incrementally. The same is true of other views on this data.

Thanks for your help all,
Simon



Re: Map/reduce and transactions

Posted by Randall Leeds <ra...@gmail.com>.
You could keep the summing in Couch as long as you have a balance-forward
transaction to start the new month with some data, as Chris suggested.

Also, keep in mind that Couch can do this sum more efficiently than SQL
because the reduce function to sum the view will store intermediate values
in the index. Incremental updates are relatively cheap and you never have to
compute the entire sum except the first time you insert your data.
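Randall's point about intermediate values can be illustrated with a toy model. The Python sketch below is a simplification, not CouchDB's code. CouchDB keeps reductions in the inner nodes of its B+tree; here a hypothetical `ReductionTree` class with a fixed fanout plays that role. Appending one row recomputes one cached sum per level, so the cost of an update grows with the depth of the tree, not with the number of rows.

```python
class ReductionTree:
    """Toy model of an index whose inner nodes cache partial sums.

    A deliberate simplification, not CouchDB's code: CouchDB keeps
    reductions in the inner nodes of its append-only B+tree, while this
    class just keeps one list per level. Appending a row recomputes only
    the cached sums on the path from the new leaf up to the root.
    """

    def __init__(self, fanout=4):
        self.fanout = fanout
        self.levels = [[]]   # levels[0] holds the leaf values (one per row)
        self.recomputed = 0  # cached sums refreshed by the most recent append

    def append(self, value):
        self.levels[0].append(value)
        self.recomputed = 0
        index, depth = len(self.levels[0]) - 1, 0
        # Climb while the current level still has more than one node.
        while len(self.levels[depth]) > 1:
            parent = index // self.fanout
            start = parent * self.fanout
            partial = sum(self.levels[depth][start:start + self.fanout])
            if depth + 1 == len(self.levels):
                self.levels.append([])  # the tree just grew a new root level
            upper = self.levels[depth + 1]
            if parent == len(upper):
                upper.append(partial)
            else:
                upper[parent] = partial
            self.recomputed += 1
            index, depth = parent, depth + 1

    def total(self):
        """The root's cached sum, i.e. the reduce over every row."""
        return self.levels[-1][0] if self.levels[0] else 0


tree = ReductionTree(fanout=4)
for amount in range(1, 1001):
    tree.append(amount)
print(tree.total(), tree.recomputed)  # 500500 5
```

With a fanout of 4, the thousandth append refreshes five cached sums instead of re-adding a thousand rows. The leaves still hold one entry per row, so the index itself keeps growing. Only the cost of keeping the total up to date stays logarithmic.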

On Aug 13, 2010 10:06 AM, "Simon Woodhead" <si...@simwood.com> wrote:

> Hi,
>
> Thanks.
>
>> The index in the banking example in the book will indeed grow as
>> transactions happen. If you have...
>
> We're already rolling up by month, but would I be right in understanding your
> suggestion to be that the rollups should be in a separate database? We were
> thinking of another doc type in the same database to contain an opening and
> closing balance. This enables us to roll up but not throw away. I guess the
> summing would need to be partly in app logic if they were in two separate
> databases?
>
> cheers,
>
> Simon


Re: Map/reduce and transactions

Posted by Simon Woodhead <si...@simwood.com>.
Hi,

Thanks.

> The index in the banking example in the book will indeed grow as
> transactions happen. If you have a very high rate of transactions, you might
> want to do like banks do, and "close" each month, by moving new transactions
> to a new database file at the end of the month. Then you can do a
> transaction for each account to carry forward the closing balance at the end
> of the last month, to the current month.
>
> After X months, you can throw away the old transaction databases, probably
> storing the monthly rollups somewhere for posterity.
>

We're already rolling up by month, but would I be right in understanding your
suggestion to be that the rollups should be in a separate database? We were
thinking of another doc type in the same database to contain an opening and
closing balance. This enables us to roll up but not throw away. I guess the
summing would need to be partly in app logic if they were in two separate
databases?
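If the rollups do live in a separate database, the app-logic step Simon mentions might look roughly like the Python sketch below. Everything in it is hypothetical: the `MonthlyRollup` shape, the `account` and `amount` fields, the "YYYY-MM" month format, the account IDs, and the helper names. In practice, `rollups` would be read from the rollup database (or rollup doc type) and `live_transactions` from a view over the current month.

```python
from dataclasses import dataclass


@dataclass
class MonthlyRollup:
    """Hypothetical rollup document: one per account per closed month."""
    account: str
    month: str    # "YYYY-MM", so string order matches date order
    opening: int  # amounts in the smallest currency unit (e.g. pence)
    closing: int


def chain_is_consistent(rollups):
    """For one account: each month's opening must equal the prior closing."""
    ordered = sorted(rollups, key=lambda r: r.month)
    return all(prev.closing == cur.opening for prev, cur in zip(ordered, ordered[1:]))


def current_balance(account, rollups, live_transactions):
    """App-side combination of the last closed month and the open month."""
    mine = [r for r in rollups if r.account == account]
    carried = max(mine, key=lambda r: r.month).closing if mine else 0
    live = sum(t["amount"] for t in live_transactions if t["account"] == account)
    return carried + live


rollups = [
    MonthlyRollup("acct-1", "2010-06", opening=0, closing=100),
    MonthlyRollup("acct-1", "2010-07", opening=100, closing=250),
]
august = [
    {"account": "acct-1", "amount": 20},
    {"account": "acct-1", "amount": -5},
    {"account": "acct-2", "amount": 7},
]
print(chain_is_consistent(rollups), current_balance("acct-1", rollups, august))  # True 265
```

Keeping both opening and closing balances makes the chain checkable. A gap or mismatch between consecutive months shows up without re-summing the underlying transactions.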

cheers,
Simon



Re: Map/reduce and transactions

Posted by J Chris Anderson <jc...@apache.org>.
On Aug 13, 2010, at 7:55 AM, Simon Woodhead wrote:

> Hi folks
> 
> As we explore more ways to migrate to CouchDB, we're looking at alternatives to transactions. The book shows a bank where transfers are stored as a single document, with the balance being the result of a map/reduce function. That makes sense.
> 
> For us, the scale is tipped differently: we have hundreds of millions of tiny transactions affecting relatively few balances. In MySQL we (ironically) denormalise this and hold balances in their own table, but we can then insert transactions and update balances within a single transaction.
> 
> Moving this to CouchDB would mean getting rid of the balances table and just using a map/reduce function. I recognise that a given document will only be processed once, and that this is therefore more efficient than it may seem to a SQL jock like me, but I wanted to ask whether it truly scales at volume.
> 
> I guess I'm asking whether the index the view creates contains a reference to every document (and thus gets bigger with more documents) or just contains the output and the _id of the last document processed. I can see the first running into issues quickly, whilst the second would seem to scale indefinitely.
> 
> FWIW, it takes over a day to compute the sum over the table in MySQL because the table is constantly being appended to, hence why we denormalised it! It therefore feels really strange to normalise further when moving to a document store, so any advice is welcome!

The index in the banking example in the book will indeed grow as transactions happen. If you have a very high rate of transactions, you might want to do as banks do and "close" each month by moving new transactions to a new database file at the end of the month. Then you can record a transaction for each account that carries forward last month's closing balance into the current month.

After X months, you can throw away the old transaction databases, probably storing the monthly rollups somewhere for posterity.

If your rate is high enough, substitute days or hours for months.
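Here is a minimal Python sketch of the month-end close. It rests on several assumptions. Transactions are plain dicts with hypothetical `account` and `amount` fields. Amounts are integers in the smallest currency unit. The doc type `balance_forward` and the period label are made up. Writing the seed documents into the new period's database is left out, since that depends on the client library.

```python
from collections import defaultdict


def close_period(transactions, next_period):
    """Sketch of closing a period as described above.

    `transactions` are the old period's documents, including that period's
    own balance-forward entries, so their sum per account is the closing
    balance. Returns hypothetical seed documents for the next period's
    database: one balance-forward transaction per account.
    """
    closing = defaultdict(int)
    for txn in transactions:
        closing[txn["account"]] += txn["amount"]
    return [
        {"type": "balance_forward", "account": account,
         "period": next_period, "amount": balance}
        for account, balance in sorted(closing.items())
    ]


july = [
    {"type": "balance_forward", "account": "acct-1", "amount": 100},
    {"type": "txn", "account": "acct-1", "amount": 50},
    {"type": "txn", "account": "acct-2", "amount": 20},
    {"type": "txn", "account": "acct-1", "amount": -30},
]
for doc in close_period(july, "2010-08"):
    print(doc)
```

Because the new database starts from one balance-forward per account, a summing view over it only has to cover the current period's rows. Once its rollups are stored elsewhere, the old database can be archived or deleted.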
