Posted to user@couchdb.apache.org by Simon Woodhead <si...@simwood.com> on 2010/07/31 19:45:01 UTC

Hello / Archiving

Hi folks,

First off: thanks! CouchDB is something I found out about by accident by
virtue of my g/f breaking her toe but that's another story, although I'm
really glad she did. I've been reading about it lots and have finally
deployed it in a real-world trial today.

We store gazillions of small records. We already have them in JSON as they
flow through our queuing system. Along the way we take key bits of that info
into an RDBMS for main use, but the end result is we need to keep the rest
somewhere for easy access. We tried S3 but it was too slow; we tried
SimpleDB but hit its limits in hours, so we've finally bitten the bullet and are
trying CouchDB. So far so good - we write with our unique id as the _id,
which makes retrieval superbly easy. We haven't got into views but don't need
to for this application - just lots of parallel writes and a very occasional
retrieval.
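
For illustration, that write-by-id / read-by-id pattern looks roughly like
this against the CouchDB HTTP API (a minimal Python sketch using the requests
library; the server URL, database name and record contents are made-up
placeholders):

    # Minimal sketch: store each record under our own unique id and fetch it
    # back directly by that id. All names and values here are illustrative.
    import requests

    COUCH = "http://localhost:5984"
    DB = "messages"                         # hypothetical database name

    record_id = "msg-0000000001"            # our existing unique id becomes _id
    record = {"source": "queue-a", "payload": {"status": "delivered"}}

    # Write: PUT /{db}/{id} creates the document under that _id.
    requests.put(f"{COUCH}/{DB}/{record_id}", json=record).raise_for_status()

    # Occasional retrieval: GET /{db}/{id} pulls it straight back.
    doc = requests.get(f"{COUCH}/{DB}/{record_id}").json()
    print(doc["_id"], doc["payload"]["status"])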

However, in a little over an hour of the trial we have a 2GB database and it
is growing quickly. This is no great surprise as there is an awful lot of
data getting pumped in here - in raw JSON it amounts to 10-15GB per day. So
my question to the list is: are there any approved methods of archiving?
Sharding seems unnecessary since one node more than handles our read/write
requirements, but it is going to need a tonne of storage. We'll also be
replicating between sites so the total requirement will be doubled.
Presently it is running as a VM with storage on the SAN so any usage is
expensive.

One idea I have had is to include the date in the database name, so that
databases older than a certain age could be detached and compressed somewhere.
Does that sound workable, and is there an approved method for detaching? It
looks like we could just move the file without any adverse consequences, but
I wanted to check! And re-attaching?

So far, CouchDB is looking like a dream come true and I'm very sure we're
going to move other applications to it as we find our way around. I have no
doubt we're going to be a relatively large deployment when we find our feet
with it, so thanks again to all involved.

cheers,
Simon





-- 
Simon Woodhead FCSI
Managing Director
<http://www.simwood.com>
Simwood eSMS Limited
Wholesale Telecommunications

Keep up with the latest news from Simwood:
<http://feeds.simwood.com/SimwoodNews>
<http://www.facebook.com/pages/Simwood-eSMS-Limited/146897445321268>
<http://twitter.com/simwoodesms>
w: http://www.simwood.com


Re: Hello / Archiving

Posted by Yue Chuan Lim <sh...@gmail.com>.
Ditto, but for some reason, compacting the database (and the views, and
cleaning up the views) does slim my database down quite a bit. Haven't looked
into the reason why that happens, though.

Would appreciate enlightenment on this point. :)
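
(For reference, the three operations mentioned here can be triggered over the
HTTP API roughly as below; this is only a sketch, and the database and
design-document names are illustrative.)

    # Sketch of the maintenance calls discussed above; names are illustrative.
    import requests

    COUCH = "http://localhost:5984"
    DB = "messages"
    JSON = {"Content-Type": "application/json"}

    # Compact the database file itself.
    requests.post(f"{COUCH}/{DB}/_compact", headers=JSON).raise_for_status()

    # Compact the view indexes belonging to one design document.
    requests.post(f"{COUCH}/{DB}/_compact/stats", headers=JSON).raise_for_status()

    # Clean up view index files no longer matching any design document.
    requests.post(f"{COUCH}/{DB}/_view_cleanup", headers=JSON).raise_for_status()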

Cheers


Re: Hello / Archiving

Posted by Mikeal Rogers <mi...@gmail.com>.
Compacting still helps a bit on the file size even if you don't update or
delete documents; it's just not as drastic a change.
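
(A sketch of how one might measure that effect, assuming the 1.x
database-info fields discussed in this thread; the database name is
illustrative, and compaction runs in the background, so the sketch polls
compact_running.)

    # Compare the on-disk size reported by GET /{db} before and after compaction.
    import time
    import requests

    COUCH = "http://localhost:5984"
    DB = "messages"                          # illustrative database name

    def db_info():
        return requests.get(f"{COUCH}/{DB}").json()

    before = db_info()["disk_size"]

    # Kick off compaction (runs asynchronously on the server).
    requests.post(f"{COUCH}/{DB}/_compact",
                  headers={"Content-Type": "application/json"}).raise_for_status()

    # Wait for the background compaction to finish, then re-read the size.
    while db_info().get("compact_running"):
        time.sleep(1)

    print(f"disk_size: {before} -> {db_info()['disk_size']} bytes")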

-Mikeal


Re: Hello / Archiving

Posted by Simon Woodhead <si...@simwood.com>.
Hi Yue,

Blimey, that was quick :-)

Our data is single-revision with no deletes, so as I understand it compacting
wouldn't do much as there's nothing to clear out. We have tried it though and,
sure enough, it didn't! We'll try again when it has been running for longer.

cheers,
Simon





Re: Hello / Archiving

Posted by Yue Chuan Lim <sh...@gmail.com>.
Have you tried compacting the database? (:

My current use case is similar; I pretty much need to run compaction once a
day (by my estimates).

Cheers


Re: Hello / Archiving

Posted by Simon Woodhead <si...@simwood.com>.
Awesome! Code changed to include the date in the db name, and replication
started between the old and new databases. Time to relax :-)
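
(Roughly what that amounts to, as a sketch: derive the dated name, then kick
off a one-shot replication from the old database into it via POST /_replicate.
The name prefix and URL are placeholders, and create_target is optional if the
target already exists.)

    # Sketch: dated database name plus a one-shot replication into it.
    import datetime
    import requests

    COUCH = "http://localhost:5984"

    source = "messages"                                   # the old, undated database
    target = "messages_" + datetime.date.today().strftime("%Y_%m_%d")

    # POST /_replicate copies everything from source to target; create_target
    # asks CouchDB to create the dated database if it does not exist yet.
    resp = requests.post(f"{COUCH}/_replicate",
                         json={"source": source,
                               "target": target,
                               "create_target": True})
    resp.raise_for_status()
    print(resp.json())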

Thanks for your help guys.

cheers,
Simon




Re: Hello / Archiving

Posted by Mikeal Rogers <mi...@gmail.com>.
Yeah, all of the CouchDB db and view files are append-only, so they are
*always* in a consistent state and can be started up at any time.

Copying or moving the db files somewhere will be fine, and all you need to
do to start them back up is move them back :) CouchDB kinda rocks like this.
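
(A sketch of that detach/re-attach flow, assuming the 1.x layout where each
database lives in a single {name}.couch file under database_dir; the paths
below are guesses for your setup, and the database should be idle, or already
replicated elsewhere, before its file is moved.)

    # Sketch: move a dated database file out of the data directory, gzip it,
    # and later put it back. DATA_DIR must match database_dir in the .ini config.
    import gzip
    import shutil
    from pathlib import Path

    DATA_DIR = Path("/var/lib/couchdb")         # assumed database_dir
    ARCHIVE_DIR = Path("/mnt/archive/couchdb")  # assumed cheap storage

    def detach(db_name):
        """Compress {db_name}.couch into the archive and remove the original."""
        src = DATA_DIR / f"{db_name}.couch"
        dst = ARCHIVE_DIR / f"{db_name}.couch.gz"
        with src.open("rb") as f_in, gzip.open(dst, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
        src.unlink()

    def reattach(db_name):
        """Decompress an archived database file back into the data directory."""
        src = ARCHIVE_DIR / f"{db_name}.couch.gz"
        dst = DATA_DIR / f"{db_name}.couch"
        with gzip.open(src, "rb") as f_in, dst.open("wb") as f_out:
            shutil.copyfileobj(f_in, f_out)

    detach("messages_2010_07_31")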

-Mikeal


Re: Hello / Archiving

Posted by J Chris Anderson <jc...@gmail.com>.
On Jul 31, 2010, at 10:45 AM, Simon Woodhead wrote:

> One idea I have had is to name the database to include the date and then
> databases above a certain age could be detached and compressed somewhere.
> Does that sound workable and is there an approved method for detaching? It
> looks like we could just move the file without any adverse consequences but
> I wanted to check! And re-attaching?
> 

This is what I would suggest.

The 1.x line will be binary compatible, so you should have no trouble re-activating stored databases.

> So far, CouchDB is looking like a dream come true and I'm very sure we're
> going to move other applications to it as we find our way around. I have no
> doubt we're going to be a relatively large deployment when we find our feet
> with it so thanks again to all involved.
> 

If you hit walls, please ask here before getting frustrated. There is a lot of experience with very large-scale CouchDB, so folks will be able to help.

Chris
