Posted to user@couchdb.apache.org by Mike <mi...@wolman.co.uk> on 2014/12/16 11:43:55 UTC
stream CouchDB changes into PostgreSQL
Hi Everyone,
We switched from Postgres to Couch for our main datasource about 5 years
ago. One of the main reasons was master/master replication, and our data
suits a schemaless model.
Ever since the switch I have always wanted to be able to use SQL to query
the data, as opposed to writing ad hoc scripts to put CSV/XLS reports
together. With all the json/jsonb work happening in Postgres
recently, I decided to scratch that itch last week with:
A Node library to stream CouchDB changes into PostgreSQL, with a simple
client example.
https://github.com/sysadminmike/couch-to-postgres/
It is fairly simple to get going and start running SELECT queries on the
docs in your CouchDB. It is also possible to use your CouchDB views as
tables. It provides something similar to the Elasticsearch river plugin,
but for Postgres.
Adding a few extra bits allows not only SELECT queries on the data but
also UPDATEs/INSERTs (DELETEs todo) on your CouchDB docs from within
Postgres.
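For a flavour of how the streaming works, here is a minimal sketch of the
core step: mapping one row of CouchDB's _changes feed (with
include_docs=true) onto a Postgres upsert against a jsonb column. The
table and column names are invented for illustration, and ON CONFLICT
needs Postgres 9.5+ - this is not the library's actual code.

```javascript
// Sketch only: turn one _changes row into a parameterised Postgres
// statement. The schema (id text primary key, doc jsonb) is assumed,
// not the library's own.
function changeToUpsert(change, table) {
  if (change.deleted) {
    // Deleted docs arrive in the feed with a "deleted" flag.
    return { text: 'DELETE FROM ' + table + ' WHERE id = $1',
             values: [change.id] };
  }
  // Keep the whole document in jsonb so it stays queryable with
  // Postgres' JSON operators (->, ->>, @>).
  return {
    text: 'INSERT INTO ' + table + ' (id, doc) VALUES ($1, $2::jsonb) ' +
          'ON CONFLICT (id) DO UPDATE SET doc = EXCLUDED.doc',
    values: [change.id, JSON.stringify(change.doc)]
  };
}

// Example row as delivered by /_changes?include_docs=true:
var row = { id: 'order-1', seq: 42, doc: { _id: 'order-1', total: 9.5 } };
var q = changeToUpsert(row, 'docs');
```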
Thanks for reading,
Mike.
Re: stream CouchDB changes into PostgreSQL
Posted by Mike <mi...@wolman.co.uk>.
Hi Sam,
Thanks for looking - I think I looked at couch_tap a while ago, but as
our documents are quite complicated, trying to fit them into a
structured schema would require a lot of work. With the new JSON
datatypes in Postgres you don't have to.
I did do a simple dump script with php:
https://github.com/sysadminmike/couch-to-postgres-php-dump
But Node was much faster and used the _changes feed as opposed to doing
a comparison between the databases, and due to its async nature it doesn't
block - load on the machine was much lower, and memory use with multiple
streams stays constant.
I have started work on a daemon similar to the _replicator part of CouchDB,
allowing you to add and remove databases to watch without needing a
separate process for each database you want to stream from. I have fleshed
out a fair bit of it, such as reconnecting if Postgres/CouchDB disappears
and restarting streams, plus a basic API to query what's happening.
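The reconnect behaviour can be sketched roughly like this (illustrative
only, with invented names and constants - not the daemon's actual code):
back off exponentially between attempts, cap the delay, and stop once a
connection sticks.

```javascript
// Exponential backoff for the reconnect-on-disappear behaviour
// described above. Names and constants are illustrative.
function nextDelay(attempt, baseMs, maxMs) {
  // 1s, 2s, 4s, ... capped at maxMs so a long outage doesn't explode.
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

function connectWithRetry(connect, attempt) {
  attempt = attempt || 0;
  connect(function (err) {
    if (!err) return;                       // connected: stop retrying
    var wait = nextDelay(attempt, 1000, 30000);
    setTimeout(function () {
      connectWithRetry(connect, attempt + 1);
    }, wait);
  });
}
```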
Mike.
Samuel Lown wrote:
> Mike,
>
> Interesting approach. I'll pass it along to some of my colleagues.
>
> At Cabify we took a slightly different approach as we wanted to be able to
> perform relational analysis of the data. We developed a Ruby app that
> listens to the changes feed and synchronises with structured tables using
> SQL.
>
> Here's our code:
>
> https://github.com/samlown/couch_tap
>
> We haven't had a chance yet to finish the documentation, so we've never
> published it.
>
> Cheers,
> sam
Re: stream CouchDB changes into PostgreSQL
Posted by Samuel Lown <sa...@gmail.com>.
Mike,
Interesting approach. I'll pass it along to some of my colleagues.
At Cabify we took a slightly different approach as we wanted to be able to
perform relational analysis of the data. We developed a Ruby app that
listens to the changes feed and synchronises with structured tables using
SQL.
Here's our code:
https://github.com/samlown/couch_tap
We haven't had a chance yet to finish the documentation, so we've never
published it.
Cheers,
sam
--
www.samlown.com
www.cabify.com
www.autofiscal.com
Re: couch as statsd backend
Posted by Mike <mi...@wolman.co.uk>.
Hi Alexander,
I got it up to about 28k doc inserts every 10 secs, but my machine
couldn't generate any more than that (or statsd couldn't ingest them fast
enough - not sure; I may have been hitting a socket limit somewhere).
Flushing stats at Tue Feb 24 2015 14:23:25 GMT+0000 (GMT)
done: added 28005 docs
Flushing stats at Tue Feb 24 2015 14:23:35 GMT+0000 (GMT)
done: added 28005 docs
Flushing stats at Tue Feb 24 2015 14:23:45 GMT+0000 (GMT)
done: added 28005 docs
Flushing stats at Tue Feb 24 2015 14:23:56 GMT+0000 (GMT)
done: added 28006 docs
When I dropped the flush time to 3 secs:
Flushing stats at Tue Feb 24 2015 14:30:36 GMT+0000 (GMT)
done: added 27693 docs
Flushing stats at Tue Feb 24 2015 14:30:38 GMT+0000 (GMT)
done: added 27770 docs
Flushing stats at Tue Feb 24 2015 14:30:39 GMT+0000 (GMT)
done: added 27737 docs
But couch didn't last too long and died after a few secs.
I didn't do any more than some quick load testing, as I am not really
planning on collecting more than a few thousand a minute.
I didn't think it took up too much space either: >2 million metrics in
about 0.5G (after turning on snappy and running a compact):
{
  "db_name": "aatest",
  "doc_count": 2272446,
  "doc_del_count": 0,
  "update_seq": 2272446,
  "purge_seq": 0,
  "compact_running": false,
  "disk_size": 562339953,
  "data_size": 559869189,
  "instance_start_time": "1424789845662450",
  "disk_format_version": 6,
  "committed_update_seq": 2272446
}
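Those numbers work out to a pretty small per-doc footprint - quick
arithmetic on the data_size and doc_count above:

```javascript
// Average live bytes per metric doc, from the _db_info output above.
var dataSize = 559869189;  // data_size
var docCount = 2272446;    // doc_count
var bytesPerDoc = Math.round(dataSize / docCount);  // about 246 bytes/doc
```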
The test machine is about 5 years old - a Core 2 Duo @ 3.00GHz with 4GB
RAM - with the lack of RAM and ZFS causing some slowness, and it is
running a few couch instances and other jails, so I suspect a properly
set-up box would push >40k docs on UFS rather than ZFS. I'm using normal
hard disks as well, so with SSD drives >50k or more.
Mike.
Alexander Shorin wrote:
> Hi Mike!
>
> That's nice! Would you like to add it to our wiki with a few notes?
> But the more interesting question is how it works under load. Did you
> stress it with hundreds or thousands of metrics per second? Very
> curious about it.
> --
> ,,,^..^,,,
>
Re: couch as statsd backend
Posted by Alexander Shorin <kx...@gmail.com>.
Hi Mike!
That's nice! Would you like to add it to our wiki with a few notes?
But the more interesting question is how it works under load. Did you
stress it with hundreds or thousands of metrics per second? Very curious
about it.
--
,,,^..^,,,
couch as statsd backend
Posted by Mike <mi...@wolman.co.uk>.
Hi Everyone,
In case it's of use to anyone else, I have added a backend to statsd to
emit stats to CouchDB:
https://github.com/sysadminmike/couch-statsd-backend
https://www.npmjs.com/package/couch-statsd-backend
Mike.
couchdb _stats graphed with grafana
Posted by Mike <mi...@wolman.co.uk>.
Hi,
As mentioned in the last couch newsletter, I have been working on a
metrics collection setup with CouchDB as the data back end, using
replication to ship metrics to a central location where they can then be
graphed with Grafana.
I have added metric collection for the couch _stats doc, plus example SQL
+ graphs:
https://github.com/sysadminmike/yadms/tree/master/couchdb
It is similar to https://github.com/gws/munin-plugin-couchdb
I will be adding some example graphs for the metrics collected on the
actual databases, but I need to collect a few days' worth of data first.
Mike.
Re: stream CouchDB changes into PostgreSQL
Posted by Mike <mi...@wolman.co.uk>.
That sounds hard!
I think Postgres is working on a _changes-feed type thing.
But with this you can use your CouchDB views in joins in Postgres - so
you have both worlds: map/reduce plus SQL.
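To illustrate the point (table names here are invented - imagine a couch
view exposed as a Postgres table, joined against the streamed docs):

```javascript
// Builds the kind of SQL you can run once a CouchDB view is exposed as
// a Postgres table. Table and column names are hypothetical.
function topCustomersSql(docsTable, viewTable, limit) {
  return "SELECT d.doc->>'name' AS customer, v.total" +
         ' FROM ' + docsTable + ' d' +
         ' JOIN ' + viewTable + ' v ON v.key = d.id' +  // view keyed by doc id
         ' ORDER BY v.total DESC LIMIT ' + limit;
}

var sql = topCustomersSql('docs', 'sales_by_customer', 10);
```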
Mike.
Alexander Shorin wrote:
> Nice project! But how about making it a bidirectional sync of PostgreSQL
> with CouchDB using the replication protocol? (;
> --
> ,,,^..^,,,
Re: stream CouchDB changes into PostgreSQL
Posted by Alexander Shorin <kx...@gmail.com>.
Nice project! But how about making it a bidirectional sync of PostgreSQL
with CouchDB using the replication protocol? (;
--
,,,^..^,,,