Posted to user@couchdb.apache.org by Mike <mi...@wolman.co.uk> on 2014/12/16 11:43:55 UTC

stream CouchDB changes into PostgreSQL

Hi Everyone,

We switched from Postgres to Couch for our main datasource about 5 years 
ago. One of the main reasons was master/master replication, and our data 
suits a schemaless model.

Ever since the switch I have wanted to be able to use SQL to query 
the data, as opposed to writing ad hoc scripts to put csv/xls reports 
together.  With all the json/jsonb work happening in Postgres 
recently, I decided to scratch that itch last week with:

a Node library to stream CouchDB changes into PostgreSQL, with a simple 
client example:
https://github.com/sysadminmike/couch-to-postgres/

It is fairly simple to get going and start running SELECT queries on the 
docs in your CouchDB. It is also possible to use your couch views as 
tables.  It provides something similar to the Elasticsearch river plugin, 
but for Postgres.

Adding a few extra bits allows not only SELECT queries on the data but 
also UPDATEs/INSERTs (DELETEs to do) on your CouchDB docs from within 
Postgres.
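
For example, once a database is being synced, querying it can look
something like this (a minimal sketch - it assumes the docs land in a
table named after the database with an id column and a jsonb doc column;
check the README for the exact schema the library actually creates):

// Hypothetical query against a table populated by the sync.
// Assumes a table "mydb" with columns (id text, doc jsonb);
// the connection string is a placeholder too.
var pg = require('pg');

var client = new pg.Client('postgres://user:pass@localhost/couch_mirror');

client.connect(function (err) {
  if (err) throw err;

  // pull a couple of fields out of the JSON document body
  client.query(
    "SELECT id, doc->>'name' AS name, doc->>'status' AS status " +
    "FROM mydb WHERE doc->>'type' = $1 LIMIT 10",
    ['customer'],
    function (err, result) {
      if (err) throw err;
      console.log(result.rows);
      client.end();
    }
  );
});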

Thanks for reading,

Mike.




Re: stream CouchDB changes into PostgreSQL

Posted by Mike <mi...@wolman.co.uk>.
Hi Sam,

Thanks for taking a look - I think I looked at couch_tap a while ago, but 
as our documents are quite complicated, trying to fit them into a 
structured schema would require a lot of work.  With the new JSON 
datatypes in Postgres you don't have to.
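
Even nested objects and embedded arrays can be queried in place, without
flattening anything first - a rough sketch, using the same hypothetical
(id, doc jsonb) layout and made-up field names:

// Unnest an embedded array and reach into nested objects with -> / ->>.
// Table, field and connection names are illustrative only.
var pg = require('pg');
var client = new pg.Client('postgres://user:pass@localhost/couch_mirror');

client.connect(function (err) {
  if (err) throw err;
  client.query(
    "SELECT id, " +
    "       doc->'customer'->>'name' AS customer, " +
    "       item->>'sku' AS sku, " +
    "       (item->>'qty')::int AS qty " +
    "FROM mydb, jsonb_array_elements(doc->'items') AS item " +
    "WHERE doc->>'type' = 'order'",
    function (err, result) {
      if (err) throw err;
      console.log(result.rows);
      client.end();
    }
  );
});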

I did do a simple dump script with php:  
https://github.com/sysadminmike/couch-to-postgres-php-dump

But Node was much faster and uses the _changes feed, as opposed to doing 
a comparison between the databases, and due to its async nature it doesn't 
block - load on the machine was much lower, and memory use with multiple 
streams stays constant.
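
For anyone who hasn't looked at it, the _changes feed is just a
continuous, line-delimited JSON stream over HTTP, so it can be consumed
without ever holding the whole database in memory. A bare-bones
illustration of consuming it (not the library's actual code, and "mydb"
is a placeholder):

// Tail a CouchDB _changes feed with plain node http.
var http = require('http');

var options = {
  host: 'localhost',
  port: 5984,
  path: '/mydb/_changes?feed=continuous&include_docs=true&since=0&heartbeat=10000'
};

http.get(options, function (res) {
  var buffer = '';
  res.setEncoding('utf8');
  res.on('data', function (chunk) {
    buffer += chunk;
    var lines = buffer.split('\n');
    buffer = lines.pop();           // keep any partial line for the next chunk
    lines.forEach(function (line) {
      if (!line.trim()) return;     // heartbeats arrive as blank lines
      var change = JSON.parse(line);
      // this is where an upsert into Postgres would happen
      console.log(change.seq, change.id);
    });
  });
});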

I have started work on a daemon, similar to the _replicator part of 
couch, that lets you add and remove CouchDB databases to watch without 
needing a separate process for each database you want to stream from. I 
have fleshed out a fair bit of it, like reconnecting if Postgres/Couch 
disappears and restarting streams, plus a basic API to query what's 
happening.

Mike.

Samuel Lown wrote:
> Mike,
>
> Interesting approach. I'll pass it along to some of my colleagues.
>
> At Cabify we took a slightly different approach as we wanted to be able to
> perform relational analysis of the data. We developed a Ruby app that
> listens to the changes feed and synchronises with structured tables using
> SQL.
>
> Here's our code:
>
> https://github.com/samlown/couch_tap
>
> We haven't had a chance yet to finish the documentation, so we've never
> published it.
>
> Cheers,
> sam
>
>


Re: stream CouchDB changes into PostgreSQL

Posted by Samuel Lown <sa...@gmail.com>.
Mike,

Interesting approach. I'll pass it along to some of my colleagues.

At Cabify we took a slightly different approach as we wanted to be able to
perform relational analysis of the data. We developed a Ruby app that
listens to the changes feed and synchronises with structured tables using
SQL.

Here's our code:

https://github.com/samlown/couch_tap

We haven't had a chance yet to finish the documentation, so we've never
published it.

Cheers,
sam





-- 
www.samlown.com
www.cabify.com
www.autofiscal.com

Re: couch as statsd backend

Posted by Mike <mi...@wolman.co.uk>.
Hi Alexander,

I got it up to about 28k doc inserts every 10 secs, but my machine 
couldn't generate any more than that (or statsd couldn't ingest them fast 
enough, I'm not sure - I may have been hitting a socket limit somewhere).

Flushing stats at Tue Feb 24 2015 14:23:25 GMT+0000 (GMT)
done: added 28005 docs
Flushing stats at Tue Feb 24 2015 14:23:35 GMT+0000 (GMT)
done: added 28005 docs
Flushing stats at Tue Feb 24 2015 14:23:45 GMT+0000 (GMT)
done: added 28005 docs
Flushing stats at Tue Feb 24 2015 14:23:56 GMT+0000 (GMT)
done: added 28006 docs

When I dropped the flush interval to 3 secs:

Flushing stats at Tue Feb 24 2015 14:30:36 GMT+0000 (GMT)
done: added 27693 docs
Flushing stats at Tue Feb 24 2015 14:30:38 GMT+0000 (GMT)
done: added 27770 docs
Flushing stats at Tue Feb 24 2015 14:30:39 GMT+0000 (GMT)
done: added 27737 docs

But couch didn't last too long and died after a few secs.

I didn't do any more than some quick load testing, as I am not really 
planning on collecting more than a few thousand metrics a minute.

I didn't think it took up too much space either: >2 million metrics in 
about 0.5G (after turning on snappy and running a compact):

{ 
  "db_name":"aatest",
  "doc_count":2272446,
  "doc_del_count":0,
  "update_seq":2272446,
  "purge_seq":0,
  "compact_running":false,
  "disk_size":562339953,
  "data_size":559869189,
  "instance_start_time":"1424789845662450",
  "disk_format_version":6,
  "committed_update_seq":2272446
}

The test machine is about 5 years old - a Core2 Duo @ 3.00GHz with 4GB of 
RAM - and the lack of RAM, ZFS, and a few other couch instances and jails 
running on it cause some slowness, so I suspect a properly set up box 
would push >40k docs on UFS rather than ZFS - and I'm using normal hard 
disks as well, so with SSD drives >50k or more.

Mike.



Alexander Shorin wrote:
> Hi Mike!
>
> That's nice! Would you like to add it to our wiki with a few notes about it?
> But the more interesting question is how it works under load. Did you
> stress it with hundreds or thousands of metrics per second? Very curious
> about that.
> --
> ,,,^..^,,,
>


Re: couch as statsd backend

Posted by Alexander Shorin <kx...@gmail.com>.
Hi Mike!

That's nice! Would you like to add it to our wiki with a few notes about it?
But the more interesting question is how it works under load. Did you
stress it with hundreds or thousands of metrics per second? Very curious
about that.
--
,,,^..^,,,



couch as statsd backend

Posted by Mike <mi...@wolman.co.uk>.
Hi Everyone,

In case it's of use to anyone else - I have added a backend to statsd to 
emit stats to CouchDB:

https://github.com/sysadminmike/couch-statsd-backend

https://www.npmjs.com/package/couch-statsd-backend
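
For the curious, a statsd backend boils down to listening for the flush
event and writing the metrics somewhere - roughly like the sketch below,
which posts each flush to CouchDB via _bulk_docs (illustrative only, not
the package's exact code; the "statsd" database name is a placeholder):

// Minimal statsd backend shape: collect counters/gauges on each flush
// and bulk-insert them as docs.
var http = require('http');

function flush(timestamp, metrics) {
  var docs = [];
  Object.keys(metrics.counters).forEach(function (name) {
    docs.push({ ts: timestamp, type: 'counter', name: name, value: metrics.counters[name] });
  });
  Object.keys(metrics.gauges).forEach(function (name) {
    docs.push({ ts: timestamp, type: 'gauge', name: name, value: metrics.gauges[name] });
  });

  var req = http.request({
    host: 'localhost',
    port: 5984,
    path: '/statsd/_bulk_docs',
    method: 'POST',
    headers: { 'Content-Type': 'application/json' }
  }, function (res) {
    console.log('done: added ' + docs.length + ' docs (' + res.statusCode + ')');
  });
  req.write(JSON.stringify({ docs: docs }));
  req.end();
}

exports.init = function (startup_time, config, events) {
  events.on('flush', flush);
  return true;
};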

Mike.



couchdb _stats graphed with grafana

Posted by Mike <mi...@wolman.co.uk>.
Hi,

As mentioned in the last couch newsletter, I have been working on a 
metrics collection setup with CouchDB as the data back end, using 
replication to ship metrics to a central location where they can then be 
graphed with Grafana.

I have added metric collection for the couch _stats doc, plus example SQL 
and graphs:

https://github.com/sysadminmike/yadms/tree/master/couchdb

It is similar to https://github.com/gws/munin-plugin-couchdb
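
In case it helps picture it, collecting the _stats doc is basically a
matter of polling the _stats endpoint and writing each sample into a
metrics database - something along these lines (a rough sketch, not the
repo's exact code; the "metrics" database name and the 10s interval are
placeholders):

// Poll CouchDB's /_stats and store each sample as a doc.
var http = require('http');

function collect() {
  http.get({ host: 'localhost', port: 5984, path: '/_stats' }, function (res) {
    var body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      var req = http.request({
        host: 'localhost',
        port: 5984,
        path: '/metrics',
        method: 'POST',
        headers: { 'Content-Type': 'application/json' }
      });
      req.write(JSON.stringify({ ts: Date.now(), type: 'couch_stats', stats: JSON.parse(body) }));
      req.end();
    });
  });
}

setInterval(collect, 10000);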

I will be adding some example graphs for the metrics collected on the 
actual databases, but I need to collect a few days' worth of data first.

Mike.



Re: stream CouchDB changes into PostgreSQL

Posted by Mike <mi...@wolman.co.uk>.
That sounds hard!

I think Postgres is working on a _changes-feed-type thing.

But with this you can use your CouchDB views in joins in Postgres - so 
you get both worlds: map/reduce plus SQL.
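
Purely as an illustration of that (the table and column names here are
hypothetical, not necessarily what the library creates): if a couch
view's rows end up in a table with key/value columns, you can join the
map/reduce output straight back against the synced docs:

// Hypothetical join between a synced docs table ("orders") and a table
// fed from a couch view ("totals_by_customer" with key/value columns).
var pg = require('pg');
var client = new pg.Client('postgres://user:pass@localhost/couch_mirror');

client.connect(function (err) {
  if (err) throw err;
  client.query(
    "SELECT o.id, o.doc->>'status' AS status, v.value AS customer_total " +
    "FROM orders o " +
    "JOIN totals_by_customer v ON v.key = o.doc->>'customer_id'",
    function (err, result) {
      if (err) throw err;
      console.log(result.rows);
      client.end();
    }
  );
});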

Mike.

Alexander Shorin wrote:
> Nice project! But how about making it a bidirectional sync between
> PostgreSQL and CouchDB by using the replication protocol? (;
> --
> ,,,^..^,,,


Re: stream CouchDB changes into PostgreSQL

Posted by Alexander Shorin <kx...@gmail.com>.
Nice project! But how about making it a bidirectional sync between
PostgreSQL and CouchDB by using the replication protocol? (;
--
,,,^..^,,,

