Posted to dev@couchdb.apache.org by Алекс Zatvornitskiy <a....@gmail.com> on 2013/02/13 16:04:00 UTC

I've just released a first preview of couch_normalizer

Hi everybody!

couch_normalizer v0.6 is out!

couch_normalizer is designed as a standard Apache CouchDB httpd handler
and uses a Rails-style db migration approach. It is written in both Erlang
and Elixir. It works well in production and has great IO performance.

For example:

% Starts a normalization process.

% curl -v -XPOST -H"Content-Type: application/json" http://127.0.0.1:5984/db/_normalize

% => {"ok":"Normalization process has been started (<0.174.0>)."}


% Gets a normalization process execution status.

% curl -v http://127.0.0.1:5984/_active_tasks

% =>
[{"pid":"<0.174.0>","continue":false,"db":"db","docs_conflicted":0,"docs_deleted":0,"docs_normalized":0,"docs_read":3000,"finished_on":1358513508,"num_workers":5,"started_on":1358513508,"type":"normalization","updated_on":1358513508}]
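
For anyone who wants to read that status output from a script, here is a
minimal Python sketch. It only parses the JSON body shown above. The helper
names (normalization_tasks, summarize) are hypothetical and not part of
couch_normalizer, and the meaning of each counter is inferred from its name.

```python
import json

# Sample body copied from the _active_tasks response above.
SAMPLE = (
    '[{"pid":"<0.174.0>","continue":false,"db":"db","docs_conflicted":0,'
    '"docs_deleted":0,"docs_normalized":0,"docs_read":3000,'
    '"finished_on":1358513508,"num_workers":5,"started_on":1358513508,'
    '"type":"normalization","updated_on":1358513508}]'
)


def normalization_tasks(body, db=None):
    """Return the tasks of type "normalization", optionally for one db.

    Hypothetical helper: it relies only on the "type" and "db" keys
    visible in the sample response.
    """
    tasks = json.loads(body)
    return [
        t for t in tasks
        if t.get("type") == "normalization" and (db is None or t.get("db") == db)
    ]


def summarize(task):
    """Format the counters of one task as a single log-friendly line."""
    return (
        f"{task['db']}: read={task['docs_read']} "
        f"normalized={task['docs_normalized']} "
        f"conflicted={task['docs_conflicted']} "
        f"deleted={task['docs_deleted']}"
    )


if __name__ == "__main__":
    for task in normalization_tasks(SAMPLE, db="db"):
        print(summarize(task))
```

In real use the body would come from a GET on /_active_tasks instead of the
SAMPLE constant.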

As a result, it lets you deploy migration scripts (aka scenarios) and
change a large number of documents as fast as possible (without HTTP
overhead or any kind of 'delayed jobs') via internal CouchDB functions such
as couch_db:open_doc/2 and couch_db:update_doc/3.
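
To illustrate what a Rails-style migration approach means for documents, here
is a rough Python sketch of the general idea, not couch_normalizer's actual
scenario format (that lives in the repository and is written in Elixir). The
version field name schema_version, the tuple layout and both example
scenarios are assumptions made up for this illustration.

```python
# Hypothetical scenarios: each one upgrades a document by one step.
def add_type(doc):
    doc.setdefault("type", "user")
    return doc


def rename_name(doc):
    if "name" in doc:
        doc["full_name"] = doc.pop("name")
    return doc


# (version, title, function) tuples, in the spirit of numbered Rails migrations.
SCENARIOS = [(1, "add-type", add_type), (2, "rename-name", rename_name)]


def normalize(doc, scenarios=SCENARIOS, version_field="schema_version"):
    """Apply every scenario newer than the version recorded in the document.

    Works on a shallow copy, so the caller's dict is left untouched.
    """
    current = doc.get(version_field, 0)
    result = dict(doc)
    for version, _title, apply in sorted(scenarios, key=lambda s: s[0]):
        if version > current:
            result = apply(result)
            result[version_field] = version
    return result


if __name__ == "__main__":
    print(normalize({"_id": "a", "name": "Ann"}))
```

Because each document records the last scenario applied to it, running the
same set of scenarios twice changes nothing the second time.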

More details: https://github.com/datahogs/couch_normalizer

If you want to contribute, feel free to open a GitHub issue, submit a
pull request, or ping me off-list :)

It is still under scoping and development.

Re: I've just released a first preview of couch_normalizer

Posted by José Valim <jo...@plataformatec.com.br>.
Congratulations on the release! :D


*José Valim*
www.plataformatec.com.br
Skype: jv.ptec
Founder and Lead Developer



Re: I've just released a first preview of couch_normalizer

Posted by Алекс Zatvornitskiy <a....@gmail.com>.
Hi! I'm pleased that you find it useful.

In short:

1. Take a look at the {num_workers, 5} configuration option. It means
that couch_normalizer will use 5 workers for the target db, each running
under OTP. Each worker spawns its own child process to apply the scenarios
to the current document. You should see log messages when one of them
finishes or fails (for reasons such as a conflict, scenario logic, etc.).
See this code for more details: http://goo.gl/TasYp

2. I think there should be some kind of final notification. Feel free to
submit an issue. Is this what you want? http://goo.gl/UThkT

3. Yeah, Alex, I know. That's why we used Elixir and its amazing
functional metaprogramming features.
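
To make point 1 more concrete, here is a loose Python analogy of a fixed
pool of workers where a failure on one document is logged and counted but
does not stop the rest. It uses threads, not OTP processes, and it is not
the couch_normalizer code; the names run_pool, Conflict and docs_failed are
hypothetical.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

log = logging.getLogger("normalizer_sketch")


class Conflict(Exception):
    """Stand-in for an update conflict reported by the database."""


def run_pool(docs, scenario, num_workers=5):
    """Apply `scenario` to each doc using `num_workers` threads.

    A failing document is logged and counted; the other documents
    are still processed.
    """
    stats = {"docs_read": 0, "docs_normalized": 0,
             "docs_conflicted": 0, "docs_failed": 0}

    def work(doc):
        # Each call handles one document and reports its outcome by name.
        try:
            scenario(doc)
            return "docs_normalized"
        except Conflict:
            log.warning("conflict on %s", doc.get("_id"))
            return "docs_conflicted"
        except Exception:
            log.exception("scenario failed on %s", doc.get("_id"))
            return "docs_failed"

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Counters are updated only here, in the calling thread.
        for outcome in pool.map(work, docs):
            stats["docs_read"] += 1
            stats[outcome] += 1
    return stats
```

The returned counters mirror the style of the _active_tasks fields, which
is one way a final summary could be reported once the pool has drained.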


On Wed, Feb 13, 2013 at 5:47 PM, Alexander Shorin <kx...@gmail.com> wrote:

> That is awesome, Alex!
>
> This is the thing I always dream of when our document schemas receive
> a new update and it needs to be applied to a large number of
> databases. The trick with a local database mirror and replication works,
> but it's not very fast.
>
> A few questions:
>
> 1. What happens if normalization fails for some reason? For
> example, due to an update conflict. Do I have to start the whole
> normalization again in this case, or will it just write a few notes about
> the conflict in the logs and try to apply the script to the document one
> more time?
>
> 2. Is it possible to receive the final result of the normalization job,
> and an error description if it fails? Continuously querying _active_tasks
> does not look optimal, and it's possible to miss the moment when
> normalization fails.
>
> 3. Only Erlang and Elixir migration scripts are possible, right? Or
> is it possible to use any script that supports the CouchDB stdio
> communication protocol? I suppose many of us who faced the same problem
> already have Python/Ruby/whatever-lang scripts that successfully
> handle schema normalization and are well tested. It would be helpful
> not to force them to do the whole job again.
>
> --
> ,,,^..^,,,

Re: I've just released a first preview of couch_normalizer

Posted by Alexander Shorin <kx...@gmail.com>.
That is awesome, Alex!

This is the thing I always dream of when our document schemas receive
a new update and it needs to be applied to a large number of
databases. The trick with a local database mirror and replication works,
but it's not very fast.

A few questions:

1. What happens if normalization fails for some reason? For
example, due to an update conflict. Do I have to start the whole
normalization again in this case, or will it just write a few notes about
the conflict in the logs and try to apply the script to the document one
more time?

2. Is it possible to receive the final result of the normalization job,
and an error description if it fails? Continuously querying _active_tasks
does not look optimal, and it's possible to miss the moment when
normalization fails.

3. Only Erlang and Elixir migration scripts are possible, right? Or
is it possible to use any script that supports the CouchDB stdio
communication protocol? I suppose many of us who faced the same problem
already have Python/Ruby/whatever-lang scripts that successfully
handle schema normalization and are well tested. It would be helpful
not to force them to do the whole job again.

--
,,,^..^,,,

