Posted to user@couchdb.apache.org by Simon Metson <si...@cern.ch> on 2015/04/08 22:39:10 UTC

Re: [ANN] jqouch, a jq-based view server

Cute :)

On 30 March 2015 at 17:44, Brian McQueen <mc...@gmail.com> wrote:

> Nice! Jq is a great tool, and putting it in like that is quite nice.
>
> On Sun, Mar 29, 2015 at 6:14 AM, Matthieu Rakotojaona <
> matthieu.rakotojaona@gmail.com> wrote:
>
> > Hey Alexander,
> >
> > I don't think I'll re-implement jq in pure Golang. It could be an
> > interesting exercise in lexing/parsing, but I'm not sure I'd make it to
> > the end.
> >
> > Using the C API, though, is the next step!
> >
> > Excerpts from Alexander Shorin's message of 2015-03-29 02:17:43 +0300:
> > > I knew that someone would make a jq query server, and here it is. Nice
> > > work, Matthieu!
> > >
> > > Do you plan to implement jq in Golang? That will significantly improve
> > > your query server and will allow others to embed jq into their apps.
> > > --
> > > ,,,^..^,,,
> > >
> > >
> > > On Sat, Mar 28, 2015 at 6:12 PM, Matthieu Rakotojaona
> > > <ma...@gmail.com> wrote:
> > > > Hello guys,
> > > >
> > > > > I'd like to announce a jq-based view server for CouchDB. It's
> > > > > extremely rudimentary, but works as a proof of concept of what can be
> > > > > achieved:
> > > >
> > > > https://github.com/rakoo/jqouch
> > > >
> > > > > A bit of background: jq is a CLI tool to extract and render
> > > > > information from any JSON you give it, with a custom but powerful
> > > > > syntax:
> > > >
> > > > $ curl localhost:5984 | jq '.vendor .version'
> > > > "1.6.1"
> > > >
> > > > $ curl localhost:5984/mydb | jq '.disk_size - .data_size'
> > > > 80892224
> > > >
> > > > Looks like I'd better compact !
> > > >
> > > > > If you're dabbling with JSON and not using it already, I encourage
> > > > > you to check it out.
> > > >
> > > > > Basically jq is invoked with a filter (that's the '.vendor .version'
> > > > > from the example above); you then feed jq a JSON document on stdin,
> > > > > and it writes all matches and transformations to stdout. jqouch works
> > > > > by taking the function given in "add_fun", spawning an external jq
> > > > > process with that function as its filter, and forwarding the documents
> > > > > received in "map_doc" to it. All output from jq is then sent back to
> > > > > CouchDB through jqouch. (jq processes are not killed after each doc;
> > > > > they stay alive as long as their stdin is not closed, which jqouch
> > > > > never does until it dies.)
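
To make the flow described above concrete, here is a minimal sketch in JavaScript of how a query server might dispatch the three commands. It is not jqouch's actual code: jqouch hands documents to a real jq process, whereas here the hypothetical `compile` callback stands in for "spawn jq with this filter". The names makeServer and toyCompile are made up for illustration, and toyCompile only imitates the single title filter quoted in this thread.

```javascript
// Sketch of the three query-server commands: reset, add_fun, map_doc.
// `compile` is a hypothetical stand-in for spawning jq with a filter:
// it takes the filter source and returns a function doc -> [[key, value], ...].
function makeServer(compile) {
  let funs = []; // one entry per add_fun since the last reset

  return function handle(command) {
    const [name, arg] = command;
    switch (name) {
      case "reset":
        funs = [];
        return true;
      case "add_fun":
        funs.push(compile(arg));
        return true;
      case "map_doc":
        // One list of [key, value] pairs per registered function.
        return funs.map((fun) => fun(arg));
      default:
        // Anything else is not handled in this sketch.
        throw new Error("unsupported command: " + String(name));
    }
  };
}

// Toy compile step (assumption): it only knows the title filter.
function toyCompile(source) {
  if (source === "[ [.title, null] ]") {
    return (doc) => [[doc.title === undefined ? null : doc.title, null]];
  }
  throw new Error("unsupported filter: " + source);
}
```

A real server would read one JSON command per line on stdin and write each return value as one JSON line on stdout; that I/O loop is left out so the dispatch logic stays easy to test.
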
> > > >
> > > > > I have included some examples in the repo; here they are. I'm using
> > > > > documents from a dump of... I don't know exactly what, but a sample is
> > > > > here:
> > > >
> > > > https://github.com/rakoo/jqouch/blob/master/sample.json
> > > >
> > > > taken from http://parltrack.euwiki.org/dumps/eurlex.json.xz. That's
> > > > 22925 documents. I made some benchmarks on CouchDB 1.6:
> > > >
> > > > Here's a really simple view in js:
> > > >
> > > >     function(doc) {
> > > >       emit(doc.title, null)
> > > >     }
> > > >
> > > > it maps all docs in ~ 35s
> > > >
> > > > And the equivalent in jq:
> > > >
> > > >     [ [.title, null] ]
> > > >
> > > > it maps all docs in ~ 19s
> > > >
> > > > > Each map function returns a list of key/value pairs; there's no
> > > > > emit() any more. That is actually the format a query server has to
> > > > > return for each mapping function. It may not be ideal, but it works.
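
For anyone wondering how the two styles relate, here is a small JavaScript sketch that runs an emit()-style map function and collects its calls into the list of [key, value] pairs that the jq filter builds directly. The helper name runMap is hypothetical, and emit is passed as an explicit argument here purely to keep the example self-contained (the JS view quoted above calls it as a global).

```javascript
// Run an emit()-style map function and collect what it emits as
// [key, value] pairs, the same shape the jq filter writes out directly.
function runMap(mapFun, doc) {
  const pairs = [];
  const emit = (key, value) => {
    pairs.push([key, value]);
  };
  mapFun(doc, emit);
  return pairs;
}

// The title view from above, with emit passed in explicitly (assumption).
const titleView = (doc, emit) => {
  emit(doc.title, null);
};
```

Calling runMap(titleView, doc) gives one pair per emit() call, so a function that never emits yields an empty list.
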
> > > >
> > > > > Here's another, more "useful" pair of views:
> > > >
> > > >   function(doc) {
> > > >     for (var i = 0; i < doc.dates.length; i++) {
> > > >       emit([doc.dates[i].type, doc.dates[i].date], null)
> > > >     }
> > > >   }
> > > >
> > > > runs in ~ 32s
> > > >
> > > >     [ .dates[] | [[.type, .date], null] ]
> > > >
> > > > runs in ~ 19s
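
As a reading aid for the jq version, here is roughly what that filter builds, written as plain JavaScript that returns the pairs instead of calling emit(). The function name datesView and the sample document are hypothetical; the field values are made up and not taken from the dump.

```javascript
// What the jq filter [ .dates[] | [[.type, .date], null] ] builds:
// one [[type, date], null] pair per entry in doc.dates.
function datesView(doc) {
  return doc.dates.map((d) => [[d.type, d.date], null]);
}

// Hypothetical document; the values are invented for illustration.
const sampleDoc = {
  dates: [
    { type: "published", date: "2015-03-28" },
    { type: "amended", date: "2015-03-29" },
  ],
};
```
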
> > > >
> > > >
> > > >
> > > >
> > > > There are a few things we can say:
> > > >
> > > > > * For all 4 pairs of example views (see repo), jq is consistently
> > > > >   almost twice as fast as the equivalent js. Moreover the couchjs
> > > > >   process is always eating a large part of my CPU when running, whereas
> > > > >   the jq process is never over 30%. This indicates some overhead is
> > > > >   spent on passing documents between processes, which I'm going to
> > > > >   investigate with the jq C API.
> > > >
> > > > > * jq views can be hard to understand and write, but they can be
> > > > >   tested through the cli jq tool directly, or even online with jqplay
> > > > >   (https://jqplay.org/)
> > > > >
> > > > > * using jq doesn't (AFAIK) allow one to output non-deterministic
> > > > >   values, by default
> > > >
> > > > * jq is "sandboxed" in that it can't do anything other than transform
> > > >   documents, contrary to standard languages
> > > >
> > > > > * jq filters are in my opinion very clear on what they do, such that
> > > > >   a one-line filter can be enough in most cases
> > > >
> > > > Of course, it's not all rainbows and unicorns:
> > > >
> > > > > * there are still some quirks in the jq views: they can output
> > > > >   something like [null, null] when they should not return anything
> > > > >   because the view doesn't apply to the doc.
> > > >
> > > > * jqouch currently doesn't understand anything other than "reset",
> > > >   "add_fun" and "map_doc"
> > > >
> > > > > * I don't see the jq language as being enough for more generic
> > > > >   functions such as show and list, but who knows
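
On the [null, null] quirk: jq yields null when a field is missing, so a filter like the title one still produces a pair for a doc that has no title. One possible workaround, sketched here purely as an assumption and not something jqouch is known to do, is to drop pairs whose key is null before replying to CouchDB. The helper name dropNullKeys is hypothetical.

```javascript
// Hypothetical post-processing step: drop pairs whose key is null,
// so a doc the view does not apply to contributes nothing.
function dropNullKeys(pairs) {
  return pairs.filter(([key]) => key !== null);
}
```

The trade-off is that a view which deliberately emits a null key would lose those rows too, so this only suits views where null never is a meaningful key.
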
> > > >
> > > > > Anyway, there may be some value in using jq to define basic views,
> > > > > the ones that just index a document on some value and don't do much
> > > > > more. As a non-serious CouchDB user I've never had to use really fancy
> > > > > views.
> > > >
> > > > Thoughts ?
> >
>
>
>
> --
> the news wire of the 21st century - twitchy.com
>