Posted to dev@lucy.apache.org by Marvin Humphrey <ma...@rectangular.com> on 2012/07/14 19:45:04 UTC

[lucy-dev] Zap "Structural characters" in JsonParser.y

Greets,

Our JSON parser is powered by the Lemon parser generator[1], with the grammar
file in core/Lucy/Util/Json/JsonParser.y.  As formal grammars are the topic of
this week's book club, I thought it might be a good time to hack on it a bit.

Right now, the grammar is slightly more complex than it needs to be.  This
block is unnecessary:

    /* Structural characters. */
    begin_array     ::= LEFT_SQUARE_BRACKET.
    end_array       ::= RIGHT_SQUARE_BRACKET.
    begin_object    ::= LEFT_CURLY_BRACKET.
    end_object      ::= RIGHT_CURLY_BRACKET.
    name_separator  ::= COLON.
    value_separator ::= COMMA.

All we have to do is replace all instances in the file of the non-terminals on
the left with the corresponding terminals on the right and that block can be
deleted.

Have we got a volunteer willing to take this task on?

Marvin Humphrey

[1] http://www.hwaci.com/sw/lemon/lemon.html

Re: [lucy-dev] Zap "Structural characters" in JsonParser.y

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sat, Jul 14, 2012 at 9:58 PM, Logan Bell <lo...@apache.org> wrote:
> Would this be something we could also throw
> into the bug branch or keep strictly in trunk?

Backporting is fine.  This is a low-risk change (though you'll probably need
a fresh `./Build clean; perl Build.PL; ./Build`).

Marvin Humphrey

Re: [lucy-dev] Zap "Structural characters" in JsonParser.y

Posted by Logan Bell <lo...@apache.org>.
Volunteer reporting for duty. Would this be something we could also throw
into the bug branch or keep strictly in trunk?

Thanks,
Logan

On Sat, Jul 14, 2012 at 10:45 AM, Marvin Humphrey <ma...@rectangular.com> wrote:

> Greets,
>
> Our JSON parser is powered by the Lemon parser generator[1], with the
> grammar
> file in core/Lucy/Util/Json/JsonParser.y.  As formal grammars are the
> topic of
> this week's book club, I thought it might be a good time to hack on it a
> bit.
>
> Right now, the grammar is slightly more complex than it needs to be.  This
> block is unnecessary:
>
>     /* Structural characters. */
>     begin_array     ::= LEFT_SQUARE_BRACKET.
>     end_array       ::= RIGHT_SQUARE_BRACKET.
>     begin_object    ::= LEFT_CURLY_BRACKET.
>     end_object      ::= RIGHT_CURLY_BRACKET.
>     name_separator  ::= COLON.
>     value_separator ::= COMMA.
>
> All we have to do is replace all instances in the file of the
> non-terminals on
> the left with the corresponding terminals on the right and that block can
> be
> deleted.
>
> Have we got a volunteer willing to take this task on?
>
> Marvin Humphrey
>
> [1] http://www.hwaci.com/sw/lemon/lemon.html
>

Re: [lucy-dev] Zap "Structural characters" in JsonParser.y

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Tue, Jul 17, 2012 at 8:59 AM, Kurt Starsinic <ks...@gmail.com> wrote:
> If we decide that we really like Jansson (or any other C-ish parser),
> I'm more than happy to write an XSUB wrapper for it.  I have a lot of
> experience with that.

XS expertise is very valuable -- there aren't that many people who know it
well, so it's cool to see your offer! :)

Lucy doesn't need to add Jansson as a dependency because we already have our
own tightly integrated JSON parser -- see LUCY-133: <http://s.apache.org/9WP>.

The background is that Lucy is written in something sort of like XS itself: an
OO toolkit called "Clownfish".  Here are some rough datatype mappings:

    Perl Clownfish     Ruby    Jansson
    ==== ============= ======= =======
    HV   cfish_Hash    RHash   json_t
    AV   cfish_VArray  RArray  json_t
    SV   cfish_CharBuf RString json_t

In the past, Lucy relied on the CPAN module JSON::XS for encoding and decoding
JSON.  That meant that when we needed to read in some JSON for use by the Lucy
core, the data would first be read into Perl data structures (HV, AV, SV) by
JSON::XS but then had to go through a deep conversion to Clownfish data
structures (Hash, VArray, CharBuf, etc).  When we switched to our own custom
JSON module which reads into Clownfish data structures directly, that extra
conversion stage was eliminated, speeding things up.
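
To make the cost of that extra stage concrete, here is a minimal sketch of a
deep conversion between two tree types.  Both `ForeignValue` (standing in for
whatever a third-party parser hands back) and `NativeValue` (standing in for
the core's own types) are hypothetical, as are all the function names; this is
not Lucy or Clownfish code, just an illustration of the second full pass over
the data that parsing directly into native structures avoids.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a value produced by a third-party parser. */
typedef struct ForeignValue {
    const char           *text;       /* string payload, or NULL */
    struct ForeignValue **items;      /* child values, or NULL */
    size_t                num_items;  /* number of child values */
} ForeignValue;

/* Hypothetical stand-in for the core library's own value type. */
typedef struct NativeValue {
    char                *text;
    struct NativeValue **items;
    size_t               num_items;
} NativeValue;

/* Allocate or abort; keeps the sketch free of error-handling noise. */
static void *xmalloc(size_t size) {
    void *ptr = malloc(size ? size : 1);
    if (ptr == NULL) { abort(); }
    return ptr;
}

/* Deep conversion: every node that the parser already built is visited
 * and copied again, recursively, into the native representation. */
NativeValue *deep_convert(const ForeignValue *src) {
    assert(src != NULL);
    NativeValue *dst = xmalloc(sizeof(NativeValue));
    dst->text      = NULL;
    dst->items     = NULL;
    dst->num_items = src->num_items;
    if (src->text != NULL) {
        size_t len = strlen(src->text) + 1;
        dst->text = xmalloc(len);
        memcpy(dst->text, src->text, len);
    }
    if (src->num_items > 0) {
        dst->items = xmalloc(src->num_items * sizeof(NativeValue *));
        for (size_t i = 0; i < src->num_items; i++) {
            dst->items[i] = deep_convert(src->items[i]);
        }
    }
    return dst;
}

/* Release a converted tree, children first. */
void native_free(NativeValue *value) {
    if (value == NULL) { return; }
    for (size_t i = 0; i < value->num_items; i++) {
        native_free(value->items[i]);
    }
    free(value->items);
    free(value->text);
    free(value);
}
```

A caller would parse into a `ForeignValue` tree, call `deep_convert()` on the
root, and then discard the original tree, so every string and container ends
up being allocated twice.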

If we had gone with Jansson instead of rolling our own JSON module, we would
have had to write bridge code to convert Jansson's `json_t` data structures to
our Clownfish types, and we would still be paying the cost of deep conversion.
Plus Jansson's json_t API isn't small, FWIW:

  http://www.digip.org/jansson/doc/2.3/apiref.html#value-representation

Nevertheless, even if we don't need Jansson-to-Perl bindings for Lucy, there
are still some interesting opportunities around here for people who know XS.
Clownfish is designed to integrate closely with a "host" language, and CFC,
the Clownfish compiler, autogenerates a lot of XS code for Lucy's Perl
bindings.  Hacking on the CFC code that generates XS is a fun and challenging
project.

Alternately, if you have knowledge of the C APIs for other dynamic languages,
or if you would like to acquire such knowledge, you could work on building CFC
bindings for them instead.  Logan and I have both put in some time on the CFC
Ruby bindings, motivated partly by a desire to enrich our understanding of
dynamic language design in general and Ruby in particular.

> P.S. I've edited the original message, because the last time I replied
> it was rejected as spam.

If you snoop the headers of the rejected mail, you will find a breakdown of
what contributed to the spam score.  FWIW, the most common problem is that
the ASF SpamAssassin config strongly dislikes mail in HTML format as opposed
to plain text.

Marvin Humphrey

Re: [lucy-dev] Zap "Structural characters" in JsonParser.y

Posted by Kurt Starsinic <ks...@gmail.com>.
If we decide that we really like Jansson (or any other C-ish parser),
I'm more than happy to write an XSUB wrapper for it.  I have a lot of
experience with that.

- Kurt

P.S. I've edited the original message, because the last time I replied
it was rejected as spam.

- Kurt


On Mon, Jul 16, 2012 at 5:25 PM, David E. Wheeler <[...]> wrote:
>
> On Jul 16, 2012, at 6:01 AM, Marvin Humphrey wrote:
>
> > So for *us*, a custom rig is a surefire winner over adding a dependency.  But
> > from the perspective of other potential users, our implementation wouldn't
> > offer any advantages over, say, Jansson.
> >
> >    [...]
> >
> > Maybe Jansson does what you need?
>
> It’s fine if I’m working in C or Objective-C, but not Perl.
>
>    [...]
>
> Anyway, I find your arguments persuasive.
>
> Best,
>
> David

Re: [lucy-dev] Zap "Structural characters" in JsonParser.y

Posted by "David E. Wheeler" <da...@justatheory.com>.
On Jul 16, 2012, at 6:01 AM, Marvin Humphrey wrote:

> So for *us*, a custom rig is a surefire winner over adding a dependency.  But
> from the perspective of other potential users, our implementation wouldn't
> offer any advantages over, say, Jansson.
> 
>    http://www.digip.org/jansson/
> 
> Maybe Jansson does what you need?

It’s fine if I’m working in C or Objective-C, but not Perl.

  https://metacpan.org/search?q=jansson

Anyway, I find your arguments persuasive.

Best,

David

Re: [lucy-dev] Zap "Structural characters" in JsonParser.y

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sun, Jul 15, 2012 at 2:03 AM, David E. Wheeler <da...@justatheory.com> wrote:
> On Jul 14, 2012, at 7:45 PM, Marvin Humphrey wrote:
>
>> Our JSON parser is powered by the Lemon parser generator[1], with the grammar
>> file in core/Lucy/Util/Json/JsonParser.y.  As formal grammars are the topic of
>> this week's book club, I thought it might be a good time to hack on it a bit.
>
> How compliant is it?  Is this something we ought to consider breaking out
> into a separate library?  I could use a good C-based JSON parser at times…

There are a lot of good JSON parsers out there.  The spec is both simple and
coherent (unlike, say, YAML's), so writing a decent implementation is not that
hard in the grand scheme of things.

The thing is, C doesn't provide either hash table or bounded array data
structures natively -- so any JSON library has to either provide those itself
or add a dependency on a library that does.  By rolling our own JSON library,
we were able to optimize it for Clownfish data structures.  Had we not done
that, we would have had to translate between somebody else's data structures
and Clownfish's, which would have slowed things down and increased the size of
our compiled shared object.
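
For a sense of what "provide those itself" means in practice, here is a rough
sketch of the kind of growable, bounds-checked array a C JSON library ends up
carrying around.  `PtrArray` and its functions are hypothetical names made up
for this example; they are not the Clownfish VArray API or anything from
Jansson.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of opaque pointers. */
typedef struct PtrArray {
    void  **elems;     /* backing storage, or NULL when empty */
    size_t  size;      /* number of elements in use */
    size_t  capacity;  /* number of elements allocated */
} PtrArray;

/* Start out empty; storage is allocated lazily on the first push. */
void ptr_array_init(PtrArray *array) {
    assert(array != NULL);
    array->elems    = NULL;
    array->size     = 0;
    array->capacity = 0;
}

/* Append an element, doubling the capacity when the storage is full.
 * Returns 0 on success, -1 if memory could not be allocated. */
int ptr_array_push(PtrArray *array, void *elem) {
    if (array->size == array->capacity) {
        size_t new_cap = array->capacity ? array->capacity * 2 : 4;
        void **grown = realloc(array->elems, new_cap * sizeof(void *));
        if (grown == NULL) { return -1; }
        array->elems    = grown;
        array->capacity = new_cap;
    }
    array->elems[array->size++] = elem;
    return 0;
}

/* Bounds-checked fetch: returns NULL for an out-of-range index. */
void *ptr_array_fetch(const PtrArray *array, size_t index) {
    return index < array->size ? array->elems[index] : NULL;
}

/* Free the backing storage (not the elements) and reset to empty. */
void ptr_array_destroy(PtrArray *array) {
    free(array->elems);
    array->elems    = NULL;
    array->size     = 0;
    array->capacity = 0;
}
```

A hash table needs the same sort of scaffolding plus hashing and collision
handling, which is why each library's container types tend to be its own and
why moving data between two libraries means copying it.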

So for *us*, a custom rig is a surefire winner over adding a dependency.  But
from the perspective of other potential users, our implementation wouldn't
offer any advantages over, say, Jansson.

    http://www.digip.org/jansson/

Maybe Jansson does what you need?

Marvin Humphrey

Re: [lucy-dev] Zap "Structural characters" in JsonParser.y

Posted by "David E. Wheeler" <da...@justatheory.com>.
On Jul 14, 2012, at 7:45 PM, Marvin Humphrey wrote:

> Our JSON parser is powered by the Lemon parser generator[1], with the grammar
> file in core/Lucy/Util/Json/JsonParser.y.  As formal grammars are the topic of
> this week's book club, I thought it might be a good time to hack on it a bit.

How compliant is it? Is this something we ought to consider breaking out into a separate library? I could use a good C-based JSON parser at times…

David