Posted to dev@lucy.apache.org by Marvin Humphrey <ma...@rectangular.com> on 2011/02/21 21:16:00 UTC

[lucy-dev] Porting Clownfish compiler to C

Greets,

The porting of the Clownfish compiler to C has been underway for the last
several weeks.  As previously discussed, this move was necessary in order to
replace the Perl-licensed CPAN module Parse::RecDescent with the public-domain
Lemon parser generator, written in C by SQLite author Richard Hipp.  

Now that LEGAL-86 has finally been resolved favorably, usage of
Parse::RecDescent no longer blocks our 0.1.0-incubating release, taking some
of the urgency out of the task.  Nevertheless, trunk/clownfish is in a
transitional state right now and I think it makes sense to push through to a
coherent stopping point.

The end goal is to have Clownfish written entirely in C; once that is done,
other Lucy host language bindings will no longer depend on Perl.
These are the stages of the transition:

  1. *DONE* Migrate to an inside-out object model within the Clownfish
     compiler internals.  This makes it easier to move piecemeal from Perl
     implementations to XS to C implementations.
  2. *DONE* Eliminate sophisticated usage of polymorphism by Clownfish
     compiler components, e.g. by rolling up many Type classes into one
     module.  In our C-based compiler, we can still use crude inheritance
     based on struct layout and casting, but we don't want to require method
     overriding if we can help it.
  3. *UNDERWAY* Port primary Clownfish components to thin XS wrappers around C
     implementations.  This includes everything within trunk/clownfish/lib/
     except the items under lib/Clownfish/Binding/ and lib/Clownfish/Parser.pm.
  4. Port everything under trunk/clownfish/lib/Clownfish/Binding/ to XS
     wrappers around C code.
  5. Port Clownfish/Parser.pm to an XS wrapper around a C implementation using
     Lemon.
  6. Port all the test files in trunk/clownfish/t/ to C, using the test
     harness code provided by Charmonizer.
  7. Change the interface by which bindings are specified so that, e.g., it
     parses static JSON files rather than being invoked from Perl code, and
     convert all the binding specs embedded within .pm files in trunk/perl/lib/
     to use the new interface.
  8. Remove all Perl/XS from trunk/clownfish/.
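Stage 2 mentions "crude inheritance based on struct layout and casting" in
place of method overriding. The following is a minimal C sketch of that
general technique under stated assumptions, not code from the Clownfish
compiler: the types `Symbol` and `Function`, their fields, and the helper
functions are all hypothetical. The "subclass" embeds its parent struct as its
first member. C guarantees that member sits at the start of the object, so a
`Function *` can be cast to `Symbol *` and passed to functions written for the
parent. All calls are resolved statically; there is no method table to
override.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "base class": every symbol has a name and an exposure. */
typedef struct Symbol {
    const char *name;
    const char *exposure;   /* e.g. "public" or "private" */
} Symbol;

/* Hypothetical "subclass": Symbol is embedded as the FIRST member, so a
 * pointer to a Function may be converted to a pointer to Symbol. */
typedef struct Function {
    Symbol      symbol;      /* must stay first for the cast to be valid */
    const char *return_type;
    int         num_params;
} Function;

/* Plain functions that accept any struct beginning with a Symbol.
 * There is no vtable and no overriding; behavior is fixed. */
static const char *
Symbol_get_name(const Symbol *self) {
    assert(self != NULL);
    return self->name;
}

static int
Symbol_is_public(const Symbol *self) {
    assert(self != NULL);
    return strcmp(self->exposure, "public") == 0;
}

/* "Constructor" for the subclass: initialize the embedded base first,
 * then the subclass's own fields. */
static void
Function_init(Function *self, const char *name, const char *exposure,
              const char *return_type, int num_params) {
    self->symbol.name     = name;
    self->symbol.exposure = exposure;
    self->return_type     = return_type;
    self->num_params      = num_params;
}
```

Usage, assuming the definitions above: after
`Function f; Function_init(&f, "do_stuff", "public", "void", 2);`, the call
`Symbol_get_name((Symbol*)&f)` returns `"do_stuff"`. The trade-off is that
any specialized behavior must be written as a separate function, which is
exactly what avoiding required method overriding implies.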

In order to eliminate Parse::RecDescent as a dependency, we need to get
through stage 5, and that had been my previous goal for the 0.1.0-incubating
release.  Now it seems to make sense to pause after either stage 3 or stage 4.

I look forward to completing all 8 stages: once all the Perl code is
eliminated, it will be easier for a larger Lucy community whose primary
expertise is in C to grok and maintain the compiler and to write new host
language bindings.

Marvin Humphrey


Re: [lucy-dev] Porting Clownfish compiler to C

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Marvin,

Okey dokey. Thanks.

Cheers,
Chris

On Feb 21, 2011, at 1:52 PM, Marvin Humphrey wrote:

> SWIG was considered and rejected long ago.  It does not provide sufficient
> flexibility, power, or elegance to meet our high standards for interface
> design.


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: [lucy-dev] Porting Clownfish compiler to C

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Mon, Feb 21, 2011 at 12:29:00PM -0800, Mattmann, Chris A (388J) wrote:
> Is there any reason to look at SWIG for the language bindings (not now, but
> maybe later?). I know SWIG is pretty good at taking C code and generating
> language specific bindings and that SVN uses it.

SWIG was considered and rejected long ago.  It does not provide sufficient
flexibility, power, or elegance to meet our high standards for interface
design.

We touched on the subject of SWIG during the discussion of LUCY-5, which
introduced "Boilerplater", later renamed to "Clownfish":

  http://s.apache.org/pSd

  However, the bindings we can generate with Boilerplater are much more
  powerful and integrated into our custom OO model than what we could achieve
  with SWIG.  SWIG bindings allow you to invoke the C library from the host
  via wrappers.  Bindings generated by Boilerplater, on the other hand, allow
  you to write subclasses entirely in the host language which override methods
  defined in the C core.

This feature has been exploited to write custom subclasses of Query,
QueryParser, Highlighter, FieldType, Schema, Similarity, IndexManager and so
on -- all in pure Perl.  Several such projects have ended up as distributions
published on CPAN. 
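To make the idea of overriding C-core methods from the host language more
concrete, here is a minimal C sketch of one general way it can work, assuming
a per-class table of function pointers. It is not Clownfish's generated code.
`VTable`, `Obj`, `VTable_subclass`, the `tf` method and `host_override_tf` are
all hypothetical, and the "host" callback is simulated with a plain C function
where a real binding would call into Perl. Because code inside the C core also
dispatches through the table, an override is honored even when the library
invokes the method internally.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct Obj;

/* Hypothetical per-class method table: one function pointer per method. */
typedef struct VTable {
    const char *class_name;
    double (*tf)(struct Obj *self, double freq);
} VTable;

/* Every object starts with a pointer to its class's method table. */
typedef struct Obj {
    VTable *vtable;
} Obj;

/* Default implementation defined in the "C core". */
static double
Core_tf(Obj *self, double freq) {
    (void)self;
    return freq;
}

static VTable CORE_VTABLE = { "Similarity", Core_tf };

/* Method invocation always goes through the table, so whatever function
 * occupies the slot is what runs. */
static double
Obj_tf(Obj *self, double freq) {
    return self->vtable->tf(self, freq);
}

/* Some other C-core routine that calls the method internally. */
static double
Core_score(Obj *sim, double freq, double boost) {
    return Obj_tf(sim, freq) * boost;
}

/* Stand-in for a host-language callback. A real binding would invoke a
 * method defined in e.g. Perl here; this sketch just squares freq. */
static double
host_override_tf(Obj *self, double freq) {
    (void)self;
    return freq * freq;
}

/* Create a subclass at runtime: copy the parent's table, rename it, and
 * replace the overridden slot. Returns NULL on allocation failure; the
 * caller frees the result. */
static VTable *
VTable_subclass(const VTable *parent, const char *name,
                double (*tf)(Obj *self, double freq)) {
    VTable *child = malloc(sizeof(VTable));
    if (child == NULL) {
        return NULL;
    }
    memcpy(child, parent, sizeof(VTable));
    child->class_name = name;
    if (tf != NULL) {
        child->tf = tf;
    }
    return child;
}
```

With an object whose `vtable` points at a table built by
`VTable_subclass(&CORE_VTABLE, "MySimilarity", host_override_tf)`,
`Core_score()` picks up the squared term-frequency without any change to the
C core.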

Clownfish also offers these features:

  * Automatic refcount management (thanks to the "incremented" and
    "decremented" keywords).
  * Default parameter values.
  * Method bindings which use labeled parameters rather than positional
    arguments.
  * Sophisticated parameter validation.
  * Caching of host objects, for speed and to make inside-out subclass
    implementations practical.
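The "incremented" and "decremented" keywords document who owns a reference.
The sketch below illustrates that ownership convention in plain C, assuming
"incremented" marks a return value the caller must release and "decremented"
marks a parameter whose reference the callee takes over. The `RcObj` and
`Holder` types and the `live_objects` counter are hypothetical and exist only
for illustration; this does not reproduce the code Clownfish generates.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object. */
typedef struct RcObj {
    int refcount;
    int value;
} RcObj;

static int live_objects = 0;   /* counts allocations, for demonstration */

static RcObj *
RcObj_incref(RcObj *self) {
    self->refcount++;
    return self;
}

/* Release one reference; free the object when the last one goes away.
 * Accepts NULL as a no-op. */
static void
RcObj_decref(RcObj *self) {
    if (self != NULL && --self->refcount == 0) {
        free(self);
        live_objects--;
    }
}

/* Analogous to an "incremented" return value: the caller receives a new
 * reference and is responsible for releasing it. */
static RcObj *
RcObj_new(int value) {
    RcObj *self = malloc(sizeof(RcObj));
    if (self == NULL) {
        return NULL;
    }
    self->refcount = 1;
    self->value = value;
    live_objects++;
    return self;
}

/* A simple one-slot container. */
typedef struct Holder {
    RcObj *item;
} Holder;

/* Analogous to a "decremented" parameter: the callee takes over the
 * caller's reference, so the caller must NOT decref `item` afterwards. */
static void
Holder_store(Holder *self, RcObj *item) {
    RcObj_decref(self->item);   /* release any previous occupant */
    self->item = item;          /* adopt the caller's reference */
}

/* Plain getter: returns a borrowed reference; no decref needed. */
static RcObj *
Holder_fetch(Holder *self) {
    return self->item;
}

/* "Incremented" getter: the caller gets its own reference to release. */
static RcObj *
Holder_fetch_inc(Holder *self) {
    return self->item != NULL ? RcObj_incref(self->item) : NULL;
}

static void
Holder_destroy(Holder *self) {
    RcObj_decref(self->item);
    self->item = NULL;
}
```

Because `Holder_store()` adopts its argument rather than incrementing it, a
caller can write `Holder_store(&h, RcObj_new(7))` and hand off the only
reference with no leak and no matching decref.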

SWIG bindings would not allow us to meet Lucy's central design goal of
providing highly idiomatic interfaces tuned for each host language.

  http://wiki.apache.org/incubator/LucyProposal

  Proposal

  Lucy has two aims. First, it will be a high-performance C search engine
  library. Second, it will maximize its usability and power when accessed via
  dynamic language bindings. To that end, it will present highly idiomatic,
  carefully tailored APIs for each of its "host" binding languages, including
  support for subclasses written entirely in the "host" language. 

Clownfish is thus an essential component of Lucy.  However, it is so
seamlessly integrated that end users have no idea that it exists. :)

Marvin Humphrey


Re: [lucy-dev] Porting Clownfish compiler to C

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Marvin,

Is there any reason to look at SWIG for the language bindings (not now, but maybe later?). I know SWIG is pretty good at taking C code and generating language-specific bindings and that SVN uses it.

Cheers,
Chris

On Feb 21, 2011, at 12:16 PM, Marvin Humphrey wrote:

> The end goal is to have Clownfish entirely in C, as once that is done, it
> eliminates the dependency for other Lucy host language bindings on Perl.




Re: [lucy-dev] Porting Clownfish compiler to C

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sun, Mar 06, 2011 at 10:31:51PM -0600, Peter Karman wrote:
> any time estimate for Stage 4? I understand the importance of context switching,
> and if it were a few days, given your current momentum, I'd vote for pushing on.

It might take a few days, but that would be the low-end estimate.  I'm not
working on this exclusively, for starters.

> otherwise, if this is a suitable pausing place, let's give cutting a 0.1 release
> our attention.

Each day the release lags, our opportunity costs mount, like compounding
interest.  Let's move on.

Marvin Humphrey


Re: [lucy-dev] Porting Clownfish compiler to C

Posted by Peter Karman <pe...@peknet.com>.
Marvin Humphrey wrote on 3/6/11 10:16 PM:
> On Mon, Feb 21, 2011 at 12:16:00PM -0800, Marvin Humphrey wrote:
> 
>>   3. *UNDERWAY* Port primary Clownfish components to thin XS wrappers around C
>>      implementations.  This includes everything within trunk/clownfish/lib/
>>      except the items under lib/Clownfish/Binding/ and lib/Clownfish/Parser.pm.
> 
>> Now it seems to make sense to pause after either stage 3 or stage 4.
> 
> Stage 3 is finished as of tonight.
> 
> While I've built up some momentum and would like to continue, this porting
> task doesn't block the release and I expect to set it aside for now.
> 

any time estimate for Stage 4? I understand the importance of context switching,
and if it were a few days, given your current momentum, I'd vote for pushing on.

otherwise, if this is a suitable pausing place, let's give cutting a 0.1 release
our attention.

in any case -- cheers for this milestone!


-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com

Re: [lucy-dev] Porting Clownfish compiler to C

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Mon, Feb 21, 2011 at 12:16:00PM -0800, Marvin Humphrey wrote:

>   3. *UNDERWAY* Port primary Clownfish components to thin XS wrappers around C
>      implementations.  This includes everything within trunk/clownfish/lib/
>      except the items under lib/Clownfish/Binding/ and lib/Clownfish/Parser.pm.

> Now it seems to make sense to pause after either stage 3 or stage 4.

Stage 3 is finished as of tonight.

While I've built up some momentum and would like to continue, this porting
task doesn't block the release and I expect to set it aside for now.

Marvin Humphrey


Re: [lucy-dev] Porting Clownfish compiler to C

Posted by Peter Karman <pe...@peknet.com>.
Marvin Humphrey wrote on 2/21/11 2:16 PM:

> 
> The end goal is to have Clownfish entirely in C, as once that is done, it
> eliminates the dependency for other Lucy host language bindings on Perl.
> These are the stages of the transition:

Thanks for the details, Marvin. Very helpful to see the roadmap.
