Posted to dev@lucy.apache.org by Nick Wellnhofer <we...@aevum.de> on 2013/05/11 18:13:01 UTC

[lucy-dev] C library documentation

Hello lucy-dev,

For the C library documentation, I decided to generate man pages from Lucy's "DocuComments". For every public class, a man page with the following contents is created:

    * Name and abstract
    * Description
    * Public functions
    * Public methods
        * Abstract methods
        * Novel methods
        * Methods inherited from superclasses
    * Inheritance info

To see how it looks, pull from master, change into the 'c' directory, build the library, and run:

    $ man -M autogen/man lucy_SegReader

There are some more classes, methods, and constructors which I think should be made public to be included in the documentation. See branch 'clownfish-public' for my suggestions.

The synopsis sections are still missing. The Perl bindings currently use here-docs for the synopses in the code in perl/buildlib/Lucy/Build/Binding.pm which I consider a bit ugly. I think we could benefit from a system that stores additional documentation sections for different host languages in extra files. But for the beginning, it might be enough to include some example C code derived from the test suite.

It would also be nice to have a high-level overview of how to use Clownfish classes from C. I don't mean how to create your own classes but simple stuff like:

    * Calling methods and functions (uppercase vs. lowercase)
    * Memory management
        * INCREF/DECREF
        * incremented return values
        * decremented arguments
    * Short name macros
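
The memory-management items in the list above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: the toy `Obj` struct and the functions `Obj_new`, `Obj_incref`, `Obj_decref`, and `Obj_adopt` are stand-ins invented for illustration and are not the Clownfish API. The sketch assumes that an "incremented" return value means the caller owns one reference to the result, and that a "decremented" argument means the callee takes over one of the caller's references.

```c
#include <stdlib.h>

/* Hypothetical refcounted object; NOT the real Clownfish Obj. */
typedef struct Obj {
    unsigned    refcount;
    struct Obj *child;
} Obj;

/* "Incremented" return value: the caller owns one reference. */
static Obj*
Obj_new(void) {
    Obj *self = (Obj*)calloc(1, sizeof(Obj));
    if (self != NULL) { self->refcount = 1; }
    return self;
}

/* Take an additional reference. */
static Obj*
Obj_incref(Obj *self) {
    if (self != NULL) { self->refcount++; }
    return self;
}

/* Drop one reference and free the object when the last one goes away.
 * Returns the number of references that remain. */
static unsigned
Obj_decref(Obj *self) {
    if (self == NULL) { return 0; }
    if (--self->refcount > 0) { return self->refcount; }
    Obj_decref(self->child);
    free(self);
    return 0;
}

/* "Decremented" argument: the callee takes over one of the caller's
 * references to `child`, so the caller must not release it again. */
static void
Obj_adopt(Obj *self, Obj *child) {
    Obj_decref(self->child);
    self->child = child;
}
```

In this sketch, a caller that wants to keep using `child` after `Obj_adopt(parent, child)` would call `Obj_incref(child)` first and later balance it with `Obj_decref(child)`.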

Another thing that has to be done for the man pages is to handle markup in DocuComments. AFAICS, we currently use POD-style links and XML-style <code></code> markup. I remember there was some discussion to switch to Markdown. It might be a good time to revisit this.

When I'm finished with the man pages, I plan to use a man-to-html converter and put the result on the Lucy website.

Nick


Re: [lucy-dev] C library documentation

Posted by Peter Karman <pe...@peknet.com>.
Nick Wellnhofer wrote on 6/7/13 5:15 PM:
> But from a practical point of view, most host languages
> will use 'new' as constructor. So it would simplify things if we moved the
> constructor's documentation to the 'new' function and use an alias only for
> languages that have constructors with different names.
> 

+1



-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com

Re: [lucy-dev] C library documentation

Posted by Nick Wellnhofer <we...@aevum.de>.
On Jun 9, 2013, at 08:41 , Marvin Humphrey <ma...@rectangular.com> wrote:

> If other folks are not satisfied with having docs attached to `init`, but I'm
> not satisfied with having them attached to `new`, can we please keep trying to
> find consensus for a little while longer?

OK, then what about writing individual documentation for all the 'new' and 'init' functions?

> I'm not enthusiastic about making exact duplicates of the documentation for
> every constructor, though. :(  That's the kind of ugliness they have to accept
> in Java because of signature overloading, but it would be nice if we could
> avoid it.

Generally, I can see three options:

    * Repeat all params in the 'new' and 'init' docs
    * Let the doc for 'new' refer to the 'init' params
    * Let the doc for 'init' refer to the 'new' params

I don't have a problem with duplicating the parameter descriptions. Redundancy in documentation can be a good thing, IMO. But I'm fine with any solution. We only need some docs for the C constructors, even if it's simply:

   "Constructor. See `init` for a description of the parameters."

Nick


Re: [lucy-dev] C library documentation

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Fri, Jun 7, 2013 at 3:15 PM, Nick Wellnhofer <we...@aevum.de> wrote:
> It does make sense. But from a practical point of view, most host languages
> will use 'new' as constructor. So it would simplify things if we moved the
> constructor's documentation to the 'new' function and use an alias only for
> languages that have constructors with different names.

There are some practical problems with the approach of moving the constructor
documentation from `init` to `new`.

First, abstract classes, e.g. Lucy::Search::Query, don't *have* `new` -- they
only have `init`.  We could add a `new` function to such abstract classes, but
calling it would just result in an unavoidable runtime exception.  I question
whether it's for the best to increase library size and create "attractive
nuisance" functions which trick people into writing code that compiles but
inevitably crashes out, solely because we need something to attach our
documentation to.  (I know that's not your intent, either -- it's just a side
effect of the proposal.)

Second, Clownfish doesn't currently enforce the relationship between `new` and
`init`, so it's possible that the documentation may not be sync'd.  The Perl
bindings will provide a constructor called `new` -- because naming
constructors `new` is idiomatic for Perl -- but when you call Foo->new from
Perl, behind the scenes Foo_init() will be invoked, *not* Foo_new().  The same
will be true for Ruby and Python -- because Foo_new() does not support
subclassing, while Foo_init() does.  (Incidentally, Python constructors don't
use `new`, they use the class object as a function: `x = MyClass()` -- see
<http://docs.python.org/3/tutorial/classes.html#class-objects>.)
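
The subclassing point can be sketched in plain C. The types `Foo` and `SubFoo` and the function `SubFoo_init` below are hypothetical and do not come from the Lucy sources. The sketch only assumes what the paragraph above states: a `new`-style function allocates an object of one fixed class, while an `init`-style function works on memory that the caller has already allocated.

```c
#include <stdlib.h>

/* Hypothetical base class and subclass; not taken from Lucy. */
typedef struct Foo {
    int size;
} Foo;

typedef struct SubFoo {
    Foo base;    /* base class members come first */
    int extra;
} SubFoo;

/* `init` fills in memory that the caller allocated, so it works for
 * Foo itself and for anything that embeds a Foo. */
static Foo*
Foo_init(Foo *self, int size) {
    self->size = size;
    return self;
}

/* `new` allocates exactly sizeof(Foo), so it can only ever produce a
 * plain Foo, never a subclass instance. */
static Foo*
Foo_new(int size) {
    Foo *self = (Foo*)malloc(sizeof(Foo));
    return self != NULL ? Foo_init(self, size) : NULL;
}

/* The subclass initializer chains up to the parent's `init`. */
static SubFoo*
SubFoo_init(SubFoo *self, int size, int extra) {
    Foo_init(&self->base, size);
    self->extra = extra;
    return self;
}
```

A host-language binding is in the position of `SubFoo_init`'s caller here: it allocates an object of whatever class was requested at runtime and then hands it to `init`.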

If other folks are not satisfied with having docs attached to `init`, but I'm
not satisfied with having them attached to `new`, can we please keep trying to
find consensus for a little while longer?

> I think it would also be more consistent for classes with multiple
> constructors.  In this case, we have to document the additional constructors
> because there aren't corresponding 'init' functions.

At the core level, Clownfish doesn't currently differentiate between inert
functions, whether they're named `new`, `init`, `decode_utf8_char`, `freeze`,
or whatever.  (The Perl level treats `init` specially, though.)

I think it makes sense to make `init` special -- like constructors in C++ or
Java, like `initialize` in Ruby and like `__init__` in Python.  Maybe we
should be trying to figure out some syntax, keyword, or capitalization scheme
to make *multiple* constructors special?  (They achieve that in Java etc. by
using the class name in conjunction with signature overloading.)

> But it's no problem to work with the current system. I can simply copy the
> 'init' documentation if 'new' doesn't have one.

I'm not enthusiastic about making exact duplicates of the documentation for
every constructor, though. :(  That's the kind of ugliness they have to accept
in Java because of signature overloading, but it would be nice if we could
avoid it.

Marvin Humphrey

Re: [lucy-dev] C library documentation

Posted by Nick Wellnhofer <we...@aevum.de>.
On Jun 5, 2013, at 02:29 , Marvin Humphrey <ma...@rectangular.com> wrote:

> There are a few odd cases which make the situation a little more complicated:
> 
> *   Abstract classes define `init` but not `new`.  (At the C level, at least.
>    The Perl bindings are different.)
> *   Some classes have no constructors: BoolNum, HashTombstone.
> *   Some classes need many custom constructors: CharBuf, Err.
> *   Several classes present constructors (named "open" by convention) which
>    attempt to return NULL and set an error variable on failure rather than
>    throw exceptions.
> 
> However, I don't think those oddities spoil the rationale.
> 
> Does that make sense?

It does make sense. But from a practical point of view, most host languages will use 'new' as constructor. So it would simplify things if we moved the constructor's documentation to the 'new' function and use an alias only for languages that have constructors with different names.

I think it would also be more consistent for classes with multiple constructors. In this case, we have to document the additional constructors because there aren't corresponding 'init' functions.

But it's no problem to work with the current system. I can simply copy the 'init' documentation if 'new' doesn't have one.

Nick


Re: [lucy-dev] C library documentation

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Tue, Jun 4, 2013 at 7:10 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> Another thing I noticed is that the DocuComments for the Perl constructors
> are taken from the 'init' functions, not the 'new' functions. What's the
> rationale behind this?

Our Perl constructors, which are named `new` by default, actually wrap `init`
rather than `new`.

The `init` functions allow us to supply objects blessed into arbitrary
classes at runtime.  In contrast, the vast majority of `new` functions defined
in core C code are convenience wrappers which allocate a blank object and then
immediately invoke `init` with the arguments which were passed in.  They are
shorter, but less flexible.

    // Equivalent:
    Hash *hash = Hash_init((Hash*)VTable_Make_Obj(HASH), 0);
    Hash *hash = Hash_new(0);
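
For readers without the Lucy sources at hand, the same relationship can be shown as a compilable sketch. The `Counter` type, the `class_name` field (a stand-in for the VTable pointer), and the helper `Counter_alloc` are all hypothetical; only the allocate-then-`init` shape is taken from the description above.

```c
#include <stdlib.h>

/* Hypothetical toy class; NOT the real Clownfish Hash. */
typedef struct Counter {
    const char *class_name;  /* stands in for the VTable pointer */
    int         value;
} Counter;

/* Allocate a blank object tagged with the requested class. */
static Counter*
Counter_alloc(const char *class_name) {
    Counter *self = (Counter*)calloc(1, sizeof(Counter));
    if (self != NULL) { self->class_name = class_name; }
    return self;
}

/* `init` fills in an object that somebody else allocated. */
static Counter*
Counter_init(Counter *self, int start) {
    if (self != NULL) { self->value = start; }
    return self;
}

/* `new` is a convenience wrapper: allocate a blank object of one
 * fixed class, then immediately invoke `init`. */
static Counter*
Counter_new(int start) {
    return Counter_init(Counter_alloc("Counter"), start);
}
```

Calling `Counter_init(Counter_alloc("MyCounter"), 3)` yields an object of a class chosen by the caller, which is the flexibility that `Counter_new(3)` gives up in exchange for brevity.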

Other languages have special initialization constructors with essentially the
same behavior as our `init`: Ruby's `initialize`, Python's `__init__`, etc.

    http://ruby.about.com/od/oo/ss/Instantiation-And-The-Initialize-Method.htm
    http://docs.python.org/3/reference/datamodel.html?highlight=__init__#object.__init__

> For the C library, we have to document primarily the
> 'new' functions. I could add a special case to the code that generates the C
> documentation, but it would make more sense to me to move the DocuComments
> from 'init' to 'new'.

Now that you've brought this up and forced me to think it through... Perhaps
we should consider instead formalizing our commitment to `init` and keeping
the docs there.  In addition, maybe we should start autogenerating `new`
implicitly, allowing us to delete a few lines each across a broad number of
files.

There are a few odd cases which make the situation a little more complicated:

*   Abstract classes define `init` but not `new`.  (At the C level, at least.
    The Perl bindings are different.)
*   Some classes have no constructors: BoolNum, HashTombstone.
*   Some classes need many custom constructors: CharBuf, Err.
*   Several classes present constructors (named "open" by convention) which
    attempt to return NULL and set an error variable on failure rather than
    throw exceptions.
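
The "open" convention in the last item can be sketched as follows. The `Reader` type, the `last_error` variable, and both functions are hypothetical illustrations, not Lucy's actual classes or error mechanism. The sketch only shows the shape described above: return NULL and record an error instead of throwing.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical class that wraps a file handle. */
typedef struct Reader {
    FILE *fh;
} Reader;

/* Hypothetical stand-in for the library's error variable. */
static const char *last_error = NULL;

/* "open"-style constructor: on failure, set the error variable and
 * return NULL rather than throwing an exception. */
static Reader*
Reader_open(const char *path) {
    FILE *fh = fopen(path, "rb");
    if (fh == NULL) {
        last_error = "can't open file";
        return NULL;
    }
    Reader *self = (Reader*)malloc(sizeof(Reader));
    if (self == NULL) {
        fclose(fh);
        last_error = "out of memory";
        return NULL;
    }
    self->fh   = fh;
    last_error = NULL;
    return self;
}

/* Release the file handle and the object; NULL is accepted. */
static void
Reader_close(Reader *self) {
    if (self != NULL) {
        fclose(self->fh);
        free(self);
    }
}
```

A caller checks the return value and consults `last_error` only when it is NULL, which keeps expected failures such as a missing file out of the exception path.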

However, I don't think those oddities spoil the rationale.

Does that make sense?

Marvin Humphrey

Re: [lucy-dev] C library documentation

Posted by Nick Wellnhofer <we...@aevum.de>.
On May 11, 2013, at 18:13 , Nick Wellnhofer <we...@aevum.de> wrote:

> There are some more classes, methods, and constructors which I think should be made public to be included in the documentation. See branch 'clownfish-public' for my suggestions.

Another thing I noticed is that the DocuComments for the Perl constructors are taken from the 'init' functions, not the 'new' functions. What's the rationale behind this? For the C library, we have to document primarily the 'new' functions. I could add a special case to the code that generates the C documentation, but it would make more sense to me to move the DocuComments from 'init' to 'new'.

Nick


Re: [lucy-dev] C library documentation

Posted by Nick Wellnhofer <we...@aevum.de>.
On 14/05/2013 05:52, Marvin Humphrey wrote:
> Experience has shown that the uppercase/lowercase thing is a horrible trap,
> though -- that really needs to be fixed (by hiding the implementing function),
> not just explained.

For users from outside a parcel it is kind of fixed by symbol 
visibility. Using a lowercase name for a method will result in a link 
error. This has been discussed already:

     http://s.apache.org/pIr

I seem to remember you made a more concrete proposal somewhere, but 
maybe I'm wrong. Another possible approach would be to hide the 
implementing functions via #ifdefs.

> I wouldn't want to block the release because we haven't dealt with the
> Markdown transition yet.  It's not really straight-up Markdown either -- at
> the least it's a custom Markdown extension which supports @tags and funky
> links.
>
> We can hack in individual features as short-term fixes rather than solving all
> these problems and integrating a parser as a dependency.  For instance, we
> can replace the <code></code> construct with backticks.  Detecting
> markdown-style links and hacking in our own treatment wouldn't be much harder.

I can also work with the current constructs as they are. I think it's 
only POD-style links and <code>.

>> When I'm finished with the man pages, I plan to use a man-to-html converter
>> and put the result on the Lucy website.
>
> FWIW... In this day and age, the HTML is probably going to get read a lot more
> often.

True. It would be better to generate HTML directly. But like the man 
page generator, this will be really easy.

Nick


Re: [lucy-dev] C library documentation

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sat, May 11, 2013 at 9:13 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> There are some more classes, methods, and constructors which I think should
> be made public to be included in the documentation. See branch
> 'clownfish-public' for my suggestions.

I have misgivings about individual methods (e.g. `CB_Nip_One` is a lousy name,
can't we do better?), but since major changes are coming down the pipe anyway
(e.g. introducing immutable String), who's gonna nitpick?

On one hand, without making all that functionality public, the C library would
be basically unusable -- so there's really only one choice open to us.  On the
other hand, the design is not finished, so those APIs, despite being "public",
are not permanent.  This project has a history of being pretty conservative
about public APIs, to the point where we've probably frightened a lot of
potential users off with big scary warnings over the years.  Declaring an
unstable interface "public" feels strange -- but supposedly it's the kind of
thing which has worked for other Apache projects.

    http://s.apache.org/hZ

    Anyway, it's a design pattern: "good ideas and bad code build
    communities, the other three combinations do not".

> The synopsis sections are still missing. The Perl bindings currently use
> here-docs for the synopses in the code in
> perl/buildlib/Lucy/Build/Binding.pm which I consider a bit ugly. I think we
> could benefit from a system that stores additional documentation sections
> for different host languages in extra files. But for the beginning, it might
> be enough to include some example C code derived from the test suite.

Providing documentation in multiple languages is a hard problem.  Replacing
here-docs with something more elegant seems less pressing to me than dealing
with things like creating links which work across multiple documentation
systems.

> It would also be nice to have a high-level overview of how to use Clownfish
> classes from C. I don't mean how to create your own classes but simple stuff
> like:
>
>     * Calling methods and functions (uppercase vs. lowercase)
>     * Memory management
>         * INCREF/DECREF
>         * incremented return values
>         * decremented arguments
>     * Short name macros

Yes, without such an overview, people would be lost.

Experience has shown that the uppercase/lowercase thing is a horrible trap,
though -- that really needs to be fixed (by hiding the implementing function),
not just explained.
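
A toy illustration of why the two spellings are easy to confuse. The `Animal` type and all three functions are hypothetical, and a single function pointer stands in for a full vtable. The sketch assumes the convention discussed in this thread: the uppercase name is the method, which dispatches dynamically, and the lowercase name is the implementing function.

```c
/* Hypothetical object with a one-slot "vtable". */
typedef struct Animal Animal;
typedef const char* (*Speak_t)(Animal *self);
struct Animal {
    Speak_t speak;
};

/* Implementing functions (lowercase). */
static const char*
Animal_speak(Animal *self) {
    (void)self;
    return "...";
}

static const char*
Dog_speak(Animal *self) {
    (void)self;
    return "Woof";
}

/* Method (uppercase): dispatches through the object's slot, so
 * subclass overrides are honored. */
static const char*
Animal_Speak(Animal *self) {
    return self->speak(self);
}
```

Given `Animal dog = { Dog_speak };`, `Animal_Speak(&dog)` returns "Woof" while `Animal_speak(&dog)` silently bypasses the override and returns "...". Both calls compile, which is the trap.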

> Another thing that has to be done for the man pages is to handle markup in
> DocuComments. AFAICS, we currently use POD-style links and XML-style
> <code></code> markup. I remember there was some discussion to switch to
> Markdown. It might be a good time to revisit this.

I wouldn't want to block the release because we haven't dealt with the
Markdown transition yet.  It's not really straight-up Markdown either -- at
the least it's a custom Markdown extension which supports @tags and funky
links.

We can hack in individual features as short-term fixes rather than solving all
these problems and integrating a parser as a dependency.  For instance, we
can replace the <code></code> construct with backticks.  Detecting
markdown-style links and hacking in our own treatment wouldn't be much harder.
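
As a rough idea of how small the backtick substitution could be, here is a standalone sketch. The function name `code_tags_to_backticks` is made up for illustration and is not part of the Clownfish compiler. It handles only literal `<code>` and `</code>` tags and does not touch POD-style links.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of `src` in which every <code> and
 * </code> tag is replaced by a backtick.  The caller frees the result.
 * Returns NULL if allocation fails. */
static char*
code_tags_to_backticks(const char *src) {
    /* Each tag shrinks to one character, so the output never needs
     * more room than the input. */
    char *out = (char*)malloc(strlen(src) + 1);
    if (out == NULL) { return NULL; }

    char *dest = out;
    while (*src != '\0') {
        if (strncmp(src, "<code>", 6) == 0) {
            *dest++ = '`';
            src += 6;
        }
        else if (strncmp(src, "</code>", 7) == 0) {
            *dest++ = '`';
            src += 7;
        }
        else {
            *dest++ = *src++;
        }
    }
    *dest = '\0';
    return out;
}
```

For the input `Call <code>init</code> first.` the function produces the same sentence with `init` wrapped in backticks.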

> When I'm finished with the man pages, I plan to use a man-to-html converter
> and put the result on the Lucy website.

FWIW... In this day and age, the HTML is probably going to get read a lot more
often.

Marvin Humphrey

Re: [lucy-dev] C library documentation

Posted by Peter Karman <pe...@peknet.com>.
Nick Wellnhofer wrote on 5/11/13 11:13 AM:

> For the C library documentation, I decided to generate man pages from Lucy's "DocuComments". For every public class, a man page with the following contents is created:
> 

This is looking good, Nick. I am a fan of man pages, however you find best to
generate them.

I see you are working on some sample code too, which is great. I look forward to
wiring up a swish_lucy.c example.


-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com