Posted to dev@lucy.apache.org by Nick Wellnhofer <we...@aevum.de> on 2014/03/13 14:13:53 UTC

[lucy-dev] Parcels and files

Lucifers,

You might have noticed that I created a new branch 'explicit-dependencies' in 
lucy-clownfish.git. One feature I'd like to implement is to ignore .cfh files 
from parcels that are not required by the current project. Currently, it is 
possible for a .cfh file to declare classes in multiple parcels or to contain 
only "C blocks" without any parcel statement at all. It would be easier to 
determine which parcel a file belongs to if only a single parcel statement was 
allowed at the top of a file.

(Ultimately, we might want to put all the .cfh files of a parcel in a separate 
directory to support the installation of multiple versions in parallel.)

Another thing I'd like to do is to get rid of the "default" parcel.

Thoughts?

Nick

Re: [lucy-dev] Parcels and files

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Sun, Mar 16, 2014 at 3:29 PM, Nick Wellnhofer <we...@aevum.de> wrote:
>> Would it help to install a symlink for the most recent version?
>>
>>    $INCLUDE/Foo    # symlink to $INCLUDE/Foo-3.3.1
>
> Yes, but it shouldn't make a noticeable difference.

If CFC ever supports partial compilation (individual Clownfish files rather
than whole parcels), might performance start to matter more?  Having a
deterministic location to look for a specific class can't hurt.

I agree that a layout which facilitates uninstallation is desirable.

> But there's a lot more happening in these examples. It seems that you want
> to merge parcel and class namespaces. What we have now:
>
>     Parcel: lucy
>     Class: Lucy::Index::Indexer
>
> How I understand your proposal:
>
>     Parcel: org.apache.lucy
>     Class: org.apache.lucy.index.Indexer
>
> The basic approach looks great. The only question is how to map the
> Clownfish names to the host language and to C symbols.

This is a hard but interesting problem.  Different host language communities
have different conventions, and it is impossible to create a lowest common
denominator default for Clownfish which works for all of them.

Python:

    import lucy
    indexer = lucy.index.Indexer(index='/path/to/index')

    from lucy.index import Indexer
    indexer = Indexer(index='/path/to/index')

Ruby:

    require "lucy"
    indexer = Lucy::Index::Indexer.new(:index => '/path/to/index')

Perl:

    use Lucy::Index::Indexer;
    my $indexer = Lucy::Index::Indexer->new(index => '/path/to/index');

Java:

    import org.apache.lucy.index.Indexer;
    ...
    Indexer indexer = new Indexer("/path/to/index");

    import org.apache.lucy.index.*;
    ...
    Indexer indexer = new Indexer("/path/to/index");

Go:

    import "lucy.apache.org/lucy"
    ...
    indexer := lucy.NewIndexer("/path/to/index")

To achieve our goal of facilitating idiomatic bindings, we will need to
support customizable mapping for module/class/package names just as we support
customizable method names.  Since the Clownfish internals use class objects
much more often than class names, making class names customizable ought to be
doable.
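
To make the idea of customizable name mapping concrete, here is a rough
sketch in Python. It is only an illustration: `host_class_name` and the
per-host rules are assumptions derived from the examples above, not anything
CFC does today.

```python
# Hypothetical sketch: derive the name a host binding might expose from a
# Clownfish parcel name plus the class's path components.

def host_class_name(parcel, components, host):
    """parcel: e.g. "org.apache.lucy"; components: e.g. ["index", "Indexer"]."""
    short = parcel.split(".")[-1]  # last parcel component, e.g. "lucy"
    if host == "python":
        return ".".join([short] + components)
    if host in ("perl", "ruby"):
        parts = [short.capitalize()] + [c[0].upper() + c[1:] for c in components]
        return "::".join(parts)
    if host == "java":
        return ".".join([parcel] + components)
    if host == "go":
        # Go example above exposes a constructor function, not a class name.
        return short + ".New" + components[-1]
    raise ValueError("unknown host: " + host)
```

A real implementation would presumably let each binding override these
defaults per class, the same way method names can be overridden.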

Let's consider modifying the syntax for Clownfish class declarations.  We
already impose the constraint that the last "component" of each class name
must be unique within a parcel (since we used that for the struct name).  That
effectively imposes a flat namespace within each parcel.  How about making
the class name just an identifier rather than a qualified name?

    // `Indexer` instead of `Lucy::Index::Indexer` or other alternatives.
    parcel org.apache.lucy;
    public class Indexer { ... }

Now this option for import syntax makes more sense (especially with a
parcel-centric include dir layout):

    from org.apache.lucy use Indexer, IndexSearcher;

Unfortunately, one drawback of this approach is that it would make it
more difficult to know where a given class lives in the hierarchy.
("Indexer.cfh" might appear under "lucy/", under "lucy/index/", under
"lucy/index/foo/bar/baz/" or wherever.)  As a result, it's not obvious what
`.h` file to `#include` in C code.

If we were to put a mapping from class name to filepath for all (nested?)
public classes in the parcel file JSON, similar to the "provides" key in
META.json, that would help CFC to find the right `.cfh` file.  But it still
doesn't solve the problem for users who need to `#include` the right `.h`
header...
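
A minimal sketch of such a lookup, in Python for brevity. The "provides" key,
its layout, and the file paths are all assumptions modeled loosely on
META.json; no such key exists in the parcel file format today. The
`guess_header` helper only illustrates the unsolved part: it assumes the
generated `.h` file sits next to the `.cfh` file with the extension swapped.

```python
import json

# Hypothetical parcel file contents (assumed format, not an existing one).
PARCEL_JSON = """
{
  "name": "org.apache.lucy",
  "provides": {
    "Indexer": "lucy/index/Indexer.cfh",
    "IndexSearcher": "lucy/search/IndexSearcher.cfh"
  }
}
"""

def find_cfh(parcel_json, class_name):
    """Return the .cfh path declared for class_name, or None if absent."""
    parcel = json.loads(parcel_json)
    return parcel.get("provides", {}).get(class_name)

def guess_header(cfh_path):
    """Guess the C header path by swapping the extension (an assumption)."""
    return cfh_path[:-len(".cfh")] + ".h"
```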

Marvin Humphrey

Re: [lucy-dev] Parcels and files

Posted by Nick Wellnhofer <we...@aevum.de>.

On Mar 16, 2014, at 20:58 , Marvin Humphrey <ma...@rectangular.com> wrote:

> On Sun, Mar 16, 2014 at 10:00 AM, Nick Wellnhofer <we...@aevum.de> wrote:
>> I had something like this in mind:
>> 
>>    $INCLUDE/Foo-2.2.0/Foo.cfp
>>    $INCLUDE/Foo-2.2.0/Foo/Gizmo.cfh
>>    $INCLUDE/Foo-3.3.1/Foo.cfp
>>    $INCLUDE/Foo-3.3.1/Foo/Gizmo.cfh
>>    $INCLUDE/Bar-0.1.0/Bar.cfp
>>    $INCLUDE/Bar-0.1.0/Bar/Widget.cfh
>> 
>> I think Ruby gems have such a directory structure. This makes it easy to
>> uninstall parcels.
> 
> What's the search algorithm?  To find the correct parcel, it looks like we'd
> have to read the complete $INCLUDE directory contents, parse parcel
> identifiers and sort by version -- which might get slow once you have a lot of
> packages installed.

It should be fast enough for thousands of parcels.

> Would it help to install a symlink for the most recent version?
> 
>    $INCLUDE/Foo    # symlink to $INCLUDE/Foo-3.3.1

Yes, but it shouldn’t make a noticeable difference.

>> Good idea but I'm not really sold on hierarchical parcel names yet.
> 
> Convenient aliasing support in the import mechanism is crucial.  Here are some
> possible schemes:
> 
>    // Make `Indexer` available in the current .cfh file as an alias for
>    // `org.apache.lucy.index.Indexer`.
>    use org.apache.lucy.index.Indexer;
> 
>    // Make `Indexer` and `IndexSearcher` available.
>    from org.apache.lucy use Indexer, IndexSearcher;
> 
>    // Make `LucyIndexer` an alias for `org.apache.lucy.index.Indexer`.
>    from org.apache.lucy use Indexer as LucyIndexer;
> 
>    // Make `lucy` an alias for `org.apache.lucy`, so that
>    // `lucy.index.Indexer` and such become valid.
>    use org.apache.lucy;
> 
> Note that there is interplay between import mechanism syntax and how we lay
> out parcel contents in Clownfish include dirs.

But there’s a lot more happening in these examples. It seems that you want to merge parcel and class namespaces. What we have now:

    Parcel: lucy
    Class: Lucy::Index::Indexer

How I understand your proposal:

    Parcel: org.apache.lucy
    Class: org.apache.lucy.index.Indexer

The basic approach looks great. The only question is how to map the Clownfish names to the host language and to C symbols. I think you already made some proposals using fully qualified names in C symbols like ‘org_apache_lucy_index_Indexer_new’ (plus version and a host language tag at some point). This shouldn’t be a problem as long as the short name macros are used (and enabled by default). But how would the mapping to the host language namespace work?
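
For the C side, the mangling could be sketched roughly like this (Python used
only for illustration; `c_symbol` and `short_name_macro` are hypothetical
helpers, and the version and host language tags mentioned above are left out):

```python
# Hypothetical sketch of C symbol generation from fully qualified names.

def c_symbol(qualified_class, method):
    """org.apache.lucy.index.Indexer + new -> org_apache_lucy_index_Indexer_new"""
    return qualified_class.replace(".", "_") + "_" + method

def short_name_macro(qualified_class, method):
    """Emit a short name macro that hides the long symbol from C code."""
    short = qualified_class.split(".")[-1] + "_" + method
    return "#define %s %s" % (short, c_symbol(qualified_class, method))
```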

>> I simply don't see how the default parcel could be useful in a real-world
>> project.
> 
> The motivation is not to facilitate real-world projects, it's to make it
> easier to write trivial programs -- especially when learning or writing
> documentation and tutorials.
> 
> But in any case, I'm OK with the plan to eliminate the default parcel and
> require a parcel statement up front:
> 
> YACC grammar for CFH file organization:
> 
>    cfh_file
>        : parcel_statement import_statements class_defs_or_cblocks
>        | parcel_statement                   class_defs_or_cblocks
>        ;

OK, this is something that can be done immediately.

Nick

Re: [lucy-dev] Parcels and files

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Sun, Mar 16, 2014 at 10:00 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> I had something like this in mind:
>
>     $INCLUDE/Foo-2.2.0/Foo.cfp
>     $INCLUDE/Foo-2.2.0/Foo/Gizmo.cfh
>     $INCLUDE/Foo-3.3.1/Foo.cfp
>     $INCLUDE/Foo-3.3.1/Foo/Gizmo.cfh
>     $INCLUDE/Bar-0.1.0/Bar.cfp
>     $INCLUDE/Bar-0.1.0/Bar/Widget.cfh
>
> I think Ruby gems have such a directory structure. This makes it easy to
> uninstall parcels.

What's the search algorithm?  To find the correct parcel, it looks like we'd
have to read the complete $INCLUDE directory contents, parse parcel
identifiers and sort by version -- which might get slow once you have a lot of
packages installed.

Would it help to install a symlink for the most recent version?

    $INCLUDE/Foo    # symlink to $INCLUDE/Foo-3.3.1
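
To make the cost of the scan concrete, here is a rough Python sketch of the
"read everything, parse, pick the newest" approach. It assumes directory
names of the form `Name-X.Y.Z` with purely numeric version components;
`latest_versions` is a made-up name, not an existing CFC function.

```python
import re

# "Foo-3.3.1" -> name "Foo", version "3.3.1" (numeric components only).
_DIR_RE = re.compile(r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)$")

def latest_versions(dir_names):
    """Map each parcel name to its highest-versioned directory name."""
    best = {}
    for entry in dir_names:
        m = _DIR_RE.match(entry)
        if m is None:
            continue  # not a "Name-1.2.3" style directory
        # Compare versions numerically, so 10.0.0 sorts above 9.9.9.
        version = tuple(int(p) for p in m.group("version").split("."))
        name = m.group("name")
        if name not in best or version > best[name][0]:
            best[name] = (version, entry)
    return {name: entry for name, (version, entry) in best.items()}
```

Calling `latest_versions(os.listdir(include_dir))` is a single pass over the
directory entries; a symlink would replace that pass with one lookup.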

> Good idea but I'm not really sold on hierarchical parcel names yet.

Convenient aliasing support in the import mechanism is crucial.  Here are some
possible schemes:

    // Make `Indexer` available in the current .cfh file as an alias for
    // `org.apache.lucy.index.Indexer`.
    use org.apache.lucy.index.Indexer;

    // Make `Indexer` and `IndexSearcher` available.
    from org.apache.lucy use Indexer, IndexSearcher;

    // Make `LucyIndexer` an alias for `org.apache.lucy.index.Indexer`.
    from org.apache.lucy use Indexer as LucyIndexer;

    // Make `lucy` an alias for `org.apache.lucy`, so that
    // `lucy.index.Indexer` and such become valid.
    use org.apache.lucy;

Note that there is interplay between import mechanism syntax and how we lay
out parcel contents in Clownfish include dirs.
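
As a rough illustration of how those four forms could be turned into an alias
table, here is a Python sketch. The syntax handled is only what appears in the
examples above, and `parse_import` is a hypothetical helper. It records the
scope and name each alias points at; finding where the class actually lives
inside the parcel is a separate step that the sketch leaves out.

```python
import re

def parse_import(stmt):
    """Return {alias: (scope, name)} for one import statement."""
    stmt = stmt.strip().rstrip(";").strip()
    m = re.match(r"^from\s+([\w.]+)\s+use\s+(.+)$", stmt)
    if m:
        parcel, items = m.group(1), m.group(2)
        aliases = {}
        for item in items.split(","):
            parts = item.split()
            if len(parts) == 1:
                name = alias = parts[0]
            elif len(parts) == 3 and parts[1] == "as":
                name, alias = parts[0], parts[2]
            else:
                raise ValueError("bad import item: " + item.strip())
            aliases[alias] = (parcel, name)
        return aliases
    m = re.match(r"^use\s+([\w.]+)$", stmt)
    if m:
        # The last dotted component becomes the alias.
        scope, _, name = m.group(1).rpartition(".")
        return {name: (scope, name)}
    raise ValueError("not an import statement: " + stmt)
```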

The reasoning behind the hierarchical parcel names is simply namespace
differentiation in a crowded world.  It ticks me off that we're not going to be
able to release a Ruby gem named "lucy" because somebody's already published a
gem with that name.  IMO, not using hierarchical package names in a modern
open source packaging ecosystem is like building a phone list database without
last names.

> I simply don't see how the default parcel could be useful in a real-world
> project.

The motivation is not to facilitate real-world projects, it's to make it
easier to write trivial programs -- especially when learning or writing
documentation and tutorials.

But in any case, I'm OK with the plan to eliminate the default parcel and
require a parcel statement up front:

YACC grammar for CFH file organization:

    cfh_file
        : parcel_statement import_statements class_defs_or_cblocks
        | parcel_statement                   class_defs_or_cblocks
        ;
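
A quick way to see what that ordering rule means in practice is a line-based
check like the following Python sketch. It is not a parser: it assumes one
statement per line, only strips `//` comments, and does not skip the contents
of C blocks. `check_cfh_layout` is a hypothetical name.

```python
import re

def check_cfh_layout(source):
    """Return a list of problems; an empty list means the layout looks valid."""
    problems = []
    stage = "start"  # start -> imports -> body
    for lineno, raw in enumerate(source.splitlines(), 1):
        line = raw.split("//", 1)[0].strip()
        if not line:
            continue  # blank line or comment only
        if re.match(r"^parcel\b", line):
            if stage != "start":
                problems.append("line %d: extra parcel statement" % lineno)
            else:
                stage = "imports"
        elif stage == "start":
            problems.append("line %d: expected parcel statement first" % lineno)
            stage = "body"
        elif re.match(r"^(use|from)\b", line):
            if stage == "body":
                problems.append("line %d: import after class or C block" % lineno)
        else:
            stage = "body"  # class definition or C block
    if stage == "start":
        problems.append("missing parcel statement")
    return problems
```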

Marvin Humphrey

Re: [lucy-dev] Parcels and files

Posted by Nick Wellnhofer <we...@aevum.de>.

On Mar 16, 2014, at 03:01 , Marvin Humphrey <ma...@rectangular.com> wrote:

> I think we also should consider changing Clownfish to require that .cfh files
> contain at most one public class and that their location within the directory
> tree correspond to that class.  That's a fairly significant change, but it
> yields a number of benefits:
> 
> *   It's easier to reason about parcels and Clownfish include dirs.
> *   It is always clear what .h file you need to #include within C code.
> *   It's possible to discover that a parcel does *not* contain a given public
>    class simply by inspecting the file system, rather than compiling all .cfh
>    files which belong to the parcel.
> *   Finding the .cfh file which contains a given public class is
>    straightforward.

+1

This would be useful for an import mechanism.

> With that in mind, here's one possibility for file layout within a Clownfish
> include directory:
> 
>    All files and directories below a .cfp file belong to the specified
>    parcel, unless preempted by a .cfp file in a subdirectory.
> 
>    In the following directory listing, there are two packages, `foo` and
>    `foo.bar`:
> 
>        $INCLUDE
>        $INCLUDE/foo
>        $INCLUDE/foo/bar
>        $INCLUDE/foo/bar/Gizmo.cfh
>        $INCLUDE/foo/bar.cfp           # parcel foo.bar
>        $INCLUDE/foo/Thing.cfh
>        $INCLUDE/foo/Widget.cfh
>        $INCLUDE/foo/defs.h
>        $INCLUDE/foo.cfp               # parcel foo
> 
>    The files `Thing.cfh` and `Widget.cfh` belong to the parcel `foo`, as does
>    the C header file `defs.h`.
> 
>    The file `Gizmo.cfh` belongs to the parcel `foo.bar`.
> 
> For the sake of argument, here's another possible layout which supports
> multiple versions per past discussions.
> 
>        $INCLUDE
>        $INCLUDE/foo
>        $INCLUDE/foo/bar/0
>        $INCLUDE/foo/bar/0/Gizmo.cfh
>        $INCLUDE/foo/bar/0/_parcel.cfp    # parcel foo.bar 0.x
>        $INCLUDE/foo/bar/1
>        $INCLUDE/foo/bar/1/Gizmo.cfh
>        $INCLUDE/foo/bar/1/_parcel.cfp    # parcel foo.bar 1.x
>        $INCLUDE/foo/1/Thing.cfh
>        $INCLUDE/foo/1/Widget.cfh
>        $INCLUDE/foo/1/defs.h
>        $INCLUDE/foo/1/_parcel.cfp        # parcel foo 1.x

I had something like this in mind:

    $INCLUDE/Foo-2.2.0/Foo.cfp
    $INCLUDE/Foo-2.2.0/Foo/Gizmo.cfh
    $INCLUDE/Foo-3.3.1/Foo.cfp
    $INCLUDE/Foo-3.3.1/Foo/Gizmo.cfh
    $INCLUDE/Bar-0.1.0/Bar.cfp
    $INCLUDE/Bar-0.1.0/Bar/Widget.cfh

I think Ruby gems have such a directory structure. This makes it easy to uninstall parcels.

> Honestly, I may be ready to give up on that feature.  As much as I think that
> library versioning is a pressing problem in the world of computer science,
> supporting multiple library versions simultaneously at runtime is very
> involved and not central to the mission of a "symbiotic object system".  Maybe
> we just make incremental progress on the versioning problem by formalizing the
> format and ordering of Clownfish version numbers and call it a day.

+1, this is low-priority.

> Another related issue is that we want to allow the parcel `com.example.stuff`
> to be located at the top of a source directory, in order to avoid
> inconveniently deep source trees (such as those which afflict many Java
> projects).  How about we support that by searching two places for the .cfp
> file corresponding to the parcel `com.example.stuff`?
> 
> 1.  `$INCLUDE/com/example/stuff.cfp`
> 2.  `$INCLUDE/com.example.stuff.cfp`
> 
> Such behavior would facilitate development directory trees like this:
> 
>    core
>    core/com.example.stuff.cfp
>    core/com.example.stuff.tripe.cfp
>    core/stuff
>    core/stuff/Doodad.c
>    core/stuff/Doodad.cfh      # class com.example.stuff.Doodad
>    core/stuff/Gadget.c
>    core/stuff/Gadget.cfh      # class com.example.stuff.Gadget
>    core/tripe
>    core/tripe/Junk.c
>    core/tripe/Junk.cfh        # class com.example.stuff.tripe.Junk
> 
> The deep directory structure would be checked first.  A parcel file named
> `com.example.stuff.cfp` would be installed as `com/example/stuff.cfp`.

Good idea but I’m not really sold on hierarchical parcel names yet.

>> Another thing I'd like to do is to get rid of the "default" parcel.
> 
> So... every .cfh file will need to start with a parcel statement?
> 
> +0
> 
> It's a little more boilerplate for simple stuff but OK.  It's also something
> which could theoretically be relaxed later.
> 
> The CFC implementation code for the "default" parcel is pretty kludgy IIRC, so
> I'm guessing at least part of the motivation is cleanup?  Go for it.

I simply don’t see how the default parcel could be useful in a real-world project. We’d have to allow classes from different projects in the default parcel which is different from the way other parcels are handled. Also, using classes in the default parcel as prerequisite wouldn’t really work.

The only use right now is to simplify some tests but it also led me to introduce subtle bugs in the test suite.

Besides, there aren’t any end-to-end tests for the default parcel. I wouldn’t be surprised if we’d still have to fix a couple of corner cases to fully support it.

Nick

Re: [lucy-dev] Parcels and files

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Thu, Mar 13, 2014 at 6:13 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> One feature I'd like to implement is to ignore .cfh
> files from parcels that are not required by the current project.

+1

> It would be
> easier to determine which parcel a file belongs to if only a single parcel
> statement was allowed at the top of a file.

+1

> (Ultimately, we might want to put all the .cfh files of a parcel in a
> separate directory to support the installation of multiple versions in
> parallel.)

+1

I think we also should consider changing Clownfish to require that .cfh files
contain at most one public class and that their location within the directory
tree correspond to that class.  That's a fairly significant change, but it
yields a number of benefits:

*   It's easier to reason about parcels and Clownfish include dirs.
*   It is always clear what .h file you need to #include within C code.
*   It's possible to discover that a parcel does *not* contain a given public
    class simply by inspecting the file system, rather than compiling all .cfh
    files which belong to the parcel.
*   Finding the .cfh file which contains a given public class is
    straightforward.

With that in mind, here's one possibility for file layout within a Clownfish
include directory:

    All files and directories below a .cfp file belong to the specified
    parcel, unless preempted by a .cfp file in a subdirectory.

    In the following directory listing, there are two packages, `foo` and
    `foo.bar`:

        $INCLUDE
        $INCLUDE/foo
        $INCLUDE/foo/bar
        $INCLUDE/foo/bar/Gizmo.cfh
        $INCLUDE/foo/bar.cfp           # parcel foo.bar
        $INCLUDE/foo/Thing.cfh
        $INCLUDE/foo/Widget.cfh
        $INCLUDE/foo/defs.h
        $INCLUDE/foo.cfp               # parcel foo

    The files `Thing.cfh` and `Widget.cfh` belong to the parcel `foo`, as does
    the C header file `defs.h`.

    The file `Gizmo.cfh` belongs to the parcel `foo.bar`.
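
The ownership rule could be sketched like this in Python. Two assumptions are
baked in: "below a .cfp file" is read as "inside the directory that shares the
.cfp file's basename", and the parcel name is derived from the path rather
than read from the .cfp file. `owning_parcel` is a hypothetical helper.

```python
import posixpath

def owning_parcel(file_path, cfp_files):
    """Return the parcel owning file_path, or None.

    file_path and cfp_files are paths relative to $INCLUDE.
    """
    cfp_files = set(cfp_files)
    directory = posixpath.dirname(file_path)
    # Walk upward; the deepest matching .cfp preempts the ones above it.
    while directory:
        if directory + ".cfp" in cfp_files:
            return directory.replace("/", ".")
        directory = posixpath.dirname(directory)
    return None
```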

For the sake of argument, here's another possible layout which supports
multiple versions per past discussions.

        $INCLUDE
        $INCLUDE/foo
        $INCLUDE/foo/bar/0
        $INCLUDE/foo/bar/0/Gizmo.cfh
        $INCLUDE/foo/bar/0/_parcel.cfp    # parcel foo.bar 0.x
        $INCLUDE/foo/bar/1
        $INCLUDE/foo/bar/1/Gizmo.cfh
        $INCLUDE/foo/bar/1/_parcel.cfp    # parcel foo.bar 1.x
        $INCLUDE/foo/1/Thing.cfh
        $INCLUDE/foo/1/Widget.cfh
        $INCLUDE/foo/1/defs.h
        $INCLUDE/foo/1/_parcel.cfp        # parcel foo 1.x

Honestly, I may be ready to give up on that feature.  As much as I think that
library versioning is a pressing problem in the world of computer science,
supporting multiple library versions simultaneously at runtime is very
involved and not central to the mission of a "symbiotic object system".  Maybe
we just make incremental progress on the versioning problem by formalizing the
format and ordering of Clownfish version numbers and call it a day.

Another related issue is that we want to allow the parcel `com.example.stuff`
to be located at the top of a source directory, in order to avoid
inconveniently deep source trees (such as those which afflict many Java
projects).  How about we support that by searching two places for the .cfp
file corresponding to the parcel `com.example.stuff`?

1.  `$INCLUDE/com/example/stuff.cfp`
2.  `$INCLUDE/com.example.stuff.cfp`

Such behavior would facilitate development directory trees like this:

    core
    core/com.example.stuff.cfp
    core/com.example.stuff.tripe.cfp
    core/stuff
    core/stuff/Doodad.c
    core/stuff/Doodad.cfh      # class com.example.stuff.Doodad
    core/stuff/Gadget.c
    core/stuff/Gadget.cfh      # class com.example.stuff.Gadget
    core/tripe
    core/tripe/Junk.c
    core/tripe/Junk.cfh        # class com.example.stuff.tripe.Junk

The deep directory structure would be checked first.  A parcel file named
`com.example.stuff.cfp` would be installed as `com/example/stuff.cfp`.
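
The two-location lookup is simple enough to sketch in a few lines of Python.
The function names are made up for illustration, and `exists` is injectable
only so the lookup order can be exercised without touching the file system:

```python
import os
import posixpath

def cfp_candidates(include_dir, parcel):
    """Return candidate .cfp paths for a parcel, deep layout first."""
    deep = posixpath.join(include_dir, *parcel.split(".")) + ".cfp"
    flat = posixpath.join(include_dir, parcel + ".cfp")
    return [deep, flat]

def find_cfp(include_dir, parcel, exists=os.path.isfile):
    """Return the first candidate that exists, or None."""
    for path in cfp_candidates(include_dir, parcel):
        if exists(path):
            return path
    return None
```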

I can elaborate on the rationale for that proposal in another email.

> Another thing I'd like to do is to get rid of the "default" parcel.

So... every .cfh file will need to start with a parcel statement?

+0

It's a little more boilerplate for simple stuff but OK.  It's also something
which could theoretically be relaxed later.

The CFC implementation code for the "default" parcel is pretty kludgy IIRC, so
I'm guessing at least part of the motivation is cleanup?  Go for it.

Marvin Humphrey