
Posted to dev@lucy.apache.org by Nick Wellnhofer <we...@aevum.de> on 2014/11/24 17:49:27 UTC

[lucy-dev] Standalone documentation files

Lucifers,

It would be nice to support standalone documentation files that work across 
all host languages. A hackish way that already works is to define empty inert 
classes with only a DocuComment. This approach is already used for 
Lucy::Docs::DevGuide and Lucy::Docs::FileLocking. The downsides are that this 
creates superfluous header files and that, for lengthy documentation, it would
be nice to work with plain Markdown that doesn't have to be wrapped in C comments.

The problem with a plain .md file is you can't tell which parcel the 
documentation file belongs to. This piece of information is needed to organize 
the C documentation and for the custom URI scheme. But it should be easy to 
add this metadata with a `@parcel Lucy` directive, similar to the `@param` 
syntax for function documentation.
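
As a rough illustration of the `@parcel` idea, the C sketch below scans the text of a standalone Markdown file for a line of the form `@parcel NAME` and extracts NAME. The one-directive-per-line syntax and the function name `find_parcel_directive` are assumptions made for illustration; they are not part of CFC.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: scan Markdown text for a line of the form
 * "@parcel NAME" and copy NAME into `out`. Returns 1 on success,
 * 0 if no directive was found or `out` is too small. */
static int
find_parcel_directive(const char *text, char *out, size_t out_size) {
    static const char directive[] = "@parcel";
    const size_t dlen = sizeof(directive) - 1;
    const char *line = text;

    assert(text != NULL && out != NULL && out_size > 0);

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        const char *p = line;

        /* Skip leading whitespace on this line. */
        while (p < line + len && isspace((unsigned char)*p)) { p++; }

        /* Require "@parcel" followed by whitespace (so "@parcels" fails). */
        if ((size_t)(line + len - p) > dlen
            && strncmp(p, directive, dlen) == 0
            && isspace((unsigned char)p[dlen])) {
            const char *name = p + dlen;
            const char *end;
            size_t name_len;
            while (name < line + len && isspace((unsigned char)*name)) { name++; }
            end = name;
            while (end < line + len && !isspace((unsigned char)*end)) { end++; }
            name_len = (size_t)(end - name);
            if (name_len == 0 || name_len >= out_size) { return 0; }
            memcpy(out, name, name_len);
            out[name_len] = '\0';
            return 1;
        }
        if (eol == NULL) { break; }
        line = eol + 1;
    }
    return 0;
}
```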

Another approach is to use some kind of "distribution" identifier that is 
unique across a single build and can be used for multiple parcels. We already 
have something similar (`boot_class`) for the Perl bindings, but I'd prefer to 
keep parcels completely separate regardless of how they're distributed and 
built. (I'd also like to have a separate .xs file for each parcel, for example.)

Thoughts?

Nick

Re: [lucy-dev] Standalone documentation files

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Tue, Nov 25, 2014 at 9:56 PM, Marvin Humphrey <ma...@rectangular.com> wrote:

> I suggest that the parcel dirs in "include" directories be required to follow
> the naming convention "PARCEL-VERSION".  That will make it possible to know
> what parcels are available without having to parse every .cfp file.
>
> For "source" directories in contrast, any top-level dir could contain any one
> arbitrary parcel.

This suggestion is flawed as it would make it unwieldy to supply working
project directories as "include" dirs.  I have some ideas... let me think
through them once I'm truly awake and resubmit.

Marvin Humphrey

Re: [lucy-dev] Standalone documentation files

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Fri, Nov 28, 2014 at 5:22 AM, Nick Wellnhofer <we...@aevum.de> wrote:

> If we keep the `parcel` directives, we should check that they match the
> parcel from the .cfp file. This seems like an unnecessary complication.

For what it's worth, that's what Go does.

> I think it's a bad idea to install a parcel that doesn't declare on which
> Clownfish version it depends. For simple testing, we could use a default
> parcel.
>
> But the .cfp file could simply be generated using the project bootstrapping
> tool we already discussed. So I don't see why we shouldn't make it
> mandatory.

So, the effect will be to make it impractical to start a Clownfish project
without the bootstrapping tool.  I suppose for something like Clownfish in its
present form, that's not unreasonable.

Let's go with your plan.

I can't help but wish that single-file Clownfish projects were possible, and
it bothers me to move away from that.   But if the Clownfish virtual machine
succeeds in the marketplace, eventually there could be dedicated programming
languages which run on top of it -- filling that niche more elegantly than we
could hope to given the constraints we're working under to build a runtime
with a C API.

>>> 2. .cfp files don't need a unique filename. They could be simply named
>>> `parcel.json`, for example.
>>
>> We've decided to avoid the nested directory problem of Java, but a side
>> effect is that you can't necessarily discover the parcel by looking at the
>> file system -- at least in a "source" directory.
>>
>> So, while I agree that unique names are not technically necessary, I'm not
>> jumping up and down with enthusiasm about normalizing them all to
>> `parcel.json`.
>
> Having a fixed filename would simplify things a bit. But I can live with
> .cfp files.

Now that you mention it, using a fixed name is simpler not only for tooling
implementation, but also for humans reasoning about the system: the name
"parcel.json" communicates purpose and data format in a way that "cfp file"
does not.

+1 for standardizing on "parcel.json".

>>> Another consequence is that we either need another directory level under
>>> `core` for Lucy and Clownfish because of the test parcels:
>>>
>>>      core/lucy/Lucy/Analysis
>>>      core/test/Lucy/Test/TestAnalysis
>>>
>>> Or we put the tests in a separate top-level directory:
>>>
>>>      core/Lucy/Analysis
>>>      test/Lucy/Test/TestAnalysis
>>
>> That's one approach.  If test code remains in a separate parcel, though,
>> the problem of test code not being able to access non-exported symbols from
>> the main parcel has yet to be resolved.
>
> Does it? As long as the test parcel is linked into the same binary, there
> shouldn't be a problem.

I suppose that's true.

+1 for the convention of a "test" folder at the top level to hold core tests.

We'll need some unusual load-time logic for running core tests under dynamic
hosts.

I'd also like to standardize on the dir we use for host-specific C code
instead of having `perl/xs/`, `c/src/`, etc.  How about defaulting to `cfext`?
That shouldn't collide with anybody.

>> Let me also float a possibility I've been thinking about for a while, which
>> is that the output of CFC should be static archives plus headers, rather
>> than C source code plus headers.  Perhaps that opens up some options for
>> dealing with test files.
>
> +1

Glad we're on the same page about that!

Outputting .a files from CFC should simplify build scripts for
Clownfish-powered extensions -- it's easier to supply linker flags than to
persuade the host build environment to do the right thing with C files
scattered across multiple directories requiring different compiler flags.

Marvin Humphrey

Re: [lucy-dev] Standalone documentation files

Posted by Nick Wellnhofer <we...@aevum.de>.

On 27/11/2014 22:33, Marvin Humphrey wrote:
> On Wed, Nov 26, 2014 at 3:25 PM, Nick Wellnhofer <we...@aevum.de> wrote:
>> So the Clownfish compiler could simply walk
>> the source directories, and as soon as it finds a .cfp file, it can make all
>> files in that directory part of this parcel.
>
> I'm not sure you were implying this, but a parcel file should not apply to
> directories above it.  Parcel files should only be allowed in the
> primary parcel dir.

Sorry for being unclear. I meant that only files below the directory where the 
.cfp file is found should be part of the parcel.

>> 1. There's no need for `parcel` directives in .cfh files anymore. This would
>> remove a bit of boilerplate.
>
> Both Python and Go organize packages around directories, but while Go
> requires package directives in every source file, Python doesn't.  So there
> are precedents for both approaches.

If we keep the `parcel` directives, we should check that they match the
parcel from the .cfp file. This seems like an unnecessary complication.

> I maintain that it's important that parcel files not be mandatory.  If there's
> no parcel file and no parcel directive in any source files, what should the
> behavior be?  Default to parcel "main" version 0?  If so, what does that imply
> when someone attempts to install something from parcel "main"?

I think it's a bad idea to install a parcel that doesn't declare on which 
Clownfish version it depends. For simple testing, we could use a default parcel.

But the .cfp file could simply be generated using the project bootstrapping 
tool we already discussed. So I don't see why we shouldn't make it mandatory.

>> 2. .cfp files don't need a unique filename. They could be simply named
>> `parcel.json`, for example.
>
> We've decided to avoid the nested directory problem of Java, but a side effect
> is that you can't necessarily discover the parcel by looking at the file
> system -- at least in a "source" directory.
>
> So, while I agree that unique names are not technically necessary, I'm not
> jumping up and down with enthusiasm about normalizing them all to
> `parcel.json`.

Having a fixed filename would simplify things a bit. But I can live with .cfp 
files.

>> Another consequence is that we either need another directory level under
>> `core` for Lucy and Clownfish because of the test parcels:
>>
>>      core/lucy/Lucy/Analysis
>>      core/test/Lucy/Test/TestAnalysis
>>
>> Or we put the tests in a separate top-level directory:
>>
>>      core/Lucy/Analysis
>>      test/Lucy/Test/TestAnalysis
>
> That's one approach.  If test code remains in a separate parcel, though, the
> problem of test code not being able to access non-exported symbols from the
> main parcel has yet to be resolved.

Does it? As long as the test parcel is linked into the same binary, there 
shouldn't be a problem.

> Let me float another possibility: test code belongs to the same parcel and
> lives in the same directory as the parcel source code, but test files are
> distinguished by a naming convention such as beginning with the string "Test".
> Two versions of the parcel library are linked: a test version with all the
> test file objects included, and an installation version with the test objects
> excluded.

I agree that compiling two versions of a library seems like the best approach 
to solve the long-standing issue of creating binaries without all the test 
code. But this should work with a separate test parcel as well. I don't want
to remove the ability to link multiple parcels into a single library.

> Let me also float a possibility I've been thinking about for a while, which is
> that the output of CFC should be static archives plus headers, rather than C
> source code plus headers.  Perhaps that opens up some options for dealing with
> test files.

+1

Nick

Re: [lucy-dev] Standalone documentation files

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Wed, Nov 26, 2014 at 3:25 PM, Nick Wellnhofer <we...@aevum.de> wrote:
> I think every parcel needs to have a .cfp file to tell which version of the
> Clownfish runtime it depends on.

I don't think we should make that a hard requirement, because it increases the
developer cost up front to start a project.  A newbie following a Clownfish
tutorial shouldn't need to create a parcel file to write "hello world", and an
expert shouldn't need to create one when running an experiment or coding up a
throwaway.

It should be acceptable either to provide a parcel via a parcel directive in a
source file, or to omit the parcel altogether.

> So the Clownfish compiler could simply walk
> the source directories, and as soon as it finds a .cfp file, it can make all
> files in that directory part of this parcel.

I'm not sure you were implying this, but a parcel file should not apply to
directories above it.  Parcel files should only be allowed in the
primary parcel dir.

> If we enforce that all files of a parcel live under a single directory, we
> can also make the following simplifications:
>
> 1. There's no need for `parcel` directives in .cfh files anymore. This would
> remove a bit of boilerplate.

Both Python and Go organize packages around directories, but while Go
requires package directives in every source file, Python doesn't.  So there
are precedents for both approaches.

I maintain that it's important that parcel files not be mandatory.  If there's
no parcel file and no parcel directive in any source files, what should the
behavior be?  Default to parcel "main" version 0?  If so, what does that imply
when someone attempts to install something from parcel "main"?

> 2. .cfp files don't need a unique filename. They could be simply named
> `parcel.json`, for example.

We've decided to avoid the nested directory problem of Java, but a side effect
is that you can't necessarily discover the parcel by looking at the file
system -- at least in a "source" directory.

So, while I agree that unique names are not technically necessary, I'm not
jumping up and down with enthusiasm about normalizing them all to
`parcel.json`.

> Another consequence is that we either need another directory level under
> `core` for Lucy and Clownfish because of the test parcels:
>
>     core/lucy/Lucy/Analysis
>     core/test/Lucy/Test/TestAnalysis
>
> Or we put the tests in a separate top-level directory:
>
>     core/Lucy/Analysis
>     test/Lucy/Test/TestAnalysis

That's one approach.  If test code remains in a separate parcel, though, the
problem of test code not being able to access non-exported symbols from the
main parcel has yet to be resolved.

Let me float another possibility: test code belongs to the same parcel and
lives in the same directory as the parcel source code, but test files are
distinguished by a naming convention such as beginning with the string "Test".
Two versions of the parcel library are linked: a test version with all the
test file objects included, and an installation version with the test objects
excluded.
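
To make the naming-convention idea concrete, here is a small C sketch of how a build script might split one parcel's sources between the two libraries. The `Test` prefix rule comes from the paragraph above; the helper names `is_test_source` and `select_install_sources` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical convention: a source file is test code if its basename
 * (the part after the last '/') starts with "Test". */
static int
is_test_source(const char *path) {
    const char *base = strrchr(path, '/');
    base = base ? base + 1 : path;
    return strncmp(base, "Test", 4) == 0;
}

/* Copy into `out` the sources that belong in the installation library,
 * i.e. everything except test files. The test library would simply link
 * all of `sources`. Returns the number of entries written to `out`. */
static size_t
select_install_sources(const char *const *sources, size_t count,
                       const char **out) {
    size_t n = 0;
    assert(count == 0 || (sources != NULL && out != NULL));
    for (size_t i = 0; i < count; i++) {
        if (!is_test_source(sources[i])) {
            out[n++] = sources[i];
        }
    }
    return n;
}
```

A plain prefix check would also treat a non-test class such as a hypothetical `Testament.c` as test code, so a real convention would probably need to be stricter.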

I don't think either of these approaches is quite satisfactory yet, so let's
keep exploring.

Let me also float a possibility I've been thinking about for a while, which is
that the output of CFC should be static archives plus headers, rather than C
source code plus headers.  Perhaps that opens up some options for dealing with
test files.

> Regarding the include directory layout, maybe we should make a change to my
> original proposal inspired by Ruby gems. Instead of putting a parcel's files
> in a directory named `parcel-version`, we could create a single directory for
> all versions of a parcel with subdirectories for each version like:
>
>     include/com.example.foo/1.0.0/parcel.json
>     include/com.example.foo/2.0.0/parcel.json
>
> So we wouldn't have to read the whole top-level directory to look up parcels.
> Whether a parcel exists can be decided with a single `stat` call. To find
> all the versions of a parcel, only the entries in the parcel's subdirectory
> have to be consulted.

+1, sounds great!

Marvin Humphrey

Re: [lucy-dev] Standalone documentation files

Posted by Nick Wellnhofer <we...@aevum.de>.

On 26/11/2014 23:35, Marvin Humphrey wrote:
> As noted earlier: Requiring that the contents within "include" directories
> adhere to such strict naming conventions is fine for installers but unfriendly
> to development.
>
> So instead, how about applying this ruleset to every top level directory in
> either "source" or "include"?
>
> 1.  If we can detect that a given directory follows the naming convention
>      PARCEL-VERSION, extract the parcel name from the dir name.
> 2.  Otherwise look for a .cfp file at the top level.
> 3.  Otherwise, parse every Clownfish file in the tree.  There should be only
>      one parcel specified.
>
> I think this would allow us to form a unified array of search paths consisting
> of all "source" dirs followed by all "include" dirs.  (Is there any reason we
> need to continue distinguishing between the "source" and "include" for
> search-path purposes?)
>
> An important use case is a minimal single-parcel project, which should be
> possible using only a single directory and two loose files:
>
>    core/MyProject.cfh   # declares `parcel com.example.foo`
>    core/MyProject.c
>
> On installation, the tool would assume version "0" in the absence of a parcel
> file.
>
>     _include/com.example.foo-0/
>     _include/com.example.foo-0/MyProject.cfh
>
> Sound sane?

I think every parcel needs to have a .cfp file to tell which version of the 
Clownfish runtime it depends on. So the Clownfish compiler could simply walk 
the source directories, and as soon as it finds a .cfp file, it can make all 
files in that directory part of this parcel.
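
A minimal C sketch of that directory walk, assuming POSIX `opendir`/`readdir`/`stat`: each directory is first checked for a `*.cfp` file, and whichever parcel file is in effect is recorded for every regular file at or below that directory. The `FileParcelList` type, the fixed buffer sizes, and the names `walk_source_dir` and `parcel_of` are invented for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define SKETCH_PATH_MAX 256
#define SKETCH_MAX_FILES 32

typedef struct {
    char file[SKETCH_PATH_MAX];
    char cfp[SKETCH_PATH_MAX];  /* "" means no .cfp file above this file */
} FileParcel;

typedef struct {
    FileParcel items[SKETCH_MAX_FILES];
    size_t count;
} FileParcelList;

static int
has_suffix(const char *s, const char *suffix) {
    size_t n = strlen(s), m = strlen(suffix);
    return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* Walk `dir` recursively. `cfp` is the parcel file inherited from the
 * parent directory ("" if none); a .cfp file found in `dir` governs
 * every file at or below `dir`. */
static void
walk_source_dir(const char *dir, const char *cfp, FileParcelList *list) {
    char found[SKETCH_PATH_MAX] = "";
    char path[SKETCH_PATH_MAX];
    struct dirent *entry;
    DIR *dh;

    assert(dir != NULL && cfp != NULL && list != NULL);
    dh = opendir(dir);
    if (dh == NULL) { return; }

    /* First pass: look for a .cfp file in this directory itself. */
    while ((entry = readdir(dh)) != NULL) {
        if (has_suffix(entry->d_name, ".cfp")) {
            snprintf(found, sizeof(found), "%s/%s", dir, entry->d_name);
            cfp = found;
            break;
        }
    }

    /* Second pass: record regular files, recurse into subdirectories. */
    rewinddir(dh);
    while ((entry = readdir(dh)) != NULL) {
        struct stat st;
        const char *name = entry->d_name;
        if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0
            || has_suffix(name, ".cfp")) {
            continue;
        }
        snprintf(path, sizeof(path), "%s/%s", dir, name);
        if (stat(path, &st) != 0) { continue; }
        if (S_ISDIR(st.st_mode)) {
            walk_source_dir(path, cfp, list);
        }
        else if (S_ISREG(st.st_mode) && list->count < SKETCH_MAX_FILES) {
            FileParcel *fp = &list->items[list->count++];
            snprintf(fp->file, sizeof(fp->file), "%s", path);
            snprintf(fp->cfp, sizeof(fp->cfp), "%s", cfp);
        }
    }
    closedir(dh);
}

/* Return the parcel file recorded for `file`, or NULL if it was not seen. */
static const char *
parcel_of(const FileParcelList *list, const char *file) {
    for (size_t i = 0; i < list->count; i++) {
        if (strcmp(list->items[i].file, file) == 0) { return list->items[i].cfp; }
    }
    return NULL;
}
```

In this sketch a .cfp file in a nested directory silently takes over from the outer one; a real compiler would more likely report that as an error. Files above any .cfp file are recorded with an empty parcel path.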

If we enforce that all files of a parcel live under a single directory, we can 
also make the following simplifications:

1. There's no need for `parcel` directives in .cfh files anymore. This would 
remove a bit of boilerplate.

2. .cfp files don't need a unique filename. They could be simply named 
`parcel.json`, for example.

Another consequence is that we either need another directory level under 
`core` for Lucy and Clownfish because of the test parcels:

     core/lucy/Lucy/Analysis
     core/test/Lucy/Test/TestAnalysis

Or we put the tests in a separate top-level directory:

     core/Lucy/Analysis
     test/Lucy/Test/TestAnalysis

Regarding the include directory layout, maybe we should make a change to my 
original proposal inspired by Ruby gems. Instead of putting a parcel's files in
a directory named `parcel-version`, we could create a single directory for all
versions of a parcel with subdirectories for each version like:

     include/com.example.foo/1.0.0/parcel.json
     include/com.example.foo/2.0.0/parcel.json

So we wouldn't have to read the whole top-level directory to look up parcels.
Whether a parcel exists can be decided with a single `stat` call. To find all 
the versions of a parcel, only the entries in the parcel's subdirectory have 
to be consulted.
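
Under the assumed layout `INCLUDE/PARCEL/VERSION/parcel.json`, the two lookups described above might look like this in C, using POSIX `stat` and `opendir`/`readdir`. The function names `parcel_exists` and `list_parcel_versions` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Assumed layout: INCLUDE_DIR/PARCEL/VERSION/parcel.json
 * A single stat() call answers "is this parcel installed at all?" */
static int
parcel_exists(const char *include_dir, const char *parcel) {
    char path[1024];
    struct stat st;
    snprintf(path, sizeof(path), "%s/%s", include_dir, parcel);
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* List the installed versions of one parcel by reading only that parcel's
 * own directory. A version counts only if VERSION/parcel.json exists.
 * Writes up to `max` names into `versions` (readdir order, unsorted) and
 * returns the count, or -1 if the parcel directory cannot be opened. */
static int
list_parcel_versions(const char *include_dir, const char *parcel,
                     char versions[][64], int max) {
    char dir[1024];
    char json[1024];
    struct dirent *entry;
    struct stat st;
    DIR *dh;
    int n = 0;

    assert(max >= 0);
    snprintf(dir, sizeof(dir), "%s/%s", include_dir, parcel);
    dh = opendir(dir);
    if (dh == NULL) { return -1; }
    while (n < max && (entry = readdir(dh)) != NULL) {
        if (entry->d_name[0] == '.') { continue; }  /* ".", "..", dotfiles */
        snprintf(json, sizeof(json), "%s/%s/parcel.json", dir, entry->d_name);
        if (stat(json, &st) != 0) { continue; }
        snprintf(versions[n++], 64, "%s", entry->d_name);
    }
    closedir(dh);
    return n;
}
```

`readdir` returns entries in no particular order, so a real tool would still have to sort the version names by version semantics before choosing one.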

Nick

Re: [lucy-dev] Standalone documentation files

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Tue, Nov 25, 2014 at 9:56 PM, Marvin Humphrey <ma...@rectangular.com> wrote:

> I suggest that the parcel dirs in "include" directories be required to follow
> the naming convention "PARCEL-VERSION".  That will make it possible to know
> what parcels are available without having to parse every .cfp file.
>
> For "source" directories in contrast, any top-level dir could contain any one
> arbitrary parcel.

As noted earlier: Requiring that the contents within "include" directories
adhere to such strict naming conventions is fine for installers but unfriendly
to development.

So instead, how about applying this ruleset to every top level directory in
either "source" or "include"?

1.  If we can detect that a given directory follows the naming convention
    PARCEL-VERSION, extract the parcel name from the dir name.
2.  Otherwise look for a .cfp file at the top level.
3.  Otherwise, parse every Clownfish file in the tree.  There should be only
    one parcel specified.
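
Rule 1 could be a purely string-based check. The C sketch below assumes that VERSION consists only of digits and dots (for example `0` or `1.0.0`), so it splits the directory name at the last hyphen; `parse_parcel_dirname` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check for rule 1: does `dirname` look like PARCEL-VERSION?
 * Assumes VERSION contains only digits and dots, so we split at the last
 * '-'. On success, copies PARCEL into `parcel` and returns 1; otherwise
 * returns 0 and the caller falls back to rules 2 and 3. */
static int
parse_parcel_dirname(const char *dirname, char *parcel, size_t parcel_size) {
    const char *dash;
    const char *v;
    size_t name_len;

    assert(dirname != NULL && parcel != NULL);
    dash = strrchr(dirname, '-');
    if (dash == NULL || dash == dirname || dash[1] == '\0') { return 0; }

    /* VERSION must start with a digit and contain only digits and dots. */
    if (!isdigit((unsigned char)dash[1])) { return 0; }
    for (v = dash + 1; *v != '\0'; v++) {
        if (!isdigit((unsigned char)*v) && *v != '.') { return 0; }
    }

    name_len = (size_t)(dash - dirname);
    if (name_len >= parcel_size) { return 0; }
    memcpy(parcel, dirname, name_len);
    parcel[name_len] = '\0';
    return 1;
}
```

A directory that merely happens to look like `foo-2` would also match, which is one cost of inferring metadata from directory names.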

I think this would allow us to form a unified array of search paths consisting
of all "source" dirs followed by all "include" dirs.  (Is there any reason we
need to continue distinguishing between the "source" and "include" for
search-path purposes?)

An important use case is a minimal single-parcel project, which should be
possible using only a single directory and two loose files:

  core/MyProject.cfh   # declares `parcel com.example.foo`
  core/MyProject.c

On installation, the tool would assume version "0" in the absence of a parcel
file.

   _include/com.example.foo-0/
   _include/com.example.foo-0/MyProject.cfh
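
The install-path rule, including the fallback to version "0", fits in a few lines of C. The function `install_dir_for` and its parameters are hypothetical; passing `version == NULL` stands for "no parcel file supplied a version".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: build "INCLUDE_DIR/PARCEL-VERSION", assuming
 * version "0" when no parcel file supplied one (version == NULL).
 * Returns 0 on success, -1 if `out` is too small. */
static int
install_dir_for(const char *include_dir, const char *parcel,
                const char *version, char *out, size_t out_size) {
    int len;
    assert(include_dir != NULL && parcel != NULL && out != NULL);
    if (version == NULL) { version = "0"; }
    len = snprintf(out, out_size, "%s/%s-%s", include_dir, parcel, version);
    return (len < 0 || (size_t)len >= out_size) ? -1 : 0;
}
```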

Sound sane?

Marvin Humphrey

Re: [lucy-dev] Standalone documentation files

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Tue, Nov 25, 2014 at 10:41 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> Yes, let's give it another try. My main goals for now are
>
> * Mechanism to find all .cfh files that belong to a prerequisite
>   parcel without parsing every file in the include directory tree.
> * Mechanism to find a file's parcel (.cfp file) given its path
>   (mainly in source directories).
>
> I still favor an approach where every parcel lives in a completely separate
> directory. This would make both of these goals easy to achieve.

Let's go with that plan.

I suggest that the parcel dirs in "include" directories be required to follow
the naming convention "PARCEL-VERSION".  That will make it possible to know
what parcels are available without having to parse every .cfp file.

For "source" directories in contrast, any top-level dir could contain any one
arbitrary parcel.

I suggest that we also consider special-casing unit tests somehow.  I think we
should avoid forcing them to belong to another parcel, denying them access
to internal symbols and excluding the possibility of white-box testing.

Marvin Humphrey

Re: [lucy-dev] Standalone documentation files

Posted by Nick Wellnhofer <we...@aevum.de>.

On 24/11/2014 21:08, Marvin Humphrey wrote:
> It should be possible to know a file's parcel given:
>
> *   where it lives in the file system
> *   the contents of a .cfp parcel file living above it
>
> For example:
>
>      # cookbook.md belongs to whatever parcel is spelled out in foo.cfp
>      foo/foo.cfp
>      foo/docs/cookbook.md

Deriving the parcel from a file's location is of course more elegant.

> We've discussed this before but haven't worked out the details.  Shall we
> proceed?

Yes, let's give it another try. My main goals for now are

* Mechanism to find all .cfh files that belong to a prerequisite
   parcel without parsing every file in the include directory tree.
* Mechanism to find a file's parcel (.cfp file) given its path
   (mainly in source directories).

I still favor an approach where every parcel lives in a completely separate 
directory. This would make both of these goals easy to achieve.

Nick

Re: [lucy-dev] Standalone documentation files

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Mon, Nov 24, 2014 at 8:49 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> It would be nice to support standalone documentation files that work across
> all host languages.

Clearly, what we want to support is standalone markdown files.  We just need
to figure out the details.

> The problem with a plain .md file is you can't tell which parcel the
> documentation file belongs to. This piece of information is needed to
> organize the C documentation and for the custom URI scheme. But it should be
> easy to add this metadata with a `@parcel Lucy` directive, similar to the
> `@param` syntax for function documentation.

It should be possible to know a file's parcel given:

*   where it lives in the file system
*   the contents of a .cfp parcel file living above it

For example:

    # cookbook.md belongs to whatever parcel is spelled out in foo.cfp
    foo/foo.cfp
    foo/docs/cookbook.md
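
Finding a single file's parcel amounts to walking upward through its parent directories until one of them holds a .cfp file. Below is a C sketch assuming POSIX `opendir`/`readdir`/`stat`; the function names, the `root` stopping point, and the fixed 1024-byte buffer are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Return 1 if `dir` directly contains a regular file named "*.cfp" and
 * write its path into `out`; return 0 otherwise. */
static int
find_cfp_in_dir(const char *dir, char *out, size_t out_size) {
    DIR *dh = opendir(dir);
    struct dirent *entry;
    struct stat st;
    int found = 0;

    if (dh == NULL) { return 0; }
    while (!found && (entry = readdir(dh)) != NULL) {
        size_t len = strlen(entry->d_name);
        if (len <= 4 || strcmp(entry->d_name + len - 4, ".cfp") != 0) {
            continue;
        }
        snprintf(out, out_size, "%s/%s", dir, entry->d_name);
        found = stat(out, &st) == 0 && S_ISREG(st.st_mode);
    }
    closedir(dh);
    return found;
}

/* Walk upward from the directory containing `file`, stopping after `root`
 * has been checked. Paths are plain '/'-separated strings with no trailing
 * slash on `root`; symlinks and ".." are not resolved. */
static int
find_parcel_file(const char *root, const char *file,
                 char *out, size_t out_size) {
    char dir[1024];
    size_t root_len = strlen(root);

    assert(strncmp(file, root, root_len) == 0);  /* file must be under root */
    snprintf(dir, sizeof(dir), "%s", file);
    for (;;) {
        char *slash = strrchr(dir, '/');
        if (slash == NULL || (size_t)(slash - dir) < root_len) {
            return 0;  /* climbed above root without finding a .cfp file */
        }
        *slash = '\0';  /* drop the last path component */
        if (find_cfp_in_dir(dir, out, out_size)) {
            return 1;
        }
    }
}
```

Called as `find_parcel_file("foo", "foo/docs/cookbook.md", buf, sizeof(buf))` against the example layout above, it checks `foo/docs` first and then reports `foo/foo.cfp`.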

We've discussed this before but haven't worked out the details.  Shall we
proceed?

> Another approach is to use some kind of "distribution" identifier that is
> unique across a single build and can be used for multiple parcels.

The Clownfish parcel is deliberately an atomic unit of installation --
and thus also versioning.

Other systems have botched this badly -- e.g. Perl/CPAN by tying versioning to
the package/namespace rather than the distro.  This makes it very hard to
reason about version requirements and the consequences of installing a
particular unit of distribution.

We should do everything we can to avoid repeating such mistakes.

Marvin Humphrey