Posted to dev@lucy.apache.org by Nick Wellnhofer <we...@aevum.de> on 2014/11/09 01:31:32 UTC

[lucy-dev] Markdown for documentation (redux)

Lucifers,

Over two years ago, there was some discussion about using Markdown for our 
documentation:

     http://s.apache.org/48q

There was general consensus that switching to Markdown would be nice but the 
incompatibilities and weak specifications of several competing Markdown 
implementations were found to be problematic. In the meantime, a Markdown 
flavor that addresses these problems has emerged with CommonMark:

     http://commonmark.org/

It's backed by established players in the industry and comes with a C library 
released under a permissive license, so it seems ideal for our needs. In my 
opinion, moving away from POD is crucial if we want to provide a documentation 
system that works for host languages other than Perl. If there aren't any 
objections, I'd be happy to work on switching our "DocuComments" over to 
CommonMark.

Nick

Re: [lucy-dev] Markdown for documentation (redux)

Posted by Nick Wellnhofer <we...@aevum.de>.
On 10/11/2014 18:27, David E. Wheeler wrote:
> Common Markdown is not. It is the center of quite a lot of controversy, mostly around Jeff Atwood not respecting Markdown creator John Gruber and trying to appropriate the name “Markdown” for himself. Some background:
>
>    http://shindoisshin.net/blog/2014/9/6/standard-markdown-controversy
>
> This flavor of markup might be ideal for code documentation, so it may well be a great choice. I honestly don’t know. (I tend to use MultiMarkdown, a superset of Markdown.) But you should be aware that, despite its name, it is manifestly *not* Markdown. It’s a Markdown-inspired markup language, yes, but not Markdown, and in fact violates some of the basic tenets of Markdown.

I'm aware of the controversy. That's actually how I found out about 
CommonMark. I also agree that the initial plan of naming it "Standard 
Markdown" was presumptuous. But politics aside, I can't see how 
CommonMark differs from other Markdown flavors. In syntax and feature 
set, it's close enough to the original Markdown, much like GitHub 
Flavored Markdown.

For Lucy or Clownfish, I'm mostly interested in a compact C-based parser with 
a compatible license that will be maintained for the foreseeable future. 
Unlike some other C implementations that have been discontinued, I'm pretty 
sure that CommonMark will stay around for quite some time. The focus on a 
strict specification is also a plus, although it doesn't matter that much for 
our needs. We only need a minimal subset of Markdown that maps nicely to the 
documentation formats we want to support (for now, POD, HTML, and man pages 
(troff)). I'd prefer an implementation without unneeded features like 
HTML blocks.

Nick


Re: [lucy-dev] Markdown for documentation (redux)

Posted by "David E. Wheeler" <da...@justatheory.com>.
On Nov 8, 2014, at 4:31 PM, Nick Wellnhofer <we...@aevum.de> wrote:

>    http://commonmark.org/
> 
> It's backed by established players in the industry and comes with a C library released under a permissive license, so it seems ideal for our needs. In my opinion, moving away from POD is crucial if we want to provide a documentation system that works for other host languages than Perl. If there aren't any objections, I'd be happy to work on switching our "DocuComments" over to CommonMark.

Common Markdown is not. It is the center of quite a lot of controversy, mostly around Jeff Atwood not respecting Markdown creator John Gruber and trying to appropriate the name “Markdown” for himself. Some background:

  http://shindoisshin.net/blog/2014/9/6/standard-markdown-controversy

This flavor of markup might be ideal for code documentation, so it may well be a great choice. I honestly don’t know. (I tend to use MultiMarkdown, a superset of Markdown.) But you should be aware that, despite its name, it is manifestly *not* Markdown. It’s a Markdown-inspired markup language, yes, but not Markdown, and in fact violates some of the basic tenets of Markdown.

Best,

David

Re: [lucy-dev] Markdown for documentation (redux)

Posted by Nick Wellnhofer <we...@aevum.de>.
On 10/11/2014 05:45, Marvin Humphrey wrote:
> The past proposal was to augment Markdown with JavaDoc-style @tags and
> custom links.  But now that there's a standard, I think we should abandon the
> extensions so that we'll be able to say, simply, "Clownfish uses Markdown for
> documentation".
>
> Right now, we're using @param and @return tags a fair amount.  I volunteer to
> do the drudge work of turning those into ordinary Markdown.

Oh, I didn't want to get rid of the JavaDoc-style @tags. I really like the 
originally proposed mixture of JavaDoc and Markdown as it adds some additional 
structure to method documentation that can be used effectively when converting 
to formats like HTML.

Nick



Re: [lucy-dev] Markdown for documentation (redux)

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sat, Nov 8, 2014 at 4:31 PM, Nick Wellnhofer <we...@aevum.de> wrote:

> If there aren't any objections, I'd be happy to work on switching our
> "DocuComments" over to CommonMark.

+1

The past proposal was to augment Markdown with JavaDoc-style @tags and
custom links.  But now that there's a standard, I think we should abandon the
extensions so that we'll be able to say, simply, "Clownfish uses Markdown for
documentation".

Right now, we're using @param and @return tags a fair amount.  I volunteer to
do the drudge work of turning those into ordinary Markdown.

Marvin Humphrey

Re: [lucy-dev] Markdown for documentation (redux)

Posted by Nick Wellnhofer <we...@aevum.de>.
On 11/12/2014 02:29, Marvin Humphrey wrote:
> IIRC `Err_error` was the original location but its functionality was moved
> into a Perl scalar so that the variable would be thread-local under Perl
> ithreads.
>
> I think we should avoid symbolic replacements which include an article like
> "the" -- too inflexible, too much mental gymnastics to use in context.
> Instead, I suggest we should map @error to a proper name, but keep it hidden
> -- i.e. "current_error" in the C bindings would be renamed to "Err_error" but
> would retain its `static` qualifier.
>
> I agree with using the pseudo-symbol `[](cfish:@error)`, because we will be
> special-casing the translation.  Prepending `@` seems like a decent choice.

Maybe we should simply reword the documentation from

     ...sets Err_error when...

to

     ...sets the global [](cfish:cfish.Err) object when...

or

     ...sets the global [](cfish:cfish.Err) object returned by
     [](cfish:cfish.Err.get_error) when...

and use "Clownfish->error" instead of "get_error" for the Perl documentation. 
That way, we wouldn't need the `@error` symbol at all.

Nick


Re: [lucy-dev] Markdown for documentation (redux)

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Wed, Dec 10, 2014 at 9:04 AM, Nick Wellnhofer <we...@aevum.de> wrote:

> All of this is now implemented.

\o/
        \o/
    \o/

> The only remaining issue is how to treat the global error variable.
> Currently, we replace the text "Err_error" with "Clownfish->error" in the
> Perl POD. But Err_error isn't a valid symbol (maybe it was at some point?),
> so we should use something else for the C documentation. It should work to
> use a pseudo-link like `[](cfish:@error)` with the following replacements:
>
>     C: "[the global error](cfish:cfish.Err.get_error)"
>     Perl: "Clownfish->error"
>
> (Typical usage in the documentation is something like "...sets Err_error
> when...").

IIRC `Err_error` was the original location but its functionality was moved
into a Perl scalar so that the variable would be thread-local under Perl
ithreads.

I think we should avoid symbolic replacements which include an article like
"the" -- too inflexible, too much mental gymnastics to use in context.
Instead, I suggest we should map @error to a proper name, but keep it hidden
-- i.e. "current_error" in the C bindings would be renamed to "Err_error" but
would retain its `static` qualifier.

I agree with using the pseudo-symbol `[](cfish:@error)`, because we will be
special-casing the translation.  Prepending `@` seems like a decent choice.

Marvin Humphrey

Re: [lucy-dev] Markdown for documentation (redux)

Posted by Nick Wellnhofer <we...@aevum.de>.
On 03/12/2014 15:06, Nick Wellnhofer wrote:
> On 03/12/2014 05:16, Marvin Humphrey wrote:
>> On Tue, Dec 2, 2014 at 10:13 AM, Nick Wellnhofer <we...@aevum.de> wrote:
>>> The Clownfish URI scheme now supports the following type of links:
>>>
>>>      clownfish:class:{parcel}:{struct_sym}
>>>      clownfish:class:{struct_sym}
>>>      clownfish:method:{parcel}:{struct_sym}:{macro_sym}
>>>      clownfish:method:{struct_sym}:{macro_sym}
>>>      clownfish:method:{macro_sym}
>>>      clownfish:function:{parcel}:{struct_sym}:{micro_sym}
>>>      clownfish:function:{struct_sym}:{micro_sym}
>>>      clownfish:function:{micro_sym}
>>
>> Suggestions:
>>
>> *   Use dot separation.
>
> OK.
>
>> *   Empty brackets imply that we should insert a host-appropriate alias.
>>
>>      [Lucy](cfish:org.apache.lucy)             # parcel
>>      [](cfish:org.apache.lucy.Query)           # class
>>      [](cfish:org.apache.lucy.Hits.Next)       # method
>>      [](cfish:org.apache.lucy.Freezer.freeze)  # function
>>      [](cfish:null)
>
> My original plan was to always use the host alias if it's different from the
> Clownfish name. This is already implemented for Perl method names. But we can
> change that to replace the link text only if it's empty.
>
>>> If the `parcel` or `struct_sym` components are missing, the values of the
>>> current class are used. This allows for shorter URIs.
>>
>> I think you can achieve the same functionality with only a leading dot to
>> differentiate methods and functions from classes and parcels.
>>
>>      [](cfish:Query)           # class in the current parcel
>>      [](cfish:.Next)           # method in the current parcel and class
>>      [](cfish:.freeze)         # function in the current parcel and class
>>      [](cfish:Hits.Next)       # method, same parcel different class
>>      [](cfish:Freezer.freeze)  # function, same parcel different class
>
> That should work. But I'd like to keep the URI syntax extensible. This could
> be achieved by using other symbols:
>
>      cfish:@null
>      cfish:$null
>      cfish:#null
>
>>> I'm also thinking
>>> about using `cfish` instead of `clownfish` as URI scheme.
>>
>> Sure, that works.  Protocols generally have short names.
>
> OK.

All of this is now implemented.

The only remaining issue is how to treat the global error variable. Currently, 
we replace the text "Err_error" with "Clownfish->error" in the Perl POD. But 
Err_error isn't a valid symbol (maybe it was at some point?), so we should use 
something else for the C documentation. It should work to use a pseudo-link 
like `[](cfish:@error)` with the following replacements:

     C: "[the global error](cfish:cfish.Err.get_error)"
     Perl: "Clownfish->error"

(Typical usage in the documentation is something like "...sets Err_error 
when...").
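For illustration, the pseudo-link substitution described above can be sketched in a few lines of Python. This is only a sketch and not the Clownfish code, which is written in C. The function name `expand_error_link` and the host keys "c" and "perl" are hypothetical; the replacement strings are the ones proposed above.

```python
# Replacement text per host language, taken from the proposal above.
# The dictionary keys are hypothetical host identifiers.
ERROR_REPLACEMENTS = {
    "c": "[the global error](cfish:cfish.Err.get_error)",
    "perl": "Clownfish->error",
}


def expand_error_link(text: str, host: str) -> str:
    """Replace every `[](cfish:@error)` pseudo-link with host-specific text."""
    return text.replace("[](cfish:@error)", ERROR_REPLACEMENTS[host])


print(expand_error_link("...sets [](cfish:@error) when...", "perl"))
```

For the C documentation, the result is still a Markdown link, so it would have to go through the regular `cfish:` link resolution afterwards.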

Nick

Re: [lucy-dev] Markdown for documentation (redux)

Posted by Nick Wellnhofer <we...@aevum.de>.
On 03/12/2014 05:16, Marvin Humphrey wrote:
> On Tue, Dec 2, 2014 at 10:13 AM, Nick Wellnhofer <we...@aevum.de> wrote:
>> The Clownfish URI scheme now supports the following type of links:
>>
>>      clownfish:class:{parcel}:{struct_sym}
>>      clownfish:class:{struct_sym}
>>      clownfish:method:{parcel}:{struct_sym}:{macro_sym}
>>      clownfish:method:{struct_sym}:{macro_sym}
>>      clownfish:method:{macro_sym}
>>      clownfish:function:{parcel}:{struct_sym}:{micro_sym}
>>      clownfish:function:{struct_sym}:{micro_sym}
>>      clownfish:function:{micro_sym}
>
> Suggestions:
>
> *   Use dot separation.

OK.

> *   Empty brackets imply that we should insert a host-appropriate alias.
>
>      [Lucy](cfish:org.apache.lucy)             # parcel
>      [](cfish:org.apache.lucy.Query)           # class
>      [](cfish:org.apache.lucy.Hits.Next)       # method
>      [](cfish:org.apache.lucy.Freezer.freeze)  # function
>      [](cfish:null)

My original plan was to always use the host alias if it's different from the 
Clownfish name. This is already implemented for Perl method names. But we can 
change that to replace the link text only if it's empty.

>> If the `parcel` or `struct_sym` components are missing, the values of the
>> current class are used. This allows for shorter URIs.
>
> I think you can achieve the same functionality with only a leading dot to
> differentiate methods and functions from classes and parcels.
>
>      [](cfish:Query)           # class in the current parcel
>      [](cfish:.Next)           # method in the current parcel and class
>      [](cfish:.freeze)         # function in the current parcel and class
>      [](cfish:Hits.Next)       # method, same parcel different class
>      [](cfish:Freezer.freeze)  # function, same parcel different class

That should work. But I'd like to keep the URI syntax extensible. This could 
be achieved by using other symbols:

     cfish:@null
     cfish:$null
     cfish:#null

>> I'm also thinking
>> about using `cfish` instead of `clownfish` as URI scheme.
>
> Sure, that works.  Protocols generally have short names.

OK.

>> I'm also working on a `clownfish:null` pseudo-URI that can be used to fill
>> in the host language's name for an undefined value. Then all of the old
>> `perlify_pod` hacks can be eliminated.
>
> Well, there's still the issue of method name aliasing to suit the conventions
> of the host language.  For Perl, `Method_Name()` gets downcased to
> `method_name()`; for Go, it would be `MethodName()`; for Java, JavaScript and
> the like it would be `methodName()`, etc.

Perl method names are already converted unconditionally in the current 
implementation.

Nick



Re: [lucy-dev] Markdown for documentation (redux)

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Tue, Dec 2, 2014 at 10:13 AM, Nick Wellnhofer <we...@aevum.de> wrote:

> I just landed a new branch `markdown_v2` with many improvements including
> better HTML output and links to methods and functions using fragment
> identifiers.

Having prefixes greyed is a nice touch!

> Unfortunately, we can't easily link to specific methods in our
> Perl POD because information about method parameters appears in the section
> headers.

Certainly that's something we could change.

> The Clownfish URI scheme now supports the following type of links:
>
>     clownfish:class:{parcel}:{struct_sym}
>     clownfish:class:{struct_sym}
>     clownfish:method:{parcel}:{struct_sym}:{macro_sym}
>     clownfish:method:{struct_sym}:{macro_sym}
>     clownfish:method:{macro_sym}
>     clownfish:function:{parcel}:{struct_sym}:{micro_sym}
>     clownfish:function:{struct_sym}:{micro_sym}
>     clownfish:function:{micro_sym}

Suggestions:

*   Use dot separation.
*   We should be able to glean enough information from our naming conventions
    to eliminate "class", "method", etc.
*   Empty brackets imply that we should insert a host-appropriate alias.

    [Lucy](cfish:org.apache.lucy)             # parcel
    [](cfish:org.apache.lucy.Query)           # class
    [](cfish:org.apache.lucy.Hits.Next)       # method
    [](cfish:org.apache.lucy.Freezer.freeze)  # function
    [](cfish:null)

> If the `parcel` or `struct_sym` components are missing, the values of the
> current class are used. This allows for shorter URIs.

I think you can achieve the same functionality with only a leading dot to
differentiate methods and functions from classes and parcels.

    [](cfish:Query)           # class in the current parcel
    [](cfish:.Next)           # method in the current parcel and class
    [](cfish:.freeze)         # function in the current parcel and class
    [](cfish:Hits.Next)       # method, same parcel different class
    [](cfish:Freezer.freeze)  # function, same parcel different class
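
As a rough illustration, the dot-separated forms above could be resolved along these lines. This is a Python sketch under stated assumptions, not the Clownfish implementation: it assumes parcel components start with a lowercase letter, class and method names with an uppercase letter, and function names with a lowercase letter, as in the examples. `Target` and `resolve` are hypothetical names, and pseudo-URIs such as `cfish:null` would need to be special-cased before calling `resolve`.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    kind: str                   # "parcel", "class", "method" or "function"
    parcel: str
    class_name: Optional[str]
    member: Optional[str]


def _member(parcel: str, class_name: str, member: str) -> Target:
    # Assumed convention: methods are capitalized, functions are lowercase.
    if not member or "." in member:
        raise ValueError("bad member name: %r" % member)
    kind = "method" if member[0].isupper() else "function"
    return Target(kind, parcel, class_name, member)


def resolve(uri: str, cur_parcel: str, cur_class: str) -> Target:
    scheme, sep, path = uri.partition(":")
    if scheme != "cfish" or not sep:
        raise ValueError("not a cfish URI: %r" % uri)
    if path.startswith("."):
        # Leading dot: member of the current parcel and class.
        return _member(cur_parcel, cur_class, path[1:])
    parts = path.split(".")
    if "" in parts:
        raise ValueError("empty component in %r" % uri)
    # Leading lowercase components form the parcel name.
    n = 0
    while n < len(parts) and parts[n][0].islower():
        n += 1
    parcel = ".".join(parts[:n]) or cur_parcel
    rest = parts[n:]
    if not rest:
        return Target("parcel", parcel, None, None)
    if len(rest) == 1:
        return Target("class", parcel, rest[0], None)
    if len(rest) == 2:
        return _member(parcel, rest[0], rest[1])
    raise ValueError("too many components in %r" % uri)


print(resolve("cfish:Hits.Next", "org.apache.lucy", "Query"))
```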

> I'm also thinking
> about using `cfish` instead of `clownfish` as URI scheme.

Sure, that works.  Protocols generally have short names.

> I'm also working on a `clownfish:null` pseudo-URI that can be used to fill
> in the host language's name for an undefined value. Then all of the old
> `perlify_pod` hacks can be eliminated.

Well, there's still the issue of method name aliasing to suit the conventions
of the host language.  For Perl, `Method_Name()` gets downcased to
`method_name()`; for Go, it would be `MethodName()`; for Java, JavaScript and
the like it would be `methodName()`, etc.

(Method name aliasing is also an unsolved issue in certain other contexts,
such as C string literal error messages.)
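
To make the aliasing rules above concrete, here is a small Python sketch. It is illustrative only: the function name `host_alias` and the host identifiers are hypothetical, and it handles just the plain `Method_Name` pattern mentioned above.

```python
def host_alias(method_name: str, host: str) -> str:
    """Map a Clownfish-style `Method_Name` to a host-language alias."""
    words = method_name.split("_")
    if host == "perl":
        # Perl: downcase, keep underscores.
        return method_name.lower()
    if host == "go":
        # Go: drop underscores, keep the capitalized words.
        return "".join(words)
    if host in ("java", "javascript"):
        # Java and JavaScript: like Go, but with a lowercase first letter.
        joined = "".join(words)
        return joined[:1].lower() + joined[1:]
    raise ValueError("unknown host language: %r" % host)


for host in ("perl", "go", "java"):
    print(host, host_alias("Method_Name", host))
```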

Marvin Humphrey

Re: [lucy-dev] Markdown for documentation (redux)

Posted by Nick Wellnhofer <we...@aevum.de>.
On 23/11/2014 19:05, Nick Wellnhofer wrote:
> I made the necessary changes for both Lucy and Clownfish in the `markdown`
> branches. Man pages and HTML documentation for the C bindings, as well as Perl
> POD are now autogenerated from Markdown DocuComments.
>
> I also uploaded a sample of the new HTML documentation for the C APIs:
>
> http://lucy.apache.org/docs/c/
> http://lucy.apache.org/docs/c/cfish.html
> http://lucy.apache.org/docs/c/lucy.html

I just landed a new branch `markdown_v2` with many improvements including 
better HTML output and links to methods and functions using fragment 
identifiers. Unfortunately, we can't easily link to specific methods in our 
Perl POD because information about method parameters appears in the section 
headers.

Example page: http://lucy.apache.org/docs/c/lucy_Query.html

The Clownfish URI scheme now supports the following types of links:

     clownfish:class:{parcel}:{struct_sym}
     clownfish:class:{struct_sym}
     clownfish:method:{parcel}:{struct_sym}:{macro_sym}
     clownfish:method:{struct_sym}:{macro_sym}
     clownfish:method:{macro_sym}
     clownfish:function:{parcel}:{struct_sym}:{micro_sym}
     clownfish:function:{struct_sym}:{micro_sym}
     clownfish:function:{micro_sym}

If the `parcel` or `struct_sym` components are missing, the values of the 
current class are used. This allows for shorter URIs. I'm also thinking about 
using `cfish` instead of `clownfish` as the URI scheme.

I'm also working on a `clownfish:null` pseudo-URI that can be used to fill in 
the host language's name for an undefined value. Then all of the old 
`perlify_pod` hacks can be eliminated.

I plan to merge this branch in the next few days.

Nick


Re: [lucy-dev] Markdown for documentation (redux)

Posted by Nick Wellnhofer <we...@aevum.de>.
On 01/12/2014 07:08, Marvin Humphrey wrote:
> The W3C validator reports a few problems with some of the generated HTML -- it
> might be nice to clean those up.

One thing is a missing <title>. I originally planned to allow custom HTML 
headers and footers for the generated docs but this would require a simple 
templating system to create useful titles. I think I'll defer this feature for 
now.

The other is a missing charset declaration. This raises the question of whether 
we want to allow UTF-8 in our DocuComments. This isn't a problem for HTML and 
POD. For man pages, UTF-8 support is system-dependent. My guess is that most 
modern systems support UTF-8. If this turns out to be a problem, we can also 
escape non-ASCII characters.
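
As a sketch of what such escaping might look like, the following Python snippet rewrites non-ASCII characters using groff's `\[uXXXX]` Unicode escape. That choice is an assumption: the escape is groff-specific, other troff implementations may not understand it, and characters above U+FFFF or combining sequences may need extra care. The function name `escape_non_ascii` is hypothetical.

```python
def escape_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with a groff \\[uXXXX] escape."""
    return "".join(
        ch if ord(ch) < 128 else "\\[u%04X]" % ord(ch)
        for ch in text
    )


print(escape_non_ascii("na\u00efve"))
```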

> To be honest, I'm not thrilled with what we're putting out there as the public
> API for either Clownfish or Lucy -- there are a number of things marked
> "public" which really shouldn't be.  The subset of the API published in the
> Perl docs is what I would consider canonical.

I only uploaded the HTML documentation for review. I also have no plans to 
publish C API documentation for Lucy. But I really want to get started with 
defining and publishing the Clownfish C API.

> Question: does CommonMark support conditional inclusion of code blocks?
> It would be nice to put things such as the Lucy tutorial into core, but
> providing host-specific code samples poses a challenge.

CommonMark supports the "fenced" code blocks of GitHub Flavored Markdown, which 
allow adding an "info string":

     ```ruby
     require 'redcarpet'
     markdown = Redcarpet.new("Hello World!")
     puts markdown.to_html
     ```

It also allows tildes to be used instead of backticks:

     http://spec.commonmark.org/0.12/#fenced-code-blocks

The HTML renderer uses the first word of the info string to add a class like 
"language-ruby". The info string is also available in the parsed syntax tree, 
so it's easy to remove the code blocks we're not interested in.
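
A rough Python sketch of the filtering idea follows. It works on raw text lines rather than on the parsed syntax tree, so it is a simplification of the spec's fence rules and not how a CommonMark parser would do it. The function name `keep_host_blocks` is hypothetical; fenced blocks without an info string are kept for every host.

```python
import re

# Opening or closing fence: up to three spaces of indentation, then a run
# of at least three backticks or tildes, then an optional info string.
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})\s*(.*)$")


def keep_host_blocks(text: str, host: str) -> str:
    """Drop fenced code blocks whose info string names another language."""
    kept = []
    opener = None   # fence run that opened the current block, if any
    keep = True
    for line in text.splitlines():
        m = FENCE.match(line)
        if opener is None:
            if m:
                opener = m.group(1)
                info = m.group(2).split()
                keep = not info or info[0] == host
            if keep:
                kept.append(line)
        else:
            if keep:
                kept.append(line)
            closes = (m is not None
                      and m.group(1)[0] == opener[0]
                      and len(m.group(1)) >= len(opener)
                      and not m.group(2).strip())
            if closes:
                opener = None
                keep = True
    return "\n".join(kept)


sample = "Intro\n```perl\nmy $x = 1;\n```\n~~~c\nint x = 1;\n~~~\nOutro"
print(keep_host_blocks(sample, "c"))
```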

Nick


Re: [lucy-dev] Markdown for documentation (redux)

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sun, Nov 23, 2014 at 10:05 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> I made the necessary changes for both Lucy and Clownfish in the `markdown`
> branches. Man pages and HTML documentation for the C bindings, as well as
> Perl POD are now autogenerated from Markdown DocuComments.

It's really cool to see this implemented, Nick!

> I also uploaded a sample of the new HTML documentation for the C APIs:
>
> http://lucy.apache.org/docs/c/
> http://lucy.apache.org/docs/c/cfish.html
> http://lucy.apache.org/docs/c/lucy.html

The HTML looks spiffy!

The W3C validator reports a few problems with some of the generated HTML -- it
might be nice to clean those up.

> C API docs are only generated for public classes. Since not all classes are
> marked public yet, some pages are missing and some links are dead.

To be honest, I'm not thrilled with what we're putting out there as the public
API for either Clownfish or Lucy -- there are a number of things marked
"public" which really shouldn't be.  The subset of the API published in the
Perl docs is what I would consider canonical.

However, I'm not going to harp on this issue until after I've made substantial
progress on the higher priority of adding host languages.

> The Markdown documentation supports links to classes with a custom
> `clownfish` URI scheme. A link like
>
>     [RegexTokenizer](clownfish:class:lucy:RegexTokenizer)
>
> will be automatically converted to point to right location depending on the
> host language. This can also be (ab)used for other language-dependent stuff.

I think this is a reasonable extension.

Question: does CommonMark support conditional inclusion of code blocks?
It would be nice to put things such as the Lucy tutorial into core, but
providing host-specific code samples poses a challenge.

Marvin Humphrey

Re: [lucy-dev] Markdown for documentation (redux)

Posted by Nick Wellnhofer <we...@aevum.de>.
On 09/11/2014 01:31, Nick Wellnhofer wrote:
> If there aren't any
> objections, I'd be happy to work on switching our "DocuComments" over to
> CommonMark.

I made the necessary changes for both Lucy and Clownfish in the `markdown` 
branches. Man pages and HTML documentation for the C bindings, as well as Perl 
POD are now autogenerated from Markdown DocuComments.

I also uploaded a sample of the new HTML documentation for the C APIs:

http://lucy.apache.org/docs/c/
http://lucy.apache.org/docs/c/cfish.html
http://lucy.apache.org/docs/c/lucy.html

C API docs are only generated for public classes. Since not all classes are 
marked public yet, some pages are missing and some links are dead.

The Markdown documentation supports links to classes with a custom `clownfish` 
URI scheme. A link like

     [RegexTokenizer](clownfish:class:lucy:RegexTokenizer)

will be automatically converted to point to the right location depending on the 
host language. This can also be (ab)used for other language-dependent stuff.
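
To illustrate the idea, here is a Python sketch of such a link rewrite. It is not the actual implementation. The HTML target pattern `{parcel}_{struct_sym}.html` is an assumption, the Perl lookup table is hypothetical (a real converter would take package names from the class definitions), and `convert_class_links` is an invented name.

```python
import re

# Matches [text](clownfish:class:{parcel}:{struct_sym}) with optional parcel.
LINK = re.compile(r"\[([^\]]*)\]\(clownfish:class:(?:(\w+):)?(\w+)\)")

# Hypothetical lookup table from (parcel, struct_sym) to a Perl package.
PERL_CLASSES = {
    ("lucy", "RegexTokenizer"): "Lucy::Analysis::RegexTokenizer",
}


def convert_class_links(text: str, host: str, cur_parcel: str = "lucy") -> str:
    """Rewrite clownfish class links for the given host language."""
    def repl(m):
        label = m.group(1)
        parcel = m.group(2) or cur_parcel
        struct_sym = m.group(3)
        if host == "c":
            # Assumed layout of the generated HTML pages.
            return "[%s](%s_%s.html)" % (label, parcel, struct_sym)
        if host == "perl":
            # POD link with explicit link text.
            return "L<%s|%s>" % (label, PERL_CLASSES[(parcel, struct_sym)])
        raise ValueError("unknown host language: %r" % host)
    return LINK.sub(repl, text)


src = "See [RegexTokenizer](clownfish:class:lucy:RegexTokenizer)."
print(convert_class_links(src, "c"))
print(convert_class_links(src, "perl"))
```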

Nick


Re: [lucy-dev] Markdown for documentation (redux)

Posted by Logan Bell <lo...@gmail.com>.
+1

On Sun, Nov 9, 2014 at 11:48 PM, Peter Karman <pe...@peknet.com> wrote:
> On 11/8/14, 6:31 PM, Nick Wellnhofer wrote:
>
>>
>>     http://commonmark.org/
>>
>> It's backed by established players in the industry and comes with a C
>> library released under a permissive license, so it seems ideal for our
>> needs. In my opinion, moving away from POD is crucial if we want to
>> provide a documentation system that works for other host languages than
>> Perl. If there aren't any objections, I'd be happy to work on switching
>> our "DocuComments" over to CommonMark.
>>
>
>
> +1
>
>
> --
> Peter Karman  .  http://peknet.com/  .  peter@peknet.com

Re: [lucy-dev] Markdown for documentation (redux)

Posted by Peter Karman <pe...@peknet.com>.
On 11/8/14, 6:31 PM, Nick Wellnhofer wrote:

> 
>     http://commonmark.org/
> 
> It's backed by established players in the industry and comes with a C
> library released under a permissive license, so it seems ideal for our
> needs. In my opinion, moving away from POD is crucial if we want to
> provide a documentation system that works for other host languages than
> Perl. If there aren't any objections, I'd be happy to work on switching
> our "DocuComments" over to CommonMark.
> 


+1


-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com