Posted to dev@arrow.apache.org by Antoine Pitrou <an...@python.org> on 2018/05/12 16:03:38 UTC

Language-independent and cross-language docs

Hi,

In the following PR discussion it was mentioned that we currently lack a
central documentation system for cross-language topics:
https://github.com/apache/arrow/pull/1575#issuecomment-364062240

Sphinx looks like a reasonable contender for that purpose.  For those who
don't know it, Sphinx is a documentation system initially developed for
the Python language, which quickly became widely used amongst Python
projects, and is now being used by non-Python projects as well.  For
example, the LLVM docs (https://llvm.org/docs/) and even the Linux
kernel online docs are now written using Sphinx
(https://www.kernel.org/doc/html/latest/index.html).

Sphinx uses reStructuredText (a.k.a. "reST") as its basic markup
language, but with many extensions.  It allows for structured
documentation with extensive cross-referencing (even between independent
Sphinx sites, using the "intersphinx" extension).
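
As a rough illustration of the intersphinx mechanism mentioned above, a
minimal conf.py sketch might look like the following.  The project name
and the mapped sites are assumptions chosen for illustration, not an
actual Arrow configuration.

```python
# conf.py -- hypothetical configuration for a top-level docs project.
# The project name and the mapping targets are illustrative assumptions.

project = "Apache Arrow"

extensions = [
    "sphinx.ext.intersphinx",  # cross-references into other Sphinx sites
]

# Each entry maps a short name to (base URL, inventory location).
# None means "fetch objects.inv from the base URL".
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
    "numpy": ("https://numpy.org/doc/stable", None),
}
```

With such a mapping in place, a role like :py:class:`dict` in a reST
page should resolve to the corresponding entry in the external Python
docs.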

The questions here are:

- Should we do this at all (i.e. build up a central documentation system)?

- Should we use Sphinx for it?

- To what extent should our current docs be migrated to Sphinx (apart
from the Python docs, which already use Sphinx)?  For example, should
the specs (currently standalone pages written in Markdown) be migrated
to Sphinx for better cross-referencing and navigation?  What about the
C++ tutorial pages?  etc.

- Should we preferably have a single Sphinx doctree, or several
independent per-topic / per-language doctrees?

Regards

Antoine.

Re: Language-independent and cross-language docs

Posted by Wes McKinney <we...@gmail.com>.
+1 on setting up a top-level documentation project. I think that
establishing an information hierarchy to help people understand all
the layers of the project is more important than the choice of the
documentation tool -- for example, if we started with Sphinx and
decided to move later to something else, there are tools that help with
converting between markup languages (though it would require some
manual fixes).

I'm sort of neutral on combining the current language-specific
documentation projects into a monolithic documentation project. My
prior for this would be that the top-level documentation should
consist of:

* High level overview of the Arrow project: components, languages, and vision
* Columnar specification documents (migrating the current Markdown
documents in format/) and other specification documents
* High level project roadmap and contributor guide
* Guides for maintainers / committers
* Getting started guide for each language

The top-level documentation could direct users to the
language-specific API and usage docs (i.e. like the current Python
Sphinx project).

I'm interested in what people think about how to integrate this
statically-generated content with our current Jekyll-based website.
One could argue that all this top-level documentation could be handled
by Jekyll (or an equivalent static site generator).

- Wes

On Thu, May 17, 2018 at 3:44 PM, Uwe L. Korn <uw...@xhochy.com> wrote:
> Hello,
>
> I can second that we should move the documentation to a central one. As a C++ and Python contributor at the same time, it is always hard to think of where you should document a specific piece. We have a very small C++ documentation and a bit larger Python one. For some features, though, it would make sense to have them in both. IPC and in-process sharing is also a main part of the Arrow project. Documenting this separately for each language will be a lot of work and probably leave blind spots in each language.
>
> Not everything in each language ecosystem can be directly included in Sphinx but as Sphinx is becoming a very broadly used documentation system, there are many nice converters like Breathe [1] (Doxygen to Sphinx) available.
>
> To directly answer the questions:
>
> - Should we do this at all (i.e. build up a central documentation system)?
>
> Yes
>
> - Should we use Sphinx for it?
>
> Very much in favour. There is probably also a tendency that some people prefer Markdown (I do) but given the feature set of Sphinx, I would very much argue in favour of it.
>
>  - To which extent our current docs should be migrated to Sphinx (apart
>  from the Python docs, which already use Sphinx)?  For example, should
>  the specs (currently standalone pages written in Markdown) be migrated
>  to Sphinx for better cross-referencing and navigation?  What about the
>  C++ tutorial pages?  etc.
>
> I would migrate C++ documentation definitely fully into that but the C++ / Python relation is very tight. There are a lot of topics that either touch two languages or are general to the project, these should also go in there.
>
> - Should we preferably have a single Sphinx doctree, or several
>  independent per-topic / per-language doctrees?
>
> I'm not 100% sure what the definition of a "Sphinx doctree" is, but as we will have many shared topics between the different implementations, I would expect that we should have a single documentation with well organized sections.
>
> Also we probably will face the issue that we have documentation on a specific topic and only a small part is different between two implementations/setups/... I really like the Scala/Python tabs in the Spark docs [2]. There is a Sphinx extension that seems to do something similar to this [3]. This could either be used to have documentation on how to construct things where one switches between Ruby and Python or the main issue where I would need it: Setting up the build with slightly different package managers (e.g. conda vs pip in Python).
>
> Uwe
>
> [1]: https://breathe.readthedocs.io/en/latest/
> [2]: http://spark.apache.org/docs/latest/quick-start.html#more-on-dataset-operations
> [3]: http://sphinxcontrib-contentui.readthedocs.io/en/latest/tabs.html

Re: Language-independent and cross-language docs

Posted by Wes McKinney <we...@gmail.com>.
Among other things, the columnar format specification files should
probably make their way into this new documentation project.

On Mon, May 21, 2018 at 5:19 PM, Wes McKinney <we...@gmail.com> wrote:
> I don't think we should attempt to create a documentation "super
> project" that includes the generated API reference for all the
> libraries in Apache Arrow. I do think that creating a documentation
> "hub" project (with the low-level API docs being the "spokes") is a
> good idea. Currently, the Jekyll project website serves as a very
> crude hub. It would be better to build something more suited for
> writing developer documentation.
>
> So in other words, the subprojects would continue to generate API docs
> using the current tools (Javadoc, GTK-Doc, Doxygen, Sphinx, etc.) but
> the objective of the "top level docs" is to make the entire project
> easier to navigate than it is now.

Re: Language-independent and cross-language docs

Posted by Wes McKinney <we...@gmail.com>.
I don't think we should attempt to create a documentation "super
project" that includes the generated API reference for all the
libraries in Apache Arrow. I do think that creating a documentation
"hub" project (with the low-level API docs being the "spokes") is a
good idea. Currently, the Jekyll project website serves as a very
crude hub. It would be better to build something more suited for
writing developer documentation.

So in other words, the subprojects would continue to generate API docs
using the current tools (Javadoc, GTK-Doc, Doxygen, Sphinx, etc.) but
the objective of the "top level docs" is to make the entire project
easier to navigate than it is now.

On Sun, May 20, 2018 at 3:15 AM, Kouhei Sutou <ko...@clear-code.com> wrote:
> Hi,
>
>> I really like the Scala/Python tabs in the Spark docs [2].
>
>> [2]: http://spark.apache.org/docs/latest/quick-start.html#more-on-dataset-operations
>
> Oh, I also like it.
>
>> > - Should we do this at all (i.e. build up a central documentation system)?
>
> Yes.
>
>> > - Should we use Sphinx for it?
>
> I'm neutral.
>
> If we choose Sphinx or something, we need to do some work for
> Apache Arrow C. It uses GTK-Doc as its documentation
> system. We'll need to create a tool like
> https://github.com/pygobject/pgi-docgen . (It's a tool for
> Sphinx.)
>
> Apache Arrow C needs to keep using GTK-Doc style for API
> documentation, because it's also used by GObject
> Introspection. GObject Introspection is very important in
> Apache Arrow C. For example, the Ruby bindings need GObject
> Introspection support. So we shouldn't drop GObject
> Introspection support.
>
> Other documentation such as tutorials (they don't exist
> yet :<) doesn't need to use GTK-Doc style.
>
>
> We'll need to create a similar tool for Apache Arrow Ruby.
> Most of the API of Apache Arrow Ruby is generated
> automatically by GObject Introspection support. We can reuse
> GTK-Doc style documentation in Apache Arrow C for Apache
> Arrow Ruby.
>
> We may be able to use
> https://github.com/ruby-gnome2/yard-gobject-introspection
> for Apache Arrow Ruby. It's not complete yet but we can
> improve it. (I'm one of the developers of it.)
>
>
> Thanks,
> --
> kou

Re: Language-independent and cross-language docs

Posted by Kouhei Sutou <ko...@clear-code.com>.
Hi,

> I really like the Scala/Python tabs in the Spark docs [2].

> [2]: http://spark.apache.org/docs/latest/quick-start.html#more-on-dataset-operations

Oh, I also like it.

> > - Should we do this at all (i.e. build up a central documentation system)?

Yes.

> > - Should we use Sphinx for it?

I'm neutral.

If we choose Sphinx or something, we need to do some work for
Apache Arrow C. It uses GTK-Doc as its documentation
system. We'll need to create a tool like
https://github.com/pygobject/pgi-docgen . (It's a tool for
Sphinx.)

Apache Arrow C needs to keep using GTK-Doc style for API
documentation, because it's also used by GObject
Introspection. GObject Introspection is very important in
Apache Arrow C. For example, the Ruby bindings need GObject
Introspection support. So we shouldn't drop GObject
Introspection support.

Other documentation such as tutorials (they don't exist
yet :<) doesn't need to use GTK-Doc style.


We'll need to create a similar tool for Apache Arrow Ruby.
Most of the API of Apache Arrow Ruby is generated
automatically by GObject Introspection support. We can reuse
GTK-Doc style documentation in Apache Arrow C for Apache
Arrow Ruby.

We may be able to use
https://github.com/ruby-gnome2/yard-gobject-introspection
for Apache Arrow Ruby. It's not complete yet but we can
improve it. (I'm one of the developers of it.)


Thanks,
--
kou

In <15...@webmail.messagingengine.com>
  "Re: Language-independent and cross-language docs" on Thu, 17 May 2018 21:44:33 +0200,
  "Uwe L. Korn" <uw...@xhochy.com> wrote:

> Hello,
> 
> I can second that we should move the documentation to a central one. As a C++ and Python contributor at the same time, it is always hard to think of where you should document a specific piece. We have a very small C++ documentation and a bit larger Python one. For some features, though, it would make sense to have them in both. IPC and in-process sharing is also a main part of the Arrow project. Documenting this separately for each language will be a lot of work and probably leave blind spots in each language.
> 
> Not everything in each language ecosystem can be directly included in Sphinx but as Sphinx is becoming a very broadly used documentation system, there are many nice converters like Breathe [1] (Doxygen to Sphinx) available.
> 
> To directly answer the questions:
> 
> - Should we do this at all (i.e. build up a central documentation system)?
> 
> Yes
>  
> - Should we use Sphinx for it?
> 
> Very much in favour. There is probably also a tendency that some people prefer Markdown (I do) but given the feature set of Sphinx, I would very much argue in favour of it.
> 
>  - To which extent our current docs should be migrated to Sphinx (apart
>  from the Python docs, which already use Sphinx)?  For example, should
>  the specs (currently standalone pages written in Markdown) be migrated
>  to Sphinx for better cross-referencing and navigation?  What about the
>  C++ tutorial pages?  etc.
> 
> I would migrate C++ documentation definitely fully into that but the C++ / Python relation is very tight. There are a lot of topics that either touch two languages or are general to the project, these should also go in there.
>  
> - Should we preferably have a single Sphinx doctree, or several
>  independent per-topic / per-language doctrees?
> 
> I'm not 100% sure what the definition of a "Sphinx doctree" is, but as we will have many shared topics between the different implementations, I would expect that we should have a single documentation with well organized sections.
> 
> Also we probably will face the issue that we have documentation on a specific topic and only a small part is different between two implementations/setups/... I really like the Scala/Python tabs in the Spark docs [2]. There is a Sphinx extension that seems to do something similar to this [3]. This could either be used to have documentation on how to construct things where one switches between Ruby and Python or the main issue where I would need it: Setting up the build with slightly different package managers (e.g. conda vs pip in Python).
> 
> Uwe
> 
> [1]: https://breathe.readthedocs.io/en/latest/
> [2]: http://spark.apache.org/docs/latest/quick-start.html#more-on-dataset-operations
> [3]: http://sphinxcontrib-contentui.readthedocs.io/en/latest/tabs.html

Re: Language-independent and cross-language docs

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
Hello,

I can second that we should move the documentation to a central one. As a C++ and Python contributor at the same time, it is always hard to think of where you should document a specific piece. We have a very small C++ documentation and a bit larger Python one. For some features, though, it would make sense to have them in both. IPC and in-process sharing is also a main part of the Arrow project. Documenting this separately for each language will be a lot of work and probably leave blind spots in each language.

Not everything in each language ecosystem can be directly included in Sphinx but as Sphinx is becoming a very broadly used documentation system, there are many nice converters like Breathe [1] (Doxygen to Sphinx) available.
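
To make the Doxygen-to-Sphinx bridge a bit more concrete, here is a minimal conf.py sketch using the Breathe extension from reference [1]. The project key "arrow_cpp" and the XML output path are assumptions for illustration only; they do not describe an existing Arrow setup.

```python
# conf.py -- hypothetical sketch; the path and project key are assumptions.
import os

extensions = [
    "breathe",  # renders Doxygen XML output inside Sphinx pages
]

# Assumed location of Doxygen's XML output
# (requires GENERATE_XML = YES in the Doxyfile).
doxygen_xml_dir = os.path.join("..", "cpp", "apidoc", "xml")

# Breathe looks up Doxygen XML directories by project name.
breathe_projects = {"arrow_cpp": doxygen_xml_dir}
breathe_default_project = "arrow_cpp"
```

A reST page could then pull in C++ API documentation with one of Breathe's directives, for example ".. doxygenclass:: arrow::Array".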

To directly answer the questions:

- Should we do this at all (i.e. build up a central documentation system)?

Yes
 
- Should we use Sphinx for it?

Very much in favour. There is probably also a tendency that some people prefer Markdown (I do) but given the feature set of Sphinx, I would very much argue in favour of it.

 - To which extent our current docs should be migrated to Sphinx (apart
 from the Python docs, which already use Sphinx)?  For example, should
 the specs (currently standalone pages written in Markdown) be migrated
 to Sphinx for better cross-referencing and navigation?  What about the
 C++ tutorial pages?  etc.

I would definitely migrate the C++ documentation fully into that, but the C++ / Python relation is very tight. There are a lot of topics that either touch two languages or are general to the project; these should also go in there.
 
- Should we preferably have a single Sphinx doctree, or several
 independent per-topic / per-language doctrees?

I'm not 100% sure what the definition of a "Sphinx doctree" is, but as we will have many shared topics between the different implementations, I would expect that we should have a single documentation with well organized sections.

Also we probably will face the issue that we have documentation on a specific topic and only a small part is different between two implementations/setups/... I really like the Scala/Python tabs in the Spark docs [2]. There is a Sphinx extension that seems to do something similar to this [3]. This could either be used to have documentation on how to construct things where one switches between Ruby and Python or the main issue where I would need it: Setting up the build with slightly different package managers (e.g. conda vs pip in Python).

Uwe

[1]: https://breathe.readthedocs.io/en/latest/
[2]: http://spark.apache.org/docs/latest/quick-start.html#more-on-dataset-operations
[3]: http://sphinxcontrib-contentui.readthedocs.io/en/latest/tabs.html


On Sat, May 12, 2018, at 6:03 PM, Antoine Pitrou wrote:
> 
> Hi,
> 
> In the following PR discussion it was mentioned that we currently lack a
> central documentation system for cross-language topics:
> https://github.com/apache/arrow/pull/1575#issuecomment-364062240
> 
> Sphinx looks like a reasonable contender for that purpose.  For those who
> don't know it, Sphinx is a documentation system initially developed for
> the Python language, which quickly became widely-used amongst Python
> projects, and is now being used by non-Python projects as well.  For
> example, the LLVM docs (https://llvm.org/docs/) and even the Linux
> kernel online docs are now written using Sphinx
> (https://www.kernel.org/doc/html/latest/index.html).
> 
> Sphinx uses reStructuredText (a.k.a. "reST") as its basic markup
> language, but with many extensions.  It allows for structured
> documentation with extensive cross-referencing (even between independent
> Sphinx sites, using the "intersphinx" extension).
> 
> The questions here are:
> 
> - Should we do this at all (i.e. build up a central documentation system)?
> 
> - Should we use Sphinx for it?
> 
> - To what extent should our current docs be migrated to Sphinx (apart
> from the Python docs, which already use Sphinx)?  For example, should
> the specs (currently standalone pages written in Markdown) be migrated
> to Sphinx for better cross-referencing and navigation?  What about the
> C++ tutorial pages?  etc.
> 
> - Should we preferably have a single Sphinx doctree, or several
> independent per-topic / per-language doctrees?
> 
> Regards
> 
> Antoine.