Posted to github@arrow.apache.org by "MrPowers (via GitHub)" <gi...@apache.org> on 2023/05/01 13:20:04 UTC

[GitHub] [arrow-datafusion-python] MrPowers opened a new pull request, #364: First pass of documentation in mdBook

MrPowers opened a new pull request, #364:
URL: https://github.com/apache/arrow-datafusion-python/pull/364

   # Which issue does this PR close?
   
   Closes #339.
   
   # Rationale for this change
   
   The documentation is currently built with Sphinx, which is hard to update (it requires reStructuredText rather than Markdown) and has poor default settings for SEO and usability.  Migrating the documentation to mdBook will make it easier to build an amazing user guide.
   
   # What changes are included in this PR?
   
   A `docs/mdbook` folder containing the sources for an mdBook documentation site.
   
   # Are there any user-facing changes?
   
   No, this only changes the documentation.  There are no code changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion-python] andygrove commented on a diff in pull request #364: First pass of documentation in mdBook

Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove commented on code in PR #364:
URL: https://github.com/apache/arrow-datafusion-python/pull/364#discussion_r1183658825


##########
docs/mdbook/README.md:
##########
@@ -0,0 +1,17 @@
+# DataFusion Book

Review Comment:
   We will need to add ASF headers to all of these `.md` files:
   
   ```suggestion
   <!---
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at
   
       http://www.apache.org/licenses/LICENSE-2.0
   
     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
   -->
   # DataFusion Book
   ```
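Applying the header to many files by hand is error-prone, so here is a minimal sketch of one way to prepend it to every `.md` file under a folder. The names `LICENSE_MARKER`, `add_asf_header`, and `add_headers_under` are hypothetical helpers written for this illustration, not part of the repository; the header text is copied from the suggestion above.

```python
from pathlib import Path

# Phrase used to detect files that already carry the header (assumption:
# any file containing it near the top is considered licensed).
LICENSE_MARKER = "Licensed to the Apache Software Foundation (ASF)"

# Header text copied from the review suggestion above.
ASF_HEADER = """<!---
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied.  See the License for the
  specific language governing permissions and limitations
  under the License.
-->
"""


def add_asf_header(text: str) -> str:
    """Return `text` with the ASF header prepended, unless it is already there."""
    # Only inspect the start of the file, where the header is expected.
    if LICENSE_MARKER in text[:2000]:
        return text
    return ASF_HEADER + "\n" + text


def add_headers_under(root: Path) -> list:
    """Prepend the header to every .md file under `root`; return the files changed."""
    changed = []
    for path in sorted(root.rglob("*.md")):
        original = path.read_text(encoding="utf-8")
        updated = add_asf_header(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `add_headers_under(Path("docs/mdbook"))` would cover the Markdown files in this PR. The function is idempotent, so running it twice changes nothing the second time. Non-Markdown files such as `book.toml` would need the same text in that format's own comment syntax (`#` for TOML) rather than an HTML comment.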





[GitHub] [arrow-datafusion-python] andygrove commented on a diff in pull request #364: First pass of documentation in mdBook

Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove commented on code in PR #364:
URL: https://github.com/apache/arrow-datafusion-python/pull/364#discussion_r1183656254


##########
docs/mdbook/src/installation.md:
##########
@@ -0,0 +1,46 @@
+# Installation
+
+DataFusion is easy to install, just like any other Python library.
+
+## Using Pip
+
+``` bash
+pip install datafusion
+```
+
+## conda & JupyterLab setup
+
+This section explains how to install DataFusion in a conda environment with other libraries that allow for a nice Jupyter workflow.  This setup is completely optional.  These steps are only needed if you'd like to run DataFusion in a Jupyter notebook and have an interface like this:
+
+![DataFusion in Jupyter](https://github.com/MrPowers/datafusion-book/raw/main/src/images/datafusion-jupyterlab.png)
+
+Create a conda environment with DataFusion, Jupyter, and other useful dependencies in the `datafusion-env.yml` file:
+
+```
+name: datafusion-env
+channels:
+  - conda-forge
+  - defaults
+dependencies:
+  - python=3.9
+  - ipykernel
+  - nb_conda
+  - jupyterlab
+  - jupyterlab_code_formatter
+  - isort
+  - black
+  - pip
+  - pip:
+    - datafusion
+
+```
+
+Create the environment with `conda env create -f datafusion-env.yml`.
+
+Activate the environment with `conda activate datafusion-env`.
+
+Run `jupyter lab` or open the [JupyterLab Desktop application](https://github.com/jupyterlab/jupyterlab-desktop) to start running DataFusion in a Jupyter notebook.
+
+## Examples
+
+See the [pydata-examples](https://github.com/MrPowers/pydata-examples) for a variety of Jupyter notebooks that show DataFusion in action!

Review Comment:
   Could we link to an example in this repo?
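As a quick sanity check after the install steps in the diff above, something like the following could confirm that the package is visible in the active environment. This is only a sketch: `installed_version` is a hypothetical helper, and it relies solely on the standard library's `importlib.metadata`, not on any DataFusion API.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("datafusion")
if version is None:
    print("datafusion is not installed; try `pip install datafusion`")
else:
    print(f"datafusion {version} is installed")
```

Running this inside the activated `datafusion-env` environment (or a Jupyter cell using that kernel) shows whether the notebook will be able to import the library.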





[GitHub] [arrow-datafusion-python] andygrove commented on pull request #364: First pass of documentation in mdBook

Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove commented on PR #364:
URL: https://github.com/apache/arrow-datafusion-python/pull/364#issuecomment-1533930351

   @MrPowers two more files need ASF headers:
   
   NOT APPROVED: docs/mdbook/README.md (./docs/mdbook/README.md): false
   NOT APPROVED: docs/mdbook/book.toml (./docs/mdbook/book.toml): false
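When a check reports many such lines, the file paths can be pulled out mechanically. The sketch below assumes every relevant line has the shape `NOT APPROVED: <path> (<./path>): false`, which is inferred only from the two lines quoted above; `unapproved_files` is a hypothetical helper name.

```python
import re

# Assumed line shape, inferred from the two lines quoted in the comment:
#   NOT APPROVED: <path> (<./path>): false
NOT_APPROVED = re.compile(r"^\s*NOT APPROVED:\s+(?P<path>\S+)\s+\(")


def unapproved_files(report: str) -> list:
    """Return the paths of all files flagged as NOT APPROVED in `report`."""
    paths = []
    for line in report.splitlines():
        match = NOT_APPROVED.match(line)
        if match:
            paths.append(match.group("path"))
    return paths


report = """\
   NOT APPROVED: docs/mdbook/README.md (./docs/mdbook/README.md): false
   NOT APPROVED: docs/mdbook/book.toml (./docs/mdbook/book.toml): false
"""
print(unapproved_files(report))
# prints ['docs/mdbook/README.md', 'docs/mdbook/book.toml']
```

The resulting list can then be used as a checklist of files that still need a license header.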




[GitHub] [arrow-datafusion-python] andygrove commented on pull request #364: First pass of documentation in mdBook

Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove commented on PR #364:
URL: https://github.com/apache/arrow-datafusion-python/pull/364#issuecomment-1532997634

   Thanks @MrPowers, this looks like a great start. I left some feedback.




[GitHub] [arrow-datafusion-python] MrPowers commented on a diff in pull request #364: First pass of documentation in mdBook

Posted by "MrPowers (via GitHub)" <gi...@apache.org>.
MrPowers commented on code in PR #364:
URL: https://github.com/apache/arrow-datafusion-python/pull/364#discussion_r1184389283


##########
docs/mdbook/src/installation.md:
##########
@@ -0,0 +1,46 @@
+See the [pydata-examples](https://github.com/MrPowers/pydata-examples) for a variety of Jupyter notebooks that show DataFusion in action!

Review Comment:
   Changed this to [the examples in this repo](https://github.com/apache/arrow-datafusion-python/tree/main/examples), which feels more appropriate.



##########
docs/mdbook/src/usage/create-table.md:
##########
@@ -0,0 +1,43 @@
+# DataFusion Create Table
+
+It's easy to create DataFusion tables from a variety of data sources.
+
+## Create table from Python Dictionary

Review Comment:
   Good catch, updated throughout.





[GitHub] [arrow-datafusion-python] andygrove commented on a diff in pull request #364: First pass of documentation in mdBook

Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove commented on code in PR #364:
URL: https://github.com/apache/arrow-datafusion-python/pull/364#discussion_r1183655341


##########
docs/mdbook/book.toml:
##########
@@ -0,0 +1,6 @@
+[book]
+authors = ["Matthew Powers"]

Review Comment:
   Can we change this to something like this:
   
   ```
   authors = ["Apache Arrow <de...@arrow.apache.org>"]
   ```





[GitHub] [arrow-datafusion-python] MrPowers commented on a diff in pull request #364: First pass of documentation in mdBook

Posted by "MrPowers (via GitHub)" <gi...@apache.org>.
MrPowers commented on code in PR #364:
URL: https://github.com/apache/arrow-datafusion-python/pull/364#discussion_r1184389408


##########
docs/mdbook/README.md:
##########
@@ -0,0 +1,17 @@
+# DataFusion Book

Review Comment:
   Added the license throughout.





[GitHub] [arrow-datafusion-python] MrPowers commented on pull request #364: First pass of documentation in mdBook

Posted by "MrPowers (via GitHub)" <gi...@apache.org>.
MrPowers commented on PR #364:
URL: https://github.com/apache/arrow-datafusion-python/pull/364#issuecomment-1529716787

   <img width="1144" alt="Screenshot 2023-05-01 at 9 34 17 AM" src="https://user-images.githubusercontent.com/2722395/235458878-09ba0235-d294-4481-990f-75867b08a1bd.png">
   
   mdBook generates really nice-looking docs.  Excited to collaborate with the team on next steps!




[GitHub] [arrow-datafusion-python] andygrove merged pull request #364: First pass of documentation in mdBook

Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove merged PR #364:
URL: https://github.com/apache/arrow-datafusion-python/pull/364




[GitHub] [arrow-datafusion-python] MrPowers commented on pull request #364: First pass of documentation in mdBook

Posted by "MrPowers (via GitHub)" <gi...@apache.org>.
MrPowers commented on PR #364:
URL: https://github.com/apache/arrow-datafusion-python/pull/364#issuecomment-1533867660

   Thanks for the review @andygrove.  Feel free to comment with any other next steps and I'll be happy to continue updating this PR.




[GitHub] [arrow-datafusion-python] MrPowers commented on a diff in pull request #364: First pass of documentation in mdBook

Posted by "MrPowers (via GitHub)" <gi...@apache.org>.
MrPowers commented on code in PR #364:
URL: https://github.com/apache/arrow-datafusion-python/pull/364#discussion_r1184388903


##########
docs/mdbook/book.toml:
##########
@@ -0,0 +1,6 @@
+[book]
+authors = ["Matthew Powers"]

Review Comment:
   Updated.





[GitHub] [arrow-datafusion-python] MrPowers commented on pull request #364: First pass of documentation in mdBook

Posted by "MrPowers (via GitHub)" <gi...@apache.org>.
MrPowers commented on PR #364:
URL: https://github.com/apache/arrow-datafusion-python/pull/364#issuecomment-1534475973

   Thanks @andygrove, made those updates.




[GitHub] [arrow-datafusion-python] andygrove commented on a diff in pull request #364: First pass of documentation in mdBook

Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove commented on code in PR #364:
URL: https://github.com/apache/arrow-datafusion-python/pull/364#discussion_r1183657678


##########
docs/mdbook/src/usage/create-table.md:
##########
@@ -0,0 +1,43 @@
+# DataFusion Create Table
+
+It's easy to create DataFusion tables from a variety of data sources.
+
+## Create table from Python Dictionary

Review Comment:
   Do we want to capitalize "table" here and the following sections?


