Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/14 22:43:45 UTC
[GitHub] [spark] khalidmammadov opened a new pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README
khalidmammadov opened a new pull request #35516:
URL: https://github.com/apache/spark/pull/35516
### What changes were proposed in this pull request?
The current instructions in the README file are incomplete and not sufficient to build the site for testing and validation.
After a number of trials and errors I managed to build it, and in the process I had to install a number of additional packages.
This PR proposes improvements to the documentation so that other contributors do not have to repeat that effort.
### Why are the changes needed?
To improve the Spark documentation generation procedure.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
I started a Docker container:
`docker run --name spark_doc_build_new -p 4000:4000 -it spark_doc_build_image`
and installed everything as listed below:
```
apt-get update
apt-get -y install git nano
apt-get -y install curl
apt-get -y install ruby-full
apt-get -y install python3 pip
apt-get -y install gnupg
echo "deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/" >> /etc/apt/sources.list
apt-key adv --keyserver keyserver.ubuntu.com --recv-key '95C0FAF38DB3CCAD0C080A7BDC78B2DDEABC47B7'
apt-get update
apt-get -y install r-base
apt-get -y install pandoc libxml2-dev
apt-get -y install libcurl4-openssl-dev
apt-get -y install libssl-dev
apt-get -y install libfontconfig1-dev libharfbuzz-dev libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev
Rscript -e 'install.packages(c("curl", "knitr", "devtools", "testthat", "rmarkdown", "markdown", "e1071"), repos="https://cloud.r-project.org/")'
Rscript -e 'devtools::install_version("roxygen2", version = "7.1.2", repos="https://cloud.r-project.org/")'
Rscript -e "devtools::install_version('pkgdown', version='2.0.1', repos='https://cloud.r-project.org')"
Rscript -e "devtools::install_version('preferably', version='0.4', repos='https://cloud.r-project.org')"
echo 'deb http://security.debian.org/debian-security stretch/updates main' >> /etc/apt/sources.list
apt-get update
apt-get -y install openjdk-8-jdk
apt-get -y install scala
git clone https://github.com/apache/spark.git
cd spark/docs
gem install bundler
bundle install
bundle exec jekyll build
```
and checked the result from the host via `jekyll serve`:
`bundle exec jekyll serve --host 0.0.0.0`
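Before starting a long build, it can help to confirm that the tools installed above are actually on the PATH. The following is only a minimal sketch: the function name `missing_tools` is hypothetical, and the list of command names is an assumption inferred from the install steps above, not something the Spark build itself checks.

```sh
# Print each named tool that is not found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# The command names below are assumptions based on the packages installed above.
missing=$(missing_tools git ruby gem bundle python3 Rscript pandoc java scala)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names are joined on one line.
  echo "Missing tools:" $missing
else
  echo "All listed tools found."
fi
```

An empty result only shows that the commands exist; it says nothing about versions or about R and Python packages.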
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] khalidmammadov commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README
Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r810662636
##########
File path: docs/README.md
##########
@@ -66,11 +73,15 @@ $ sudo pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsph
### R API Documentation (Optional)
-If you'd like to generate R API documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
-and install these libraries:
+If you'd like to generate R API documentation, you'll need to install these packages and libraries:
+
+```sh
+$ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \
+ libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev libxml2-dev
+```
```sh
-$ sudo Rscript -e 'install.packages(c("knitr", "devtools", "testthat", "rmarkdown"), repos="https://cloud.r-project.org/")'
+$ sudo Rscript -e 'install.packages(c("curl", "knitr", "devtools", "testthat", "rmarkdown", "markdown", "e1071"), repos="https://cloud.r-project.org/")'
Review comment:
I re-tested this a number of times in Docker containers, and the build always fails if the package is not installed. So yes, in short, `markdown` is a required package.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r806378581
##########
File path: docs/README.md
##########
@@ -66,11 +73,15 @@ $ sudo pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsph
### R API Documentation (Optional)
-If you'd like to generate R API documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
-and install these libraries:
+If you'd like to generate R API documentation, you'll need to install these packages and libraries:
+
+```sh
+$ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \
+ libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev libxml2-dev
+```
```sh
-$ sudo Rscript -e 'install.packages(c("knitr", "devtools", "testthat", "rmarkdown"), repos="https://cloud.r-project.org/")'
+$ sudo Rscript -e 'install.packages(c("curl", "knitr", "devtools", "testthat", "rmarkdown", "markdown", "e1071"), repos="https://cloud.r-project.org/")'
Review comment:
Hm, `rmarkdown` depends on `markdown` IIRC. `rmarkdown` falls back to `markdown`. Was this required in your env?
[GitHub] [spark] srowen commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README
Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r806915067
##########
File path: docs/README.md
##########
@@ -26,13 +26,20 @@ Read on to learn more about viewing documentation in plain text (i.e., markdown)
documentation yourself. Why build it yourself? So that you have the docs that correspond to
whichever version of Spark you currently have checked out of revision control.
-## Prerequisites
+## Building documentation
+There are two ways to build Spark documentation, complete and partial. Complete will build a site similar
Review comment:
-> "similar to the main documentation site at ..."
Start a new sentence like "with all APIs documented. Partial ..."
I think this could be clarified: "Partial documentation builds, for a specific language or API, are also possible"
##########
File path: docs/README.md
##########
@@ -111,7 +112,15 @@ $ bundle exec jekyll serve --watch
$ PRODUCTION=1 bundle exec jekyll build
```
-## API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)
+You can optionally skip API build (for partial build) as it takes time
Review comment:
This needs a rewrite: "To create a partial build without API docs (which can take a long time), use SKIP_API=1:"
But then I thought partial builds were _just_ the API docs? This addition is confusing.
##########
File path: docs/README.md
##########
@@ -26,13 +26,20 @@ Read on to learn more about viewing documentation in plain text (i.e., markdown)
documentation yourself. Why build it yourself? So that you have the docs that correspond to
whichever version of Spark you currently have checked out of revision control.
-## Prerequisites
+## Building documentation
+There are two ways to build Spark documentation, complete and partial. Complete will build a site similar
+to one you can find https://spark.apache.org/documentation.html with all APIs documented and partial
+one can be used to build a language/API specific documentation.
+
+### Prerequisites
The Spark documentation build uses a number of tools to build HTML docs and API docs in Scala, Java,
Python, R and SQL.
-You need to have [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and
-[Python](https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python)
+For complete documentation all below tools must be installed **including Optionals**.
Review comment:
below tools -> tools below
##########
File path: docs/README.md
##########
@@ -26,13 +26,20 @@ Read on to learn more about viewing documentation in plain text (i.e., markdown)
documentation yourself. Why build it yourself? So that you have the docs that correspond to
whichever version of Spark you currently have checked out of revision control.
-## Prerequisites
+## Building documentation
+There are two ways to build Spark documentation, complete and partial. Complete will build a site similar
+to one you can find https://spark.apache.org/documentation.html with all APIs documented and partial
+one can be used to build a language/API specific documentation.
+
+### Prerequisites
The Spark documentation build uses a number of tools to build HTML docs and API docs in Scala, Java,
Python, R and SQL.
-You need to have [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and
-[Python](https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python)
+For complete documentation all below tools must be installed **including Optionals**.
+
+You need to have JDK, Scala, [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and
Review comment:
JDK -> the JDK
[GitHub] [spark] khalidmammadov commented on pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README
Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on pull request #35516:
URL: https://github.com/apache/spark/pull/35516#issuecomment-1039665182
I have also made this Dockerfile to make the process even easier. Would that be valuable to add to the repo?
https://github.com/khalidmammadov/spark/pull/1/files
[GitHub] [spark] HyukjinKwon commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r806378581
##########
File path: docs/README.md
##########
@@ -66,11 +73,15 @@ $ sudo pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsph
### R API Documentation (Optional)
-If you'd like to generate R API documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
-and install these libraries:
+If you'd like to generate R API documentation, you'll need to install these packages and libraries:
+
+```sh
+$ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \
Review comment:
Hm, I think we had better make this independent of the OS.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r806378693
##########
File path: docs/README.md
##########
@@ -66,11 +73,15 @@ $ sudo pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsph
### R API Documentation (Optional)
-If you'd like to generate R API documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
-and install these libraries:
+If you'd like to generate R API documentation, you'll need to install these packages and libraries:
+
+```sh
+$ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \
+ libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev libxml2-dev
+```
```sh
-$ sudo Rscript -e 'install.packages(c("knitr", "devtools", "testthat", "rmarkdown"), repos="https://cloud.r-project.org/")'
+$ sudo Rscript -e 'install.packages(c("curl", "knitr", "devtools", "testthat", "rmarkdown", "markdown", "e1071"), repos="https://cloud.r-project.org/")'
Review comment:
cc @huaxingao FYI who faced a similar problem before IIRC ..
[GitHub] [spark] srowen closed pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README
Posted by GitBox <gi...@apache.org>.
srowen closed pull request #35516:
URL: https://github.com/apache/spark/pull/35516
[GitHub] [spark] khalidmammadov commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README
Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r810628492
##########
File path: docs/README.md
##########
@@ -66,11 +73,15 @@ $ sudo pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsph
### R API Documentation (Optional)
-If you'd like to generate R API documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
-and install these libraries:
+If you'd like to generate R API documentation, you'll need to install these packages and libraries:
+
+```sh
+$ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \
Review comment:
Do you have any suggestions? I can only suggest either adding a Dockerfile similar to [this one](https://github.com/khalidmammadov/spark/blob/e9cec4091b159e1c0c6c44a1fb816ca16a77e9f5/docs/Dockerfile) to build and test the changes, or omitting these installs since they are Linux-specific. In the latter case the instructions are incomplete again, and one needs to figure out what to install every time.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r806378840
##########
File path: docs/README.md
##########
@@ -26,13 +26,20 @@ Read on to learn more about viewing documentation in plain text (i.e., markdown)
documentation yourself. Why build it yourself? So that you have the docs that correspond to
whichever version of Spark you currently have checked out of revision control.
-## Prerequisites
+## Building documentation
Review comment:
d -> D
[GitHub] [spark] srowen commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README
Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r810673824
##########
File path: docs/README.md
##########
@@ -111,7 +112,7 @@ $ bundle exec jekyll serve --watch
$ PRODUCTION=1 bundle exec jekyll build
```
-## API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)
+## Generating individual API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)
Review comment:
APIs are "Scala", "Java", "Python", "R". roxygen2, mkdocs, sphinx are not APIs
##########
File path: docs/README.md
##########
@@ -129,6 +130,14 @@ The jekyll plugin also generates the PySpark docs using [Sphinx](http://sphinx-d
using [roxygen2](https://cran.r-project.org/web/packages/roxygen2/index.html) and SQL docs
using [MkDocs](https://www.mkdocs.org/).
-NOTE: To skip the step of building and copying over the Scala, Java, Python, R and SQL API docs, run `SKIP_API=1
-bundle exec jekyll build`. In addition, `SKIP_SCALADOC=1`, `SKIP_PYTHONDOC=1`, `SKIP_RDOC=1` and `SKIP_SQLDOC=1` can be used
+NOTE: To skip the step of building and copying over the Scala, Java, Python, R and SQL API docs, see below example.
Review comment:
"see the example below"
##########
File path: docs/README.md
##########
@@ -26,13 +26,20 @@ Read on to learn more about viewing documentation in plain text (i.e., markdown)
documentation yourself. Why build it yourself? So that you have the docs that correspond to
whichever version of Spark you currently have checked out of revision control.
-## Prerequisites
+## Building documentation
+There are two ways to build Spark documentation, complete and partial. Complete will build a site similar
Review comment:
Newline after section heading, like others
##########
File path: docs/README.md
##########
@@ -111,7 +112,7 @@ $ bundle exec jekyll serve --watch
$ PRODUCTION=1 bundle exec jekyll build
```
-## API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)
+## Generating individual API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)
Review comment:
Also, I'm confused, weren't the sections above already about generating individual API docs?
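For reference, the skip flags quoted in the hunk above (`SKIP_API`, `SKIP_SCALADOC`, `SKIP_PYTHONDOC`, `SKIP_RDOC`, `SKIP_SQLDOC`) are set as environment-variable prefixes on the jekyll command. The sketch below only prints the resulting command line rather than running it; the helper name `skip_build_cmd` and its short argument names are hypothetical, introduced purely for illustration.

```sh
# Print (without running) a jekyll build command that skips the given doc sets.
# Accepted names: api (all API docs), scala, python, r, sql.
skip_build_cmd() {
  prefix=""
  for name in "$@"; do
    case "$name" in
      api)    prefix="${prefix}SKIP_API=1 " ;;
      scala)  prefix="${prefix}SKIP_SCALADOC=1 " ;;
      python) prefix="${prefix}SKIP_PYTHONDOC=1 " ;;
      r)      prefix="${prefix}SKIP_RDOC=1 " ;;
      sql)    prefix="${prefix}SKIP_SQLDOC=1 " ;;
      *)      echo "unknown doc set: $name" >&2; return 1 ;;
    esac
  done
  printf '%sbundle exec jekyll build\n' "$prefix"
}

skip_build_cmd api     # skip every API doc step
skip_build_cmd r sql   # skip only the R and SQL docs
```

Printing the command keeps the sketch safe to run anywhere; the printed line can be pasted into a shell inside the `docs` directory once the prerequisites are installed.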
[GitHub] [spark] huaxingao commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README
Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r810394074
##########
File path: docs/README.md
##########
@@ -66,11 +73,15 @@ $ sudo pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsph
### R API Documentation (Optional)
-If you'd like to generate R API documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
-and install these libraries:
+If you'd like to generate R API documentation, you'll need to install these packages and libraries:
+
+```sh
+$ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \
+ libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev libxml2-dev
+```
```sh
-$ sudo Rscript -e 'install.packages(c("knitr", "devtools", "testthat", "rmarkdown"), repos="https://cloud.r-project.org/")'
+$ sudo Rscript -e 'install.packages(c("curl", "knitr", "devtools", "testthat", "rmarkdown", "markdown", "e1071"), repos="https://cloud.r-project.org/")'
Review comment:
I had the same problem: I tested with and without the `markdown` package, and the build failed without it.
[GitHub] [spark] khalidmammadov commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README
Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r810385503
##########
File path: docs/README.md
##########
@@ -66,11 +73,15 @@ $ sudo pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsph
### R API Documentation (Optional)
-If you'd like to generate R API documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
-and install these libraries:
+If you'd like to generate R API documentation, you'll need to install these packages and libraries:
+
+```sh
+$ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \
+ libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev libxml2-dev
+```
```sh
-$ sudo Rscript -e 'install.packages(c("knitr", "devtools", "testthat", "rmarkdown"), repos="https://cloud.r-project.org/")'
+$ sudo Rscript -e 'install.packages(c("curl", "knitr", "devtools", "testthat", "rmarkdown", "markdown", "e1071"), repos="https://cloud.r-project.org/")'
Review comment:
I think I finally understand what's going on.
I'm using [this Dockerfile](https://github.com/khalidmammadov/spark/blob/e9cec4091b159e1c0c6c44a1fb816ca16a77e9f5/docs/Dockerfile) for the build. I tested with and without the `markdown` package, and the build fails without it, so I couldn't understand how it succeeds in the [build and test CI](https://github.com/apache/spark/blob/94fd9c55c6a29208bbfe240bd2f3191c7df4c666/.github/workflows/build_and_test.yml#L537) phase. Apparently `markdown` (along with the other packages I am adding here) is already installed on the base image, @dongjoon-hyun's [Docker image](https://hub.docker.com/layers/dongjoon/apache-spark-github-action-image/20220207/images/sha256-af09d172ff8e2cbd71df9a1bc5384a47578c4a4cc293786c539333cafaf4a7ce?context=explore) (BTW, where is the source of this Dockerfile kept?). So some packages are "reinstalled" during build and test and some are not, hence the confusion.
Additionally, I tested building without `rmarkdown`, and it succeeds.
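One way to see which R packages a base image already provides is to compare the list from the README change against what is installed. The following is only a minimal sketch: the helper name `missing_pkgs` and the sample `installed` list are hypothetical and not taken from the CI image; on a real image the installed list would have to come from R itself, as noted in the comment.

```sh
# Hypothetical helper: print every word of $1 (required packages)
# that does not appear as a whole word in $2 (installed packages).
missing_pkgs() {
  for pkg in $1; do
    case " $2 " in
      *" $pkg "*) ;;                 # already installed, nothing to report
      *) printf '%s\n' "$pkg" ;;     # not found, report it
    esac
  done
}

# Package list from the proposed README change.
required="curl knitr devtools testthat rmarkdown markdown e1071"

# Assumption: a made-up sample list. On a real image it could be obtained with
#   installed=$(Rscript -e 'cat(rownames(installed.packages()))')
installed="knitr devtools testthat rmarkdown"

missing_pkgs "$required" "$installed"
```

With the sample list above this prints `curl`, `markdown` and `e1071`, one per line. Running the same comparison inside the CI base image would show which packages are preinstalled there rather than installed by the workflow.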
[GitHub] [spark] AmplabJenkins commented on pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35516:
URL: https://github.com/apache/spark/pull/35516#issuecomment-1041224741
Can one of the admins verify this patch?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r806379375
##########
File path: docs/README.md
##########
@@ -66,11 +73,15 @@ $ sudo pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsph
### R API Documentation (Optional)
-If you'd like to generate R API documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
-and install these libraries:
+If you'd like to generate R API documentation, you'll need to install these packages and libraries:
+
+```sh
+$ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \
Review comment:
Hm, I think we should make this independent of the OS.