Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/14 22:43:45 UTC

[GitHub] [spark] khalidmammadov opened a new pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README

khalidmammadov opened a new pull request #35516:
URL: https://github.com/apache/spark/pull/35516


   ### What changes were proposed in this pull request?
   The current instructions in the README file are incomplete and not sufficient to build the site for testing and validation.
   After a number of trial-and-error attempts I managed to build it. In the process I had to install a number of additional packages.
   This PR proposes improvements to the documentation so that other contributors do not have to repeat that effort.
   
   ### Why are the changes needed?
   They improve the procedure for generating the Spark documentation.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   I started a Docker container:
   `docker run --name spark_doc_build_new -p 4000:4000 -it spark_doc_build_image`
   and installed everything as listed below:
   ```
   apt-get update
   apt-get -y install  git nano
   apt-get -y install  curl
   apt-get -y install  ruby-full
   apt-get -y install  python3 pip
   apt-get -y install  gnupg
   
   echo "deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/" >> /etc/apt/sources.list
   apt-key adv --keyserver keyserver.ubuntu.com --recv-key '95C0FAF38DB3CCAD0C080A7BDC78B2DDEABC47B7'
   apt-get update
   
   apt-get -y install  r-base
   apt-get -y install  pandoc libxml2-dev
   apt-get -y install  libcurl4-openssl-dev
   apt-get -y install  libssl-dev
   apt-get -y install  libfontconfig1-dev libharfbuzz-dev   libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev
   
   Rscript -e 'install.packages(c("curl", "knitr", "devtools", "testthat", "rmarkdown", "markdown", "e1071"), repos="https://cloud.r-project.org/")'
   Rscript -e 'devtools::install_version("roxygen2", version = "7.1.2", repos="https://cloud.r-project.org/")'
   Rscript -e "devtools::install_version('pkgdown', version='2.0.1', repos='https://cloud.r-project.org')"
   Rscript -e "devtools::install_version('preferably', version='0.4', repos='https://cloud.r-project.org')"
   
   echo 'deb http://security.debian.org/debian-security stretch/updates main' >> /etc/apt/sources.list
   apt-get update
   apt-get -y install  openjdk-8-jdk
   apt-get -y install  scala
   
   git clone https://github.com/apache/spark.git
   cd spark/docs
   
   gem install bundler
   bundle install
   bundle exec jekyll build
   ```
   and checked the result from the host with `jekyll serve`:
   `bundle exec jekyll serve --host 0.0.0.0`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] khalidmammadov commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README

Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r810662636



##########
File path: docs/README.md
##########
@@ -66,11 +73,15 @@ $ sudo pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsph
 
 ### R API Documentation (Optional)
 
-If you'd like to generate R API documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
-and install these libraries:
+If you'd like to generate R API documentation, you'll need to install these packages and libraries:
+
+```sh
+$ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \
+          libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev libxml2-dev
+```
 
 ```sh
-$ sudo Rscript -e 'install.packages(c("knitr", "devtools", "testthat", "rmarkdown"), repos="https://cloud.r-project.org/")'
+$ sudo Rscript -e 'install.packages(c("curl", "knitr", "devtools", "testthat", "rmarkdown", "markdown", "e1071"), repos="https://cloud.r-project.org/")'

Review comment:
       I re-tested this a number of times in Docker containers and it always fails if the package is not installed. So, in short, yes: `markdown` is a required package.






[GitHub] [spark] HyukjinKwon commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r806378581



##########
File path: docs/README.md
##########
@@ -66,11 +73,15 @@ $ sudo pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsph
 
 ### R API Documentation (Optional)
 
-If you'd like to generate R API documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
-and install these libraries:
+If you'd like to generate R API documentation, you'll need to install these packages and libraries:
+
+```sh
+$ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \
+          libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev libxml2-dev
+```
 
 ```sh
-$ sudo Rscript -e 'install.packages(c("knitr", "devtools", "testthat", "rmarkdown"), repos="https://cloud.r-project.org/")'
+$ sudo Rscript -e 'install.packages(c("curl", "knitr", "devtools", "testthat", "rmarkdown", "markdown", "e1071"), repos="https://cloud.r-project.org/")'

Review comment:
       Hm, `rmarkdown` depends on `markdown` IIRC. `rmarkdown` falls back to `markdown`. Was this required in your env?






[GitHub] [spark] srowen commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r806915067



##########
File path: docs/README.md
##########
@@ -26,13 +26,20 @@ Read on to learn more about viewing documentation in plain text (i.e., markdown)
 documentation yourself. Why build it yourself? So that you have the docs that correspond to
 whichever version of Spark you currently have checked out of revision control.
 
-## Prerequisites
+## Building documentation
+There are two ways to build Spark documentation, complete and partial. Complete will build a site similar

Review comment:
       -> "similar to the main documentation site at ..."
   Start a new sentence like "with all APIs documented. Partial ..."
   I think this could be clarified: "Partial documentation builds, for a specific language or API, are also possible"

##########
File path: docs/README.md
##########
@@ -111,7 +112,15 @@ $ bundle exec jekyll serve --watch
 $ PRODUCTION=1 bundle exec jekyll build
 ```
 
-## API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)
+You can optionally skip API build (for partial build) as it takes time

Review comment:
       This needs a rewrite: "To create a partial build without API docs (which can take a long time), use SKIP_API=1:"
   But then I thought partial builds were _just_ the API docs? This addition is confusing.
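
For reference, a minimal sketch of how the skip variables could be combined for a partial build. The `skip_flags` helper and its short names (`all`, `scala`, `python`, `r`, `sql`) are hypothetical and not part of the Spark build; only the `SKIP_*` variable names come from the README text quoted in this thread.

```sh
# Hypothetical helper: print the SKIP_* assignments for the API docs to skip.
# The short names accepted here are an assumption of this sketch.
skip_flags() {
  flags=""
  for lang in "$@"; do
    case "$lang" in
      all)    flags="$flags SKIP_API=1" ;;        # skip every API doc step
      scala)  flags="$flags SKIP_SCALADOC=1" ;;   # skip the Scaladoc step
      python) flags="$flags SKIP_PYTHONDOC=1" ;;  # skip the Sphinx step
      r)      flags="$flags SKIP_RDOC=1" ;;       # skip the roxygen2 step
      sql)    flags="$flags SKIP_SQLDOC=1" ;;     # skip the MkDocs step
      *)      echo "unknown API doc: $lang" >&2; return 1 ;;
    esac
  done
  # Strip the leading space before printing.
  echo "${flags# }"
}
```

Assuming Bundler and Jekyll are installed, it could be used as `env $(skip_flags r sql) bundle exec jekyll build` to build the site without the R and SQL API docs, or `env $(skip_flags all) bundle exec jekyll build` to build the Jekyll pages only.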

##########
File path: docs/README.md
##########
@@ -26,13 +26,20 @@ Read on to learn more about viewing documentation in plain text (i.e., markdown)
 documentation yourself. Why build it yourself? So that you have the docs that correspond to
 whichever version of Spark you currently have checked out of revision control.
 
-## Prerequisites
+## Building documentation
+There are two ways to build Spark documentation, complete and partial. Complete will build a site similar
+to one you can find https://spark.apache.org/documentation.html with all APIs documented and partial
+one can be used to build a language/API specific documentation.
+
+### Prerequisites
 
 The Spark documentation build uses a number of tools to build HTML docs and API docs in Scala, Java,
 Python, R and SQL.
 
-You need to have [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and
-[Python](https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python)
+For complete documentation all below tools must be installed **including Optionals**.

Review comment:
       below tools -> tools below

##########
File path: docs/README.md
##########
@@ -26,13 +26,20 @@ Read on to learn more about viewing documentation in plain text (i.e., markdown)
 documentation yourself. Why build it yourself? So that you have the docs that correspond to
 whichever version of Spark you currently have checked out of revision control.
 
-## Prerequisites
+## Building documentation
+There are two ways to build Spark documentation, complete and partial. Complete will build a site similar
+to one you can find https://spark.apache.org/documentation.html with all APIs documented and partial
+one can be used to build a language/API specific documentation.
+
+### Prerequisites
 
 The Spark documentation build uses a number of tools to build HTML docs and API docs in Scala, Java,
 Python, R and SQL.
 
-You need to have [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and
-[Python](https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python)
+For complete documentation all below tools must be installed **including Optionals**.
+
+You need to have JDK, Scala, [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and

Review comment:
       JDK -> the JDK






[GitHub] [spark] khalidmammadov commented on pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README

Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on pull request #35516:
URL: https://github.com/apache/spark/pull/35516#issuecomment-1039665182


   I have also made this Dockerfile to make the process even easier. Would that be valuable to add to the repo?
   https://github.com/khalidmammadov/spark/pull/1/files






[GitHub] [spark] HyukjinKwon commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r806378581



##########
File path: docs/README.md
##########
@@ -66,11 +73,15 @@ $ sudo pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsph
 
 ### R API Documentation (Optional)
 
-If you'd like to generate R API documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
-and install these libraries:
+If you'd like to generate R API documentation, you'll need to install these packages and libraries:
+
+```sh
+$ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \

Review comment:
       Hm, I think we should make this independent of the OS.






[GitHub] [spark] HyukjinKwon commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r806378693



##########
File path: docs/README.md
##########
@@ -66,11 +73,15 @@ $ sudo pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsph
 
 ### R API Documentation (Optional)
 
-If you'd like to generate R API documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
-and install these libraries:
+If you'd like to generate R API documentation, you'll need to install these packages and libraries:
+
+```sh
+$ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \
+          libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev libxml2-dev
+```
 
 ```sh
-$ sudo Rscript -e 'install.packages(c("knitr", "devtools", "testthat", "rmarkdown"), repos="https://cloud.r-project.org/")'
+$ sudo Rscript -e 'install.packages(c("curl", "knitr", "devtools", "testthat", "rmarkdown", "markdown", "e1071"), repos="https://cloud.r-project.org/")'

Review comment:
       cc @huaxingao FYI who faced a similar problem before IIRC ..






[GitHub] [spark] srowen closed pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README

Posted by GitBox <gi...@apache.org>.
srowen closed pull request #35516:
URL: https://github.com/apache/spark/pull/35516


   






[GitHub] [spark] khalidmammadov commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README

Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r810628492



##########
File path: docs/README.md
##########
@@ -66,11 +73,15 @@ $ sudo pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsph
 
 ### R API Documentation (Optional)
 
-If you'd like to generate R API documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
-and install these libraries:
+If you'd like to generate R API documentation, you'll need to install these packages and libraries:
+
+```sh
+$ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \

Review comment:
       Do you have any suggestions? I can only suggest either adding a Dockerfile similar to [this one](https://github.com/khalidmammadov/spark/blob/e9cec4091b159e1c0c6c44a1fb816ca16a77e9f5/docs/Dockerfile) to build and test the changes, or omitting these installs since they are Linux-specific. In the latter case the instructions are again incomplete, and one has to figure out what to install every time.
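
One OS-independent option, sketched here under assumptions, is to check for the required commands rather than prescribe how to install them. The `missing_tools` helper is hypothetical, and the command names in the usage note are my guess at what the tools discussed in this PR put on `PATH`.

```sh
# Hypothetical helper: print each named command that is not on PATH.
# Returns 1 if at least one command is missing, 0 otherwise.
missing_tools() {
  rc=0
  for tool in "$@"; do
    # `command -v` is the POSIX way to test whether a command can be found.
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `missing_tools java scala ruby bundle python3 Rscript pandoc` would list whichever of those commands still needs to be installed, leaving the choice of package manager to the reader.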






[GitHub] [spark] HyukjinKwon commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r806378840



##########
File path: docs/README.md
##########
@@ -26,13 +26,20 @@ Read on to learn more about viewing documentation in plain text (i.e., markdown)
 documentation yourself. Why build it yourself? So that you have the docs that correspond to
 whichever version of Spark you currently have checked out of revision control.
 
-## Prerequisites
+## Building documentation

Review comment:
       d -> D






[GitHub] [spark] srowen commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r810673824



##########
File path: docs/README.md
##########
@@ -111,7 +112,7 @@ $ bundle exec jekyll serve --watch
 $ PRODUCTION=1 bundle exec jekyll build
 ```
 
-## API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)
+## Generating individual API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)

Review comment:
       APIs are "Scala", "Java", "Python", "R". roxygen2, mkdocs, sphinx are not APIs

##########
File path: docs/README.md
##########
@@ -129,6 +130,14 @@ The jekyll plugin also generates the PySpark docs using [Sphinx](http://sphinx-d
 using [roxygen2](https://cran.r-project.org/web/packages/roxygen2/index.html) and SQL docs
 using [MkDocs](https://www.mkdocs.org/).
 
-NOTE: To skip the step of building and copying over the Scala, Java, Python, R and SQL API docs, run `SKIP_API=1
-bundle exec jekyll build`. In addition, `SKIP_SCALADOC=1`, `SKIP_PYTHONDOC=1`, `SKIP_RDOC=1` and `SKIP_SQLDOC=1` can be used
+NOTE: To skip the step of building and copying over the Scala, Java, Python, R and SQL API docs, see below example. 

Review comment:
       "see the example below"

##########
File path: docs/README.md
##########
@@ -26,13 +26,20 @@ Read on to learn more about viewing documentation in plain text (i.e., markdown)
 documentation yourself. Why build it yourself? So that you have the docs that correspond to
 whichever version of Spark you currently have checked out of revision control.
 
-## Prerequisites
+## Building documentation
+There are two ways to build Spark documentation, complete and partial. Complete will build a site similar

Review comment:
       Newline after section heading, like others
   

##########
File path: docs/README.md
##########
@@ -111,7 +112,7 @@ $ bundle exec jekyll serve --watch
 $ PRODUCTION=1 bundle exec jekyll build
 ```
 
-## API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)
+## Generating individual API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)

Review comment:
       Also, I'm confused, weren't the sections above already about generating individual API docs?






[GitHub] [spark] huaxingao commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r810394074



##########
File path: docs/README.md
##########
@@ -66,11 +73,15 @@ $ sudo pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsph
 
 ### R API Documentation (Optional)
 
-If you'd like to generate R API documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
-and install these libraries:
+If you'd like to generate R API documentation, you'll need to install these packages and libraries:
+
+```sh
+$ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \
+          libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev libxml2-dev
+```
 
 ```sh
-$ sudo Rscript -e 'install.packages(c("knitr", "devtools", "testthat", "rmarkdown"), repos="https://cloud.r-project.org/")'
+$ sudo Rscript -e 'install.packages(c("curl", "knitr", "devtools", "testthat", "rmarkdown", "markdown", "e1071"), repos="https://cloud.r-project.org/")'

Review comment:
       I had the same problem: I tested with and without the `markdown` package and it failed without it.






[GitHub] [spark] khalidmammadov commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README

Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r810385503



##########
File path: docs/README.md
##########
@@ -66,11 +73,15 @@ $ sudo pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsph
 
 ### R API Documentation (Optional)
 
-If you'd like to generate R API documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
-and install these libraries:
+If you'd like to generate R API documentation, you'll need to install these packages and libraries:
+
+```sh
+$ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \
+          libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev libxml2-dev
+```
 
 ```sh
-$ sudo Rscript -e 'install.packages(c("knitr", "devtools", "testthat", "rmarkdown"), repos="https://cloud.r-project.org/")'
+$ sudo Rscript -e 'install.packages(c("curl", "knitr", "devtools", "testthat", "rmarkdown", "markdown", "e1071"), repos="https://cloud.r-project.org/")'

Review comment:
       I think I finally understand what's going on.
   I'm using [this Dockerfile](https://github.com/khalidmammadov/spark/blob/e9cec4091b159e1c0c6c44a1fb816ca16a77e9f5/docs/Dockerfile) for the build. I tested with and without the `markdown` package: the build fails without it, and I couldn't understand how it succeeds in the [build and test CI](https://github.com/apache/spark/blob/94fd9c55c6a29208bbfe240bd2f3191c7df4c666/.github/workflows/build_and_test.yml#L537) phase. Apparently `markdown` (along with the other packages I am adding here) is already installed in the base image, @dongjoon-hyun's [Docker image](https://hub.docker.com/layers/dongjoon/apache-spark-github-action-image/20220207/images/sha256-af09d172ff8e2cbd71df9a1bc5384a47578c4a4cc293786c539333cafaf4a7ce?context=explore) (BTW, where is the source of this Dockerfile kept?). So some packages are "reinstalled" during build and test and some are not, hence the confusion.
   Additionally, I tested building without `rmarkdown`, and it succeeds.
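
   One way to see which R packages an image already provides is to ask R directly. The following is a minimal sketch, not part of this PR: the function name `check_r_packages` is hypothetical, and it assumes only that `Rscript` is on the `PATH`. It relies on base R's `requireNamespace()`, which returns `FALSE` when a package is not installed.

   ```sh
   # Hypothetical helper (not from the PR): report which R packages are installed.
   # Returns 2 if Rscript itself is not available.
   check_r_packages() {
     if ! command -v Rscript >/dev/null 2>&1; then
       echo "Rscript not found on PATH" >&2
       return 2
     fi
     for pkg in "$@"; do
       # requireNamespace() is TRUE only if the package can be loaded;
       # quit(status = ...) turns that into the process exit code.
       if Rscript -e "quit(status = if (requireNamespace('$pkg', quietly = TRUE)) 0 else 1)" >/dev/null 2>&1; then
         echo "$pkg: installed"
       else
         echo "$pkg: missing"
       fi
     done
   }
   ```

   Running `check_r_packages curl knitr devtools testthat rmarkdown markdown e1071` inside both the local container and the CI base image would show which packages come preinstalled and which ones the build step has to add.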






[GitHub] [spark] AmplabJenkins commented on pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35516:
URL: https://github.com/apache/spark/pull/35516#issuecomment-1041224741


   Can one of the admins verify this patch?




[GitHub] [spark] HyukjinKwon commented on a change in pull request #35516: [SPARK-38210][DOCS] Improve documentation generation README

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #35516:
URL: https://github.com/apache/spark/pull/35516#discussion_r806379375



##########
File path: docs/README.md
##########
@@ -66,11 +73,15 @@ $ sudo pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsph
 
 ### R API Documentation (Optional)
 
-If you'd like to generate R API documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
-and install these libraries:
+If you'd like to generate R API documentation, you'll need to install these packages and libraries:
+
+```sh
+$ sudo apt install libssl-dev libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \

Review comment:
       Hm, I think we'd better make it independent of the OS.



