Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/08/31 12:17:24 UTC

[GitHub] [orc] guiyanakuang opened a new pull request #889: ORC-727: Update `Java Tools` documentation

guiyanakuang opened a new pull request #889:
URL: https://github.com/apache/orc/pull/889


   
   ### What changes were proposed in this pull request?
   Update the `Java Tools` documentation.
   Add descriptions of the count / key / version subcommands.
   List commands and parameters in alphabetical order.
   Fix the page not displaying parameters.
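
As a rough illustration of the subcommands and options this change documents, here is a minimal shell sketch. It only prints the command lines rather than running them, so it does not need the jar. The helper name `orc_tools_cmd`, the jar name placeholder, and the file paths are assumptions made for the example; the `--output` and `--overwrite` options are taken from the convert section of the diff.

```shell
#!/bin/sh
# Placeholder jar name, as in the documentation; substitute a real version.
ORC_TOOLS_JAR="orc-tools-X.Y.Z-uber.jar"

# Hypothetical helper: prints the command line instead of executing it,
# so the sketch works without the ORC tools jar being present.
orc_tools_cmd() {
  printf 'java -jar %s' "$ORC_TOOLS_JAR"
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}

# Subcommands added to the documentation by this pull request.
orc_tools_cmd version
orc_tools_cmd count /path/to/orc/dir

# convert with the long options listed in the diff (paths are made up).
orc_tools_cmd convert --output out.orc --overwrite a.json b.json
```

To run a command for real, the printed line can be pasted into a shell once the placeholder jar name is replaced.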
   
   ### Why are the changes needed?
   To make the Java tools easier for users to use.
   
   ### How was this patch tested?
   No need to test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #889: ORC-727: Update `Java Tools` documentation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #889:
URL: https://github.com/apache/orc/pull/889#discussion_r699431492



##########
File path: site/_docs/java-tools.md
##########
@@ -11,43 +11,136 @@ supports both the local file system and HDFS.
 
 The subcommands for the tools are:
 
-  * meta - print the metadata of an ORC file
-  * data - print the data of an ORC file
-  * scan (since ORC 1.3) - scan the data for benchmarking
   * convert (since ORC 1.4) - convert JSON files to ORC
+  * count (since ORC 1.6) - recursively find *.orc and print the number of rows
+  * data - print the data of an ORC file
   * json-schema (since ORC 1.4) - determine the schema of JSON documents
+  * key (since ORC 1.5) - print information about the encryption keys
+  * meta - print the metadata of an ORC file
+  * scan (since ORC 1.3) - scan the data for benchmarking
+  * version (since ORC 1.6) - print the version of this ORC tool
 
 The command line looks like:
 
 ~~~ shell
 % java -jar orc-tools-X.Y.Z-uber.jar <sub-command> <args>
 ~~~
 
+## Java Convert
+
+The convert command reads several JSON files and converts them into a
+single ORC file.
+
+-e,--escape `<escape>`
+  : Sets CSV escape character

Review comment:
       In HTML, `--escape` becomes `-escape`. Shall we use a different way to express this?
   ![Screen Shot 2021-08-31 at 8 18 59 AM](https://user-images.githubusercontent.com/9700541/131529938-d7b0adaf-7de1-46da-90b7-221048d41682.png)
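
One plausible explanation, assuming the site's Markdown renderer applies smart typography (kramdown, for example, turns `--` into an en dash outside code spans), is that the double hyphen is rewritten during HTML generation. The sketch below only imitates that substitution with awk so the effect can be seen on the option text from the diff; `smart_dashes` is a made-up name and this is not the renderer's actual code.

```shell
#!/bin/sh
# Hypothetical imitation of smart-dash substitution: replace "--" with the
# HTML entity for an en dash, but only outside backtick-delimited code spans.
smart_dashes() {
  awk 'BEGIN { FS = "`"; OFS = "`" }
  {
    # Splitting on backticks puts text outside code spans in odd fields.
    for (i = 1; i <= NF; i += 2) {
      gsub(/--/, "\\&ndash;", $i)
    }
    print
  }'
}

# Plain text: the double hyphen is rewritten.
printf '%s\n' '-e,--escape' | smart_dashes

# Inside a code span: the double hyphen is left alone.
printf '%s\n' '`-e,--escape`' | smart_dashes
```

If that assumption holds, wrapping the option names in a code span is one way to keep `--escape` intact in the generated page.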
   

##########
File path: site/_docs/java-tools.md
##########
@@ -11,43 +11,136 @@ supports both the local file system and HDFS.
 
 The subcommands for the tools are:
 
-  * meta - print the metadata of an ORC file
-  * data - print the data of an ORC file
-  * scan (since ORC 1.3) - scan the data for benchmarking
   * convert (since ORC 1.4) - convert JSON files to ORC
+  * count (since ORC 1.6) - recursively find *.orc and print the number of rows
+  * data - print the data of an ORC file
   * json-schema (since ORC 1.4) - determine the schema of JSON documents
+  * key (since ORC 1.5) - print information about the encryption keys
+  * meta - print the metadata of an ORC file
+  * scan (since ORC 1.3) - scan the data for benchmarking
+  * version (since ORC 1.6) - print the version of this ORC tool
 
 The command line looks like:
 
 ~~~ shell
 % java -jar orc-tools-X.Y.Z-uber.jar <sub-command> <args>
 ~~~
 
+## Java Convert
+
+The convert command reads several JSON files and converts them into a
+single ORC file.
+
+-e,--escape `<escape>`
+  : Sets CSV escape character
+
+-h,--help
+  : Print help
+
+-H,--header `<header>`
+  : Sets CSV header lines
+
+-n,--null `<null>`
+  : Sets CSV null string
+
+-o,--output `<filename>`
+  : Sets the output ORC filename, which defaults to output.orc
+
+-O,--overwrite
+  : If the file already exists, it will be overwritten
+
+-q,--quote `<quote>`
+  : Sets CSV quote character
+
+-s,--schema `<schema>`
+  : Sets the schema for the ORC file. By default, the schema is automatically discovered.
+
+-S,--separator `<separator>`
+  : Sets CSV separator character
+
+-t,--timestampformat `<timestampformat>`
+  : Sets timestamp Format
+  
+The automatic JSON schema discovery is equivalent to the json-schema tool
+below.
+
+## Java Count
+
+The count command recursively find *.orc and print the number of rows. The parameter value can be a space separated file paths string.

Review comment:
       Shall we remove the following? Actually, it's not a single parameter. When it's separated by space, we call them `parameters`.
   > `The parameter value can be a space separated file paths string.`

##########
File path: site/_docs/java-tools.md
##########
@@ -11,43 +11,136 @@ supports both the local file system and HDFS.
 
 The subcommands for the tools are:
 
-  * meta - print the metadata of an ORC file
-  * data - print the data of an ORC file
-  * scan (since ORC 1.3) - scan the data for benchmarking
   * convert (since ORC 1.4) - convert JSON files to ORC
+  * count (since ORC 1.6) - recursively find *.orc and print the number of rows
+  * data - print the data of an ORC file
   * json-schema (since ORC 1.4) - determine the schema of JSON documents
+  * key (since ORC 1.5) - print information about the encryption keys
+  * meta - print the metadata of an ORC file
+  * scan (since ORC 1.3) - scan the data for benchmarking
+  * version (since ORC 1.6) - print the version of this ORC tool
 
 The command line looks like:
 
 ~~~ shell
 % java -jar orc-tools-X.Y.Z-uber.jar <sub-command> <args>
 ~~~
 
+## Java Convert
+
+The convert command reads several JSON files and converts them into a
+single ORC file.
+
+-e,--escape `<escape>`
+  : Sets CSV escape character
+
+-h,--help
+  : Print help
+
+-H,--header `<header>`
+  : Sets CSV header lines
+
+-n,--null `<null>`
+  : Sets CSV null string
+
+-o,--output `<filename>`
+  : Sets the output ORC filename, which defaults to output.orc
+
+-O,--overwrite
+  : If the file already exists, it will be overwritten
+
+-q,--quote `<quote>`
+  : Sets CSV quote character
+
+-s,--schema `<schema>`
+  : Sets the schema for the ORC file. By default, the schema is automatically discovered.
+
+-S,--separator `<separator>`
+  : Sets CSV separator character
+
+-t,--timestampformat `<timestampformat>`
+  : Sets timestamp Format
+  
+The automatic JSON schema discovery is equivalent to the json-schema tool
+below.
+
+## Java Count
+
+The count command recursively find *.orc and print the number of rows. The parameter value can be a space separated file paths string.
+
+## Java Data
+
+The data command prints the data in an ORC file as a JSON document. Each
+record is printed as a JSON object on a line. Each record is annotated with
+the fieldnames and a JSON representation that depends on the field's type.
+
+-h,--help

Review comment:
       ditto. `--` becomes `-` in HTML generation.

##########
File path: site/_docs/java-tools.md
##########
@@ -201,47 +294,21 @@ Padding ratio: 0%
 ______________________________________________________________________
 ~~~
 
-## Java Data
-
-The data command prints the data in an ORC file as a JSON document. Each
-record is printed as a JSON object on a line. Each record is annotated with
-the fieldnames and a JSON representation that depends on the field's type.
-
 ## Java Scan
 
 The scan command reads the contents of the file without printing anything. It
 is primarily intendend for benchmarking the Java reader without including the
 cost of printing the data out.
 
-## Java Convert
-
-The convert command reads several JSON files and converts them into a
-single ORC file.
-
--o <filename>
-  : Sets the output ORC filename, which defaults to output.orc
-
--s <schema>
-  : Sets the schema for the ORC file. By default, the schema is automatically discovered.
-
--h
+-h,--help
   : Print help
-  
-The automatic JSON schema discovery is equivalent to the json-schema tool
-below.
 
-## Java JSON Schema
-
-The JSON Schema discovery tool processes a set of JSON documents and
-produces a schema that encompasses all of the records in all of the
-documents. It works by computing the enclosing type and promoting it
-to include all of the observed values.
+-s,--schema
+  : Print schema
 
--f
-  : Print the schema as a list of flat types for each subfield
+-v,--verbose
+  : Print exceptions
 
--t
-  : Print the schema as a Hive table declaration
+## Java Version
 
--h
-  : Print help
\ No newline at end of file
+The version command print the version of this ORC tool.

Review comment:
       `print` -> `prints`







[GitHub] [orc] dongjoon-hyun commented on a change in pull request #889: ORC-727: Update `Java Tools` documentation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #889:
URL: https://github.com/apache/orc/pull/889#discussion_r699446043



##########
File path: site/_docs/java-tools.md
##########
@@ -11,43 +11,136 @@ supports both the local file system and HDFS.
 
 The subcommands for the tools are:
 
-  * meta - print the metadata of an ORC file
-  * data - print the data of an ORC file
-  * scan (since ORC 1.3) - scan the data for benchmarking
   * convert (since ORC 1.4) - convert JSON files to ORC
+  * count (since ORC 1.6) - recursively find *.orc and print the number of rows
+  * data - print the data of an ORC file
   * json-schema (since ORC 1.4) - determine the schema of JSON documents
+  * key (since ORC 1.5) - print information about the encryption keys
+  * meta - print the metadata of an ORC file
+  * scan (since ORC 1.3) - scan the data for benchmarking
+  * version (since ORC 1.6) - print the version of this ORC tool
 
 The command line looks like:
 
 ~~~ shell
 % java -jar orc-tools-X.Y.Z-uber.jar <sub-command> <args>
 ~~~
 
+## Java Convert
+
+The convert command reads several JSON files and converts them into a
+single ORC file.
+
+-e,--escape `<escape>`
+  : Sets CSV escape character

Review comment:
       Thanks!







[GitHub] [orc] guiyanakuang commented on a change in pull request #889: ORC-727: Update `Java Tools` documentation

Posted by GitBox <gi...@apache.org>.
guiyanakuang commented on a change in pull request #889:
URL: https://github.com/apache/orc/pull/889#discussion_r699440935



##########
File path: site/_docs/java-tools.md
##########
@@ -11,43 +11,136 @@ supports both the local file system and HDFS.
 
 The subcommands for the tools are:
 
-  * meta - print the metadata of an ORC file
-  * data - print the data of an ORC file
-  * scan (since ORC 1.3) - scan the data for benchmarking
   * convert (since ORC 1.4) - convert JSON files to ORC
+  * count (since ORC 1.6) - recursively find *.orc and print the number of rows
+  * data - print the data of an ORC file
   * json-schema (since ORC 1.4) - determine the schema of JSON documents
+  * key (since ORC 1.5) - print information about the encryption keys
+  * meta - print the metadata of an ORC file
+  * scan (since ORC 1.3) - scan the data for benchmarking
+  * version (since ORC 1.6) - print the version of this ORC tool
 
 The command line looks like:
 
 ~~~ shell
 % java -jar orc-tools-X.Y.Z-uber.jar <sub-command> <args>
 ~~~
 
+## Java Convert
+
+The convert command reads several JSON files and converts them into a
+single ORC file.
+
+-e,--escape `<escape>`
+  : Sets CSV escape character

Review comment:
       Using backticks (``) might work; I'll test it later.

##########
File path: site/_docs/java-tools.md
##########
@@ -11,43 +11,136 @@ supports both the local file system and HDFS.
 
 The subcommands for the tools are:
 
-  * meta - print the metadata of an ORC file
-  * data - print the data of an ORC file
-  * scan (since ORC 1.3) - scan the data for benchmarking
   * convert (since ORC 1.4) - convert JSON files to ORC
+  * count (since ORC 1.6) - recursively find *.orc and print the number of rows
+  * data - print the data of an ORC file
   * json-schema (since ORC 1.4) - determine the schema of JSON documents
+  * key (since ORC 1.5) - print information about the encryption keys
+  * meta - print the metadata of an ORC file
+  * scan (since ORC 1.3) - scan the data for benchmarking
+  * version (since ORC 1.6) - print the version of this ORC tool
 
 The command line looks like:
 
 ~~~ shell
 % java -jar orc-tools-X.Y.Z-uber.jar <sub-command> <args>
 ~~~
 
+## Java Convert
+
+The convert command reads several JSON files and converts them into a
+single ORC file.
+
+-e,--escape `<escape>`
+  : Sets CSV escape character

Review comment:
       I've updated java-tools.md. My local HTML preview confirms that it's fixed.

##########
File path: site/_docs/java-tools.md
##########
@@ -11,43 +11,136 @@ supports both the local file system and HDFS.
 
 The subcommands for the tools are:
 
-  * meta - print the metadata of an ORC file
-  * data - print the data of an ORC file
-  * scan (since ORC 1.3) - scan the data for benchmarking
   * convert (since ORC 1.4) - convert JSON files to ORC
+  * count (since ORC 1.6) - recursively find *.orc and print the number of rows
+  * data - print the data of an ORC file
   * json-schema (since ORC 1.4) - determine the schema of JSON documents
+  * key (since ORC 1.5) - print information about the encryption keys
+  * meta - print the metadata of an ORC file
+  * scan (since ORC 1.3) - scan the data for benchmarking
+  * version (since ORC 1.6) - print the version of this ORC tool
 
 The command line looks like:
 
 ~~~ shell
 % java -jar orc-tools-X.Y.Z-uber.jar <sub-command> <args>
 ~~~
 
+## Java Convert
+
+The convert command reads several JSON files and converts them into a
+single ORC file.
+
+-e,--escape `<escape>`
+  : Sets CSV escape character
+
+-h,--help
+  : Print help
+
+-H,--header `<header>`
+  : Sets CSV header lines
+
+-n,--null `<null>`
+  : Sets CSV null string
+
+-o,--output `<filename>`
+  : Sets the output ORC filename, which defaults to output.orc
+
+-O,--overwrite
+  : If the file already exists, it will be overwritten
+
+-q,--quote `<quote>`
+  : Sets CSV quote character
+
+-s,--schema `<schema>`
+  : Sets the schema for the ORC file. By default, the schema is automatically discovered.
+
+-S,--separator `<separator>`
+  : Sets CSV separator character
+
+-t,--timestampformat `<timestampformat>`
+  : Sets timestamp Format
+  
+The automatic JSON schema discovery is equivalent to the json-schema tool
+below.
+
+## Java Count
+
+The count command recursively find *.orc and print the number of rows. The parameter value can be a space separated file paths string.

Review comment:
       I removed this phrase. BTW, the count command does not report an error if no path is given. I would also like to support specifying a single file, not necessarily a directory.
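
To make the two points above concrete, here is a small shell sketch of the file-discovery step only. It assumes the count command selects files by the `.orc` suffix, as the documented description "recursively find *.orc" suggests; it does not read ORC files or count rows. The function name `list_orc_files` is hypothetical, and the explicit error for a missing path shows one possible behavior, not what the tool currently does.

```shell
#!/bin/sh
# Hypothetical sketch: list the *.orc files under the given paths.
# Each path may be a directory (searched recursively) or a single file.
list_orc_files() {
  if [ "$#" -eq 0 ]; then
    # Unlike the behavior described in the comment, fail loudly here.
    echo "usage: list_orc_files <path> [<path> ...]" >&2
    return 1
  fi
  find "$@" -type f -name '*.orc' | sort
}

# Example (paths are made up):
#   list_orc_files /warehouse/table_a /warehouse/table_b/part-0.orc
```

Passing a single file works because `find` accepts a file as a starting point and applies the same name test to it.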

##########
File path: site/_docs/java-tools.md
##########
@@ -11,43 +11,136 @@ supports both the local file system and HDFS.
 
 The subcommands for the tools are:
 
-  * meta - print the metadata of an ORC file
-  * data - print the data of an ORC file
-  * scan (since ORC 1.3) - scan the data for benchmarking
   * convert (since ORC 1.4) - convert JSON files to ORC
+  * count (since ORC 1.6) - recursively find *.orc and print the number of rows
+  * data - print the data of an ORC file
   * json-schema (since ORC 1.4) - determine the schema of JSON documents
+  * key (since ORC 1.5) - print information about the encryption keys
+  * meta - print the metadata of an ORC file
+  * scan (since ORC 1.3) - scan the data for benchmarking
+  * version (since ORC 1.6) - print the version of this ORC tool
 
 The command line looks like:
 
 ~~~ shell
 % java -jar orc-tools-X.Y.Z-uber.jar <sub-command> <args>
 ~~~
 
+## Java Convert
+
+The convert command reads several JSON files and converts them into a
+single ORC file.
+
+-e,--escape `<escape>`
+  : Sets CSV escape character
+
+-h,--help
+  : Print help
+
+-H,--header `<header>`
+  : Sets CSV header lines
+
+-n,--null `<null>`
+  : Sets CSV null string
+
+-o,--output `<filename>`
+  : Sets the output ORC filename, which defaults to output.orc
+
+-O,--overwrite
+  : If the file already exists, it will be overwritten
+
+-q,--quote `<quote>`
+  : Sets CSV quote character
+
+-s,--schema `<schema>`
+  : Sets the schema for the ORC file. By default, the schema is automatically discovered.
+
+-S,--separator `<separator>`
+  : Sets CSV separator character
+
+-t,--timestampformat `<timestampformat>`
+  : Sets timestamp Format
+  
+The automatic JSON schema discovery is equivalent to the json-schema tool
+below.
+
+## Java Count
+
+The count command recursively find *.orc and print the number of rows. The parameter value can be a space separated file paths string.
+
+## Java Data
+
+The data command prints the data in an ORC file as a JSON document. Each
+record is printed as a JSON object on a line. Each record is annotated with
+the fieldnames and a JSON representation that depends on the field's type.
+
+-h,--help

Review comment:
       Fixed in da9740a.

##########
File path: site/_docs/java-tools.md
##########
@@ -201,47 +294,21 @@ Padding ratio: 0%
 ______________________________________________________________________
 ~~~
 
-## Java Data
-
-The data command prints the data in an ORC file as a JSON document. Each
-record is printed as a JSON object on a line. Each record is annotated with
-the fieldnames and a JSON representation that depends on the field's type.
-
 ## Java Scan
 
 The scan command reads the contents of the file without printing anything. It
 is primarily intendend for benchmarking the Java reader without including the
 cost of printing the data out.
 
-## Java Convert
-
-The convert command reads several JSON files and converts them into a
-single ORC file.
-
--o <filename>
-  : Sets the output ORC filename, which defaults to output.orc
-
--s <schema>
-  : Sets the schema for the ORC file. By default, the schema is automatically discovered.
-
--h
+-h,--help
   : Print help
-  
-The automatic JSON schema discovery is equivalent to the json-schema tool
-below.
 
-## Java JSON Schema
-
-The JSON Schema discovery tool processes a set of JSON documents and
-produces a schema that encompasses all of the records in all of the
-documents. It works by computing the enclosing type and promoting it
-to include all of the observed values.
+-s,--schema
+  : Print schema
 
--f
-  : Print the schema as a list of flat types for each subfield
+-v,--verbose
+  : Print exceptions
 
--t
-  : Print the schema as a Hive table declaration
+## Java Version
 
--h
-  : Print help
\ No newline at end of file
+The version command print the version of this ORC tool.

Review comment:
       Fixed in da9740a.







[GitHub] [orc] guiyanakuang commented on a change in pull request #889: ORC-727: Update `Java Tools` documentation

Posted by GitBox <gi...@apache.org>.
guiyanakuang commented on a change in pull request #889:
URL: https://github.com/apache/orc/pull/889#discussion_r699462507



##########
File path: site/_docs/java-tools.md
##########
@@ -11,43 +11,136 @@ supports both the local file system and HDFS.
 
 The subcommands for the tools are:
 
-  * meta - print the metadata of an ORC file
-  * data - print the data of an ORC file
-  * scan (since ORC 1.3) - scan the data for benchmarking
   * convert (since ORC 1.4) - convert JSON files to ORC
+  * count (since ORC 1.6) - recursively find *.orc and print the number of rows
+  * data - print the data of an ORC file
   * json-schema (since ORC 1.4) - determine the schema of JSON documents
+  * key (since ORC 1.5) - print information about the encryption keys
+  * meta - print the metadata of an ORC file
+  * scan (since ORC 1.3) - scan the data for benchmarking
+  * version (since ORC 1.6) - print the version of this ORC tool
 
 The command line looks like:
 
 ~~~ shell
 % java -jar orc-tools-X.Y.Z-uber.jar <sub-command> <args>
 ~~~
 
+## Java Convert
+
+The convert command reads several JSON files and converts them into a
+single ORC file.
+
+-e,--escape `<escape>`
+  : Sets CSV escape character
+
+-h,--help
+  : Print help
+
+-H,--header `<header>`
+  : Sets CSV header lines
+
+-n,--null `<null>`
+  : Sets CSV null string
+
+-o,--output `<filename>`
+  : Sets the output ORC filename, which defaults to output.orc
+
+-O,--overwrite
+  : If the file already exists, it will be overwritten
+
+-q,--quote `<quote>`
+  : Sets CSV quote character
+
+-s,--schema `<schema>`
+  : Sets the schema for the ORC file. By default, the schema is automatically discovered.
+
+-S,--separator `<separator>`
+  : Sets CSV separator character
+
+-t,--timestampformat `<timestampformat>`
+  : Sets timestamp Format
+  
+The automatic JSON schema discovery is equivalent to the json-schema tool
+below.
+
+## Java Count
+
+The count command recursively find *.orc and print the number of rows. The parameter value can be a space separated file paths string.

Review comment:
       I removed this phrase. BTW, the count command does not report an error if no path is given. I would also like to support specifying a single file, not necessarily a directory.







[GitHub] [orc] dongjoon-hyun merged pull request #889: ORC-727: Update `Java Tools` documentation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun merged pull request #889:
URL: https://github.com/apache/orc/pull/889


   





-
-The JSON Schema discovery tool processes a set of JSON documents and
-produces a schema that encompasses all of the records in all of the
-documents. It works by computing the enclosing type and promoting it
-to include all of the observed values.
+-s,--schema
+  : Print schema
 
--f
-  : Print the schema as a list of flat types for each subfield
+-v,--verbose
+  : Print exceptions
 
--t
-  : Print the schema as a Hive table declaration
+## Java Version
 
--h
-  : Print help
\ No newline at end of file
+The version command print the version of this ORC tool.

Review comment:
       `print` -> `prints`

##########
File path: site/_docs/java-tools.md
##########
@@ -11,43 +11,136 @@ supports both the local file system and HDFS.
 
 The subcommands for the tools are:
 
-  * meta - print the metadata of an ORC file
-  * data - print the data of an ORC file
-  * scan (since ORC 1.3) - scan the data for benchmarking
   * convert (since ORC 1.4) - convert JSON files to ORC
+  * count (since ORC 1.6) - recursively find *.orc and print the number of rows
+  * data - print the data of an ORC file
   * json-schema (since ORC 1.4) - determine the schema of JSON documents
+  * key (since ORC 1.5) - print information about the encryption keys
+  * meta - print the metadata of an ORC file
+  * scan (since ORC 1.3) - scan the data for benchmarking
+  * version (since ORC 1.6) - print the version of this ORC tool
 
 The command line looks like:
 
 ~~~ shell
 % java -jar orc-tools-X.Y.Z-uber.jar <sub-command> <args>
 ~~~
 
+## Java Convert
+
+The convert command reads several JSON files and converts them into a
+single ORC file.
+
+-e,--escape `<escape>`
+  : Sets CSV escape character

Review comment:
       Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] guiyanakuang commented on a change in pull request #889: ORC-727: Update `Java Tools` documentation

Posted by GitBox <gi...@apache.org>.
guiyanakuang commented on a change in pull request #889:
URL: https://github.com/apache/orc/pull/889#discussion_r699451755



##########
File path: site/_docs/java-tools.md
##########
@@ -11,43 +11,136 @@ supports both the local file system and HDFS.
 
 The subcommands for the tools are:
 
-  * meta - print the metadata of an ORC file
-  * data - print the data of an ORC file
-  * scan (since ORC 1.3) - scan the data for benchmarking
   * convert (since ORC 1.4) - convert JSON files to ORC
+  * count (since ORC 1.6) - recursively find *.orc and print the number of rows
+  * data - print the data of an ORC file
   * json-schema (since ORC 1.4) - determine the schema of JSON documents
+  * key (since ORC 1.5) - print information about the encryption keys
+  * meta - print the metadata of an ORC file
+  * scan (since ORC 1.3) - scan the data for benchmarking
+  * version (since ORC 1.6) - print the version of this ORC tool
 
 The command line looks like:
 
 ~~~ shell
 % java -jar orc-tools-X.Y.Z-uber.jar <sub-command> <args>
 ~~~
 
+## Java Convert
+
+The convert command reads several JSON files and converts them into a
+single ORC file.
+
+-e,--escape `<escape>`
+  : Sets CSV escape character

Review comment:
       I've updated java-tools.md. My local HTML preview confirms that it's fixed.







[GitHub] [orc] guiyanakuang commented on a change in pull request #889: ORC-727: Update `Java Tools` documentation

Posted by GitBox <gi...@apache.org>.
guiyanakuang commented on a change in pull request #889:
URL: https://github.com/apache/orc/pull/889#discussion_r699440935



##########
File path: site/_docs/java-tools.md
##########
@@ -11,43 +11,136 @@ supports both the local file system and HDFS.
 
 The subcommands for the tools are:
 
-  * meta - print the metadata of an ORC file
-  * data - print the data of an ORC file
-  * scan (since ORC 1.3) - scan the data for benchmarking
   * convert (since ORC 1.4) - convert JSON files to ORC
+  * count (since ORC 1.6) - recursively find *.orc and print the number of rows
+  * data - print the data of an ORC file
   * json-schema (since ORC 1.4) - determine the schema of JSON documents
+  * key (since ORC 1.5) - print information about the encryption keys
+  * meta - print the metadata of an ORC file
+  * scan (since ORC 1.3) - scan the data for benchmarking
+  * version (since ORC 1.6) - print the version of this ORC tool
 
 The command line looks like:
 
 ~~~ shell
 % java -jar orc-tools-X.Y.Z-uber.jar <sub-command> <args>
 ~~~
 
+## Java Convert
+
+The convert command reads several JSON files and converts them into a
+single ORC file.
+
+-e,--escape `<escape>`
+  : Sets CSV escape character

Review comment:
       Using `` might work; I'll test it later.







[GitHub] [orc] guiyanakuang commented on a change in pull request #889: ORC-727: Update `Java Tools` documentation

Posted by GitBox <gi...@apache.org>.
guiyanakuang commented on a change in pull request #889:
URL: https://github.com/apache/orc/pull/889#discussion_r699462507



##########
File path: site/_docs/java-tools.md
##########
@@ -11,43 +11,136 @@ supports both the local file system and HDFS.
 
 The subcommands for the tools are:
 
-  * meta - print the metadata of an ORC file
-  * data - print the data of an ORC file
-  * scan (since ORC 1.3) - scan the data for benchmarking
   * convert (since ORC 1.4) - convert JSON files to ORC
+  * count (since ORC 1.6) - recursively find *.orc and print the number of rows
+  * data - print the data of an ORC file
   * json-schema (since ORC 1.4) - determine the schema of JSON documents
+  * key (since ORC 1.5) - print information about the encryption keys
+  * meta - print the metadata of an ORC file
+  * scan (since ORC 1.3) - scan the data for benchmarking
+  * version (since ORC 1.6) - print the version of this ORC tool
 
 The command line looks like:
 
 ~~~ shell
 % java -jar orc-tools-X.Y.Z-uber.jar <sub-command> <args>
 ~~~
 
+## Java Convert
+
+The convert command reads several JSON files and converts them into a
+single ORC file.
+
+-e,--escape `<escape>`
+  : Sets CSV escape character
+
+-h,--help
+  : Print help
+
+-H,--header `<header>`
+  : Sets CSV header lines
+
+-n,--null `<null>`
+  : Sets CSV null string
+
+-o,--output `<filename>`
+  : Sets the output ORC filename, which defaults to output.orc
+
+-O,--overwrite
+  : If the file already exists, it will be overwritten
+
+-q,--quote `<quote>`
+  : Sets CSV quote character
+
+-s,--schema `<schema>`
+  : Sets the schema for the ORC file. By default, the schema is automatically discovered.
+
+-S,--separator `<separator>`
+  : Sets CSV separator character
+
+-t,--timestampformat `<timestampformat>`
+  : Sets timestamp Format
+  
+The automatic JSON schema discovery is equivalent to the json-schema tool
+below.
+
+## Java Count
+
+The count command recursively find *.orc and print the number of rows. The parameter value can be a space separated file paths string.

Review comment:
       I removed this phrase. BTW, the count command does not report an error if no path is given. I would also like to support specifying a single file, not necessarily a directory.







[GitHub] [orc] guiyanakuang commented on a change in pull request #889: ORC-727: Update `Java Tools` documentation

Posted by GitBox <gi...@apache.org>.
guiyanakuang commented on a change in pull request #889:
URL: https://github.com/apache/orc/pull/889#discussion_r699463701



##########
File path: site/_docs/java-tools.md
##########
@@ -11,43 +11,136 @@ supports both the local file system and HDFS.
 
 The subcommands for the tools are:
 
-  * meta - print the metadata of an ORC file
-  * data - print the data of an ORC file
-  * scan (since ORC 1.3) - scan the data for benchmarking
   * convert (since ORC 1.4) - convert JSON files to ORC
+  * count (since ORC 1.6) - recursively find *.orc and print the number of rows
+  * data - print the data of an ORC file
   * json-schema (since ORC 1.4) - determine the schema of JSON documents
+  * key (since ORC 1.5) - print information about the encryption keys
+  * meta - print the metadata of an ORC file
+  * scan (since ORC 1.3) - scan the data for benchmarking
+  * version (since ORC 1.6) - print the version of this ORC tool
 
 The command line looks like:
 
 ~~~ shell
 % java -jar orc-tools-X.Y.Z-uber.jar <sub-command> <args>
 ~~~
 
+## Java Convert
+
+The convert command reads several JSON files and converts them into a
+single ORC file.
+
+-e,--escape `<escape>`
+  : Sets CSV escape character
+
+-h,--help
+  : Print help
+
+-H,--header `<header>`
+  : Sets CSV header lines
+
+-n,--null `<null>`
+  : Sets CSV null string
+
+-o,--output `<filename>`
+  : Sets the output ORC filename, which defaults to output.orc
+
+-O,--overwrite
+  : If the file already exists, it will be overwritten
+
+-q,--quote `<quote>`
+  : Sets CSV quote character
+
+-s,--schema `<schema>`
+  : Sets the schema for the ORC file. By default, the schema is automatically discovered.
+
+-S,--separator `<separator>`
+  : Sets CSV separator character
+
+-t,--timestampformat `<timestampformat>`
+  : Sets timestamp Format
+  
+The automatic JSON schema discovery is equivalent to the json-schema tool
+below.
+
+## Java Count
+
+The count command recursively find *.orc and print the number of rows. The parameter value can be a space separated file paths string.
+
+## Java Data
+
+The data command prints the data in an ORC file as a JSON document. Each
+record is printed as a JSON object on a line. Each record is annotated with
+the fieldnames and a JSON representation that depends on the field's type.
+
+-h,--help

Review comment:
       Fixed in da9740a

##########
File path: site/_docs/java-tools.md
##########
@@ -201,47 +294,21 @@ Padding ratio: 0%
 ______________________________________________________________________
 ~~~
 
-## Java Data
-
-The data command prints the data in an ORC file as a JSON document. Each
-record is printed as a JSON object on a line. Each record is annotated with
-the fieldnames and a JSON representation that depends on the field's type.
-
 ## Java Scan
 
 The scan command reads the contents of the file without printing anything. It
 is primarily intendend for benchmarking the Java reader without including the
 cost of printing the data out.
 
-## Java Convert
-
-The convert command reads several JSON files and converts them into a
-single ORC file.
-
--o <filename>
-  : Sets the output ORC filename, which defaults to output.orc
-
--s <schema>
-  : Sets the schema for the ORC file. By default, the schema is automatically discovered.
-
--h
+-h,--help
   : Print help
-  
-The automatic JSON schema discovery is equivalent to the json-schema tool
-below.
 
-## Java JSON Schema
-
-The JSON Schema discovery tool processes a set of JSON documents and
-produces a schema that encompasses all of the records in all of the
-documents. It works by computing the enclosing type and promoting it
-to include all of the observed values.
+-s,--schema
+  : Print schema
 
--f
-  : Print the schema as a list of flat types for each subfield
+-v,--verbose
+  : Print exceptions
 
--t
-  : Print the schema as a Hive table declaration
+## Java Version
 
--h
-  : Print help
\ No newline at end of file
+The version command print the version of this ORC tool.

Review comment:
       Fixed in da9740a







[GitHub] [orc] dongjoon-hyun merged pull request #889: ORC-727: Update `Java Tools` documentation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun merged pull request #889:
URL: https://github.com/apache/orc/pull/889


   

