Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/09/01 09:32:44 UTC

[GitHub] [orc] guiyanakuang commented on a change in pull request #889: ORC-727: Update `Java Tools` documentation

guiyanakuang commented on a change in pull request #889:
URL: https://github.com/apache/orc/pull/889#discussion_r699440935



##########
File path: site/_docs/java-tools.md
##########
@@ -11,43 +11,136 @@ supports both the local file system and HDFS.
 
 The subcommands for the tools are:
 
-  * meta - print the metadata of an ORC file
-  * data - print the data of an ORC file
-  * scan (since ORC 1.3) - scan the data for benchmarking
   * convert (since ORC 1.4) - convert JSON files to ORC
+  * count (since ORC 1.6) - recursively find *.orc and print the number of rows
+  * data - print the data of an ORC file
   * json-schema (since ORC 1.4) - determine the schema of JSON documents
+  * key (since ORC 1.5) - print information about the encryption keys
+  * meta - print the metadata of an ORC file
+  * scan (since ORC 1.3) - scan the data for benchmarking
+  * version (since ORC 1.6) - print the version of this ORC tool
 
 The command line looks like:
 
 ~~~ shell
 % java -jar orc-tools-X.Y.Z-uber.jar <sub-command> <args>
 ~~~
 
+## Java Convert
+
+The convert command reads several JSON files and converts them into a
+single ORC file.
+
+-e,--escape `<escape>`
+  : Sets CSV escape character

Review comment:
       Using `` might work; I'll test it later.

##########
File path: site/_docs/java-tools.md
##########
@@ -11,43 +11,136 @@ supports both the local file system and HDFS.
 
 The subcommands for the tools are:
 
-  * meta - print the metadata of an ORC file
-  * data - print the data of an ORC file
-  * scan (since ORC 1.3) - scan the data for benchmarking
   * convert (since ORC 1.4) - convert JSON files to ORC
+  * count (since ORC 1.6) - recursively find *.orc and print the number of rows
+  * data - print the data of an ORC file
   * json-schema (since ORC 1.4) - determine the schema of JSON documents
+  * key (since ORC 1.5) - print information about the encryption keys
+  * meta - print the metadata of an ORC file
+  * scan (since ORC 1.3) - scan the data for benchmarking
+  * version (since ORC 1.6) - print the version of this ORC tool
 
 The command line looks like:
 
 ~~~ shell
 % java -jar orc-tools-X.Y.Z-uber.jar <sub-command> <args>
 ~~~
 
+## Java Convert
+
+The convert command reads several JSON files and converts them into a
+single ORC file.
+
+-e,--escape `<escape>`
+  : Sets CSV escape character

Review comment:
       I've updated java-tools.md. My local HTML preview confirms that it's fixed.

##########
File path: site/_docs/java-tools.md
##########
@@ -11,43 +11,136 @@ supports both the local file system and HDFS.
 
 The subcommands for the tools are:
 
-  * meta - print the metadata of an ORC file
-  * data - print the data of an ORC file
-  * scan (since ORC 1.3) - scan the data for benchmarking
   * convert (since ORC 1.4) - convert JSON files to ORC
+  * count (since ORC 1.6) - recursively find *.orc and print the number of rows
+  * data - print the data of an ORC file
   * json-schema (since ORC 1.4) - determine the schema of JSON documents
+  * key (since ORC 1.5) - print information about the encryption keys
+  * meta - print the metadata of an ORC file
+  * scan (since ORC 1.3) - scan the data for benchmarking
+  * version (since ORC 1.6) - print the version of this ORC tool
 
 The command line looks like:
 
 ~~~ shell
 % java -jar orc-tools-X.Y.Z-uber.jar <sub-command> <args>
 ~~~
 
+## Java Convert
+
+The convert command reads several JSON files and converts them into a
+single ORC file.
+
+-e,--escape `<escape>`
+  : Sets CSV escape character
+
+-h,--help
+  : Print help
+
+-H,--header `<header>`
+  : Sets CSV header lines
+
+-n,--null `<null>`
+  : Sets CSV null string
+
+-o,--output `<filename>`
+  : Sets the output ORC filename, which defaults to output.orc
+
+-O,--overwrite
+  : If the file already exists, it will be overwritten
+
+-q,--quote `<quote>`
+  : Sets CSV quote character
+
+-s,--schema `<schema>`
+  : Sets the schema for the ORC file. By default, the schema is automatically discovered.
+
+-S,--separator `<separator>`
+  : Sets CSV separator character
+
+-t,--timestampformat `<timestampformat>`
+  : Sets timestamp Format
+  
+The automatic JSON schema discovery is equivalent to the json-schema tool
+below.
+
+## Java Count
+
+The count command recursively find *.orc and print the number of rows. The parameter value can be a space separated file paths string.

Review comment:
       I removed this phrase. BTW, the count command does not report an error when no path is given. I would also like it to support a single file, not only a directory.

##########
File path: site/_docs/java-tools.md
##########
@@ -11,43 +11,136 @@ supports both the local file system and HDFS.
 
 The subcommands for the tools are:
 
-  * meta - print the metadata of an ORC file
-  * data - print the data of an ORC file
-  * scan (since ORC 1.3) - scan the data for benchmarking
   * convert (since ORC 1.4) - convert JSON files to ORC
+  * count (since ORC 1.6) - recursively find *.orc and print the number of rows
+  * data - print the data of an ORC file
   * json-schema (since ORC 1.4) - determine the schema of JSON documents
+  * key (since ORC 1.5) - print information about the encryption keys
+  * meta - print the metadata of an ORC file
+  * scan (since ORC 1.3) - scan the data for benchmarking
+  * version (since ORC 1.6) - print the version of this ORC tool
 
 The command line looks like:
 
 ~~~ shell
 % java -jar orc-tools-X.Y.Z-uber.jar <sub-command> <args>
 ~~~
 
+## Java Convert
+
+The convert command reads several JSON files and converts them into a
+single ORC file.
+
+-e,--escape `<escape>`
+  : Sets CSV escape character
+
+-h,--help
+  : Print help
+
+-H,--header `<header>`
+  : Sets CSV header lines
+
+-n,--null `<null>`
+  : Sets CSV null string
+
+-o,--output `<filename>`
+  : Sets the output ORC filename, which defaults to output.orc
+
+-O,--overwrite
+  : If the file already exists, it will be overwritten
+
+-q,--quote `<quote>`
+  : Sets CSV quote character
+
+-s,--schema `<schema>`
+  : Sets the schema for the ORC file. By default, the schema is automatically discovered.
+
+-S,--separator `<separator>`
+  : Sets CSV separator character
+
+-t,--timestampformat `<timestampformat>`
+  : Sets timestamp Format
+  
+The automatic JSON schema discovery is equivalent to the json-schema tool
+below.
+
+## Java Count
+
+The count command recursively find *.orc and print the number of rows. The parameter value can be a space separated file paths string.
+
+## Java Data
+
+The data command prints the data in an ORC file as a JSON document. Each
+record is printed as a JSON object on a line. Each record is annotated with
+the fieldnames and a JSON representation that depends on the field's type.
+
+-h,--help

Review comment:
       Fixed in da9740a.

##########
File path: site/_docs/java-tools.md
##########
@@ -201,47 +294,21 @@ Padding ratio: 0%
 ______________________________________________________________________
 ~~~
 
-## Java Data
-
-The data command prints the data in an ORC file as a JSON document. Each
-record is printed as a JSON object on a line. Each record is annotated with
-the fieldnames and a JSON representation that depends on the field's type.
-
 ## Java Scan
 
 The scan command reads the contents of the file without printing anything. It
 is primarily intendend for benchmarking the Java reader without including the
 cost of printing the data out.
 
-## Java Convert
-
-The convert command reads several JSON files and converts them into a
-single ORC file.
-
--o <filename>
-  : Sets the output ORC filename, which defaults to output.orc
-
--s <schema>
-  : Sets the schema for the ORC file. By default, the schema is automatically discovered.
-
--h
+-h,--help
   : Print help
-  
-The automatic JSON schema discovery is equivalent to the json-schema tool
-below.
 
-## Java JSON Schema
-
-The JSON Schema discovery tool processes a set of JSON documents and
-produces a schema that encompasses all of the records in all of the
-documents. It works by computing the enclosing type and promoting it
-to include all of the observed values.
+-s,--schema
+  : Print schema
 
--f
-  : Print the schema as a list of flat types for each subfield
+-v,--verbose
+  : Print exceptions
 
--t
-  : Print the schema as a Hive table declaration
+## Java Version
 
--h
-  : Print help
\ No newline at end of file
+The version command print the version of this ORC tool.

Review comment:
       Fixed in da9740a.



