Posted to commits@orc.apache.org by om...@apache.org on 2017/03/01 17:37:32 UTC
orc git commit: Add documentation for the Java tools jar.
Repository: orc
Updated Branches:
refs/heads/master c7160c53f -> 0b29e9d5d
Add documentation for the Java tools jar.
Fixes #98
Signed-off-by: Owen O'Malley <om...@apache.org>
Project: http://git-wip-us.apache.org/repos/asf/orc/repo
Commit: http://git-wip-us.apache.org/repos/asf/orc/commit/0b29e9d5
Tree: http://git-wip-us.apache.org/repos/asf/orc/tree/0b29e9d5
Diff: http://git-wip-us.apache.org/repos/asf/orc/diff/0b29e9d5
Branch: refs/heads/master
Commit: 0b29e9d5db7a0e3bb0d2ac94b2b7411c412d87a8
Parents: c7160c5
Author: Owen O'Malley <om...@apache.org>
Authored: Tue Feb 28 09:53:17 2017 -0800
Committer: Owen O'Malley <om...@apache.org>
Committed: Wed Mar 1 09:37:15 2017 -0800
----------------------------------------------------------------------
site/_docs/tools.md | 71 ++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 65 insertions(+), 6 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/orc/blob/0b29e9d5/site/_docs/tools.md
----------------------------------------------------------------------
diff --git a/site/_docs/tools.md b/site/_docs/tools.md
index d02daee..fa91136 100644
--- a/site/_docs/tools.md
+++ b/site/_docs/tools.md
@@ -81,15 +81,29 @@ string,struct<int1:int,string1:string>>>",
}
~~~
-## Java Metadata
+## Java ORC Tools
-The org.apache.orc.tools.FileDump Java class, which is available via Hive as:
+In addition to the C++ tools above, there is an ORC tools jar that
+packages several useful utilities and the necessary Java dependencies
+(including Hadoop) into a single package. The Java ORC tools jar
+supports both the local file system and HDFS.
+The subcommands for the tools are:
+ * meta - print the metadata of an ORC file
+ * data - print the data of an ORC file
+ * scan (since ORC 1.3) - scan the data for benchmarking
+ * convert (since ORC 1.4) - convert JSON files to ORC
+ * json-schema (since ORC 1.4) - determine the schema of JSON documents
+
~~~ shell
-% java -jar orc-tools-*.jar meta [-j] [-p] [-t] [--rowindex <cols>]
- [--recover] [--skip-dump] [--backup-path <new path>] <file>
+% java -jar orc-tools-X.Y.Z-uber.jar <sub-command> <args>
~~~
+### Java Meta
+
+The meta command prints the metadata about the given ORC file and is
+equivalent to the Hive ORC File Dump command.
+
-j
: format the output in JSON
@@ -114,7 +128,7 @@ The org.apache.orc.tools.FileDump Java class, which is available via Hive as:
An example of the output is given below:
~~~ shell
-% java -jar orc-tools-*.jar meta examples/TestOrcFile.test1.orc
+% java -jar orc-tools-X.Y.Z-uber.jar meta examples/TestOrcFile.test1.orc
Processing data file examples/TestOrcFile.test1.orc [length: 1711]
Structure for examples/TestOrcFile.test1.orc
File Version: 0.12 with HIVE_8732
@@ -261,4 +275,49 @@ File length: 1711 bytes
Padding length: 0 bytes
Padding ratio: 0%
______________________________________________________________________
-~~~
\ No newline at end of file
+~~~
+
+### Java Data
+
+The data command prints the data in an ORC file as a JSON document. Each
+record is printed as a JSON object on its own line. Each record is annotated with
+the field names and a JSON representation that depends on the field's type.
+
+### Java Scan
+
+The scan command reads the contents of the file without printing anything. It
+is primarily intended for benchmarking the Java reader without including the
+cost of printing the data out.
+
+### Java Convert
+
+The convert command reads several JSON files and converts them into a
+single ORC file.
+
+-o <filename>
+ : Sets the output ORC filename, which defaults to output.orc
+
+-s <schema>
+ : Sets the schema for the ORC file. By default, the schema is automatically discovered.
+
+-h
+ : Print help
+
+The automatic JSON schema discovery is equivalent to the json-schema tool
+below.
+
+### Java JSON Schema
+
+The JSON Schema discovery tool processes a set of JSON documents and
+produces a schema that encompasses all of the records in all of the
+documents. It works by computing the enclosing type and promoting it
+to include all of the observed values.
+
+-f
+ : Print the schema as a list of flat types for each subfield
+
+-t
+ : Print the schema as a Hive table declaration
+
+-h
+ : Print help
\ No newline at end of file
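
The json-schema description in the diff above ("computing the enclosing type and promoting it to include all of the observed values") can be illustrated with a small sketch. This is not the ORC implementation: the function names (`infer`, `merge`, `render`, `discover`), the promotion rules (bigint widens to double, other scalar mismatches fall back to string), and the handling of null values are all assumptions chosen to keep the example short.

```python
import json

def infer(value):
    """Return a type description for one parsed JSON value (hypothetical model)."""
    if value is None:
        return None  # nothing observed yet
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        elem = None
        for item in value:
            elem = merge(elem, infer(item))
        return ("array", elem)
    if isinstance(value, dict):
        return ("struct", {k: infer(v) for k, v in value.items()})
    raise TypeError(f"unsupported JSON value: {value!r}")

def merge(a, b):
    """Promote two observed types to one type that covers both."""
    if a is None:
        return b
    if b is None:
        return a
    if a == b:
        return a
    if isinstance(a, str) and isinstance(b, str):
        # Assumed rule: integers widen to double, anything else becomes string.
        return "double" if {a, b} == {"bigint", "double"} else "string"
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0]:
        if a[0] == "array":
            return ("array", merge(a[1], b[1]))
        fields = dict(a[1])
        for name, field_type in b[1].items():
            fields[name] = merge(fields.get(name), field_type)
        return ("struct", fields)
    # Assumed simplification: mismatched kinds (e.g. struct vs. scalar) become string.
    return "string"

def render(t):
    """Format a type description in an ORC-like type-string syntax."""
    if t is None:
        return "string"  # assumption: a field that was only ever null is shown as string
    if isinstance(t, str):
        return t
    if t[0] == "array":
        return f"array<{render(t[1])}>"
    inner = ",".join(f"{name}:{render(ft)}" for name, ft in t[1].items())
    return f"struct<{inner}>"

def discover(lines):
    """Merge the inferred type of every JSON record (one per line)."""
    schema = None
    for line in lines:
        schema = merge(schema, infer(json.loads(line)))
    return schema

if __name__ == "__main__":
    records = ['{"id": 1, "score": 2}', '{"id": 2, "score": 3.5, "tags": ["a"]}']
    print(render(discover(records)))
```

In this sketch the `score` field is first seen as an integer and later as a floating-point number, so it is promoted to `double`, and `tags` is added to the struct when it first appears. The types the actual tool emits may differ.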