Posted to commits@parquet.apache.org by ju...@apache.org on 2014/10/21 18:54:41 UTC
git commit: PARQUET-105: use mvn shade plugin to create uber jar, support meta on a folder
Repository: incubator-parquet-mr
Updated Branches:
refs/heads/master be1222ef4 -> 31fb4dfef
PARQUET-105: use mvn shade plugin to create uber jar, support meta on a folder
1. Make the hadoop dependency of parquet-tools "provided", so the tool can be used against different versions of Hadoop
2. Use the maven shade plugin to create an all-in-one jar, which can be used either locally or in Hadoop
3. Make the parquet-meta command support both a folder (reading the summary file) and a single file
Author: Tianshuo Deng <td...@twitter.com>
Closes #69 from tsdeng/bundle_parquet_tools and squashes the following commits:
d8dcd3e [Tianshuo Deng] print file offset, file path, and cancel autoCrop
a2d1399 [Tianshuo Deng] support local mode
5009a85 [Tianshuo Deng] fix README
0756f81 [Tianshuo Deng] remove semver check for parquet_tools
78c7f4b [Tianshuo Deng] use mvn shade plugin to create uber jar, support meta on a folder
Project: http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/commit/31fb4dfe
Tree: http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/tree/31fb4dfe
Diff: http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/diff/31fb4dfe
Branch: refs/heads/master
Commit: 31fb4dfef212791f86f052ce8a3adeabaf830cf2
Parents: be1222e
Author: Tianshuo Deng <td...@twitter.com>
Authored: Tue Oct 21 09:54:20 2014 -0700
Committer: julien <ju...@twitter.com>
Committed: Tue Oct 21 09:54:20 2014 -0700
----------------------------------------------------------------------
parquet-tools/README.md | 48 +++++++++++++++++---
parquet-tools/pom.xml | 43 ++++++++++++++----
.../src/main/java/parquet/tools/Main.java | 29 ++++++------
.../parquet/tools/command/ShowMetaCommand.java | 20 +++++---
.../tools/command/ShowSchemaCommand.java | 2 -
.../java/parquet/tools/util/MetadataUtils.java | 3 +-
6 files changed, 104 insertions(+), 41 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/blob/31fb4dfe/parquet-tools/README.md
----------------------------------------------------------------------
diff --git a/parquet-tools/README.md b/parquet-tools/README.md
index e3667aa..2ac54e7 100644
--- a/parquet-tools/README.md
+++ b/parquet-tools/README.md
@@ -6,17 +6,53 @@ in the inspection of [Parquet files](https://github.com/Parquet).
Currently these tools are available for UN*X systems.
-## Usage
+## Build
+
+If you want to use parquet-tools in local mode, you should use the local profile so the
+hadoop client dependency is included.
+
+```sh
+cd parquet-tools && mvn clean package -Plocal
+```
+
+To use it in hadoop mode, build with the default profile, which excludes the hadoop client dependency:
+
+```sh
+cd parquet-tools && mvn clean package
+```
+
+The resulting jar is target/parquet-tools-<VERSION>.jar; you can copy it to wherever you
+want to use it.
+
+## Run from hadoop
+
+See Commands Usage for the available commands.
+
+```sh
+hadoop jar ./parquet-tools-<VERSION>.jar <command> my_parquet_file.lzo.parquet
+```
+
+## Run locally
+
+See Commands Usage for the available commands.
+
+```sh
+java -jar ./parquet-tools-<VERSION>.jar <command> my_parquet_file.lzo.parquet
+```
+
+## Commands Usage
+
+To run it on hadoop, use "hadoop jar" instead of "java -jar".
```sh
-usage: parquet-tools cat [option...] <input>
+usage: java -jar ./parquet-tools-<VERSION>.jar cat [option...] <input>
where option is one of:
--debug Disable color output even if supported
-h,--help Show this help string
--no-color Disable color output even if supported
where <input> is the parquet file to print to stdout
-usage: parquet-tools head [option...] <input>
+usage: java -jar ./parquet-tools-<VERSION>.jar head [option...] <input>
where option is one of:
--debug Disable color output even if supported
-h,--help Show this help string
@@ -24,7 +60,7 @@ where option is one of:
--no-color Disable color output even if supported
where <input> is the parquet file to print to stdout
-usage: parquet-tools schema [option...] <input>
+usage: java -jar ./parquet-tools-<VERSION>.jar schema [option...] <input>
where option is one of:
-d,--detailed <arg> Show detailed information about the schema.
--debug Disable color output even if supported
@@ -32,14 +68,14 @@ where option is one of:
--no-color Disable color output even if supported
where <input> is the parquet file containing the schema to show
-usage: parquet-tools meta [option...] <input>
+usage: java -jar ./parquet-tools-<VERSION>.jar meta [option...] <input>
where option is one of:
--debug Disable color output even if supported
-h,--help Show this help string
--no-color Disable color output even if supported
where <input> is the parquet file to print to stdout
-usage: parquet-tools dump [option...] <input>
+usage: java -jar ./parquet-tools-<VERSION>.jar dump [option...] <input>
where option is one of:
-c,--column <arg> Dump only the given column, can be specified more than
once
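In the default build, `hadoop-client` gets the `provided` scope, so the shaded jar leaves Hadoop out and relies on `hadoop jar` to put it on the classpath; the `local` profile switches the scope to `compile` so the jar can run on its own. As a minimal sketch of that distinction, the hypothetical `HadoopClasspathCheck` below (not part of parquet-tools) probes for `org.apache.hadoop.conf.Configuration`, assuming that class is a reasonable stand-in for "Hadoop is available":

```java
// Hypothetical helper, not part of parquet-tools: reports whether Hadoop
// classes are visible, i.e. whether the jar was built with -Plocal or is
// being run under "hadoop jar".
final class HadoopClasspathCheck {

  // Returns true if the named class can be loaded; initialize=false avoids
  // running the class's static initializers.
  static boolean isOnClasspath(String className) {
    try {
      Class.forName(className, false, HadoopClasspathCheck.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  // Assumption: Configuration is used as the marker class for "Hadoop present".
  static boolean hadoopAvailable() {
    return isOnClasspath("org.apache.hadoop.conf.Configuration");
  }

  public static void main(String[] args) {
    if (hadoopAvailable()) {
      System.out.println("Hadoop classes found on the classpath");
    } else {
      System.out.println("Hadoop classes missing: use 'hadoop jar' or rebuild with -Plocal");
    }
  }
}
```

Under plain `java` with a jar built without `-Plocal`, the check would be expected to report the classes as missing, which is the case where the commands would likely fail with `NoClassDefFoundError`.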
http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/blob/31fb4dfe/parquet-tools/pom.xml
----------------------------------------------------------------------
diff --git a/parquet-tools/pom.xml b/parquet-tools/pom.xml
index ea6b88a..ba784c3 100644
--- a/parquet-tools/pom.xml
+++ b/parquet-tools/pom.xml
@@ -15,8 +15,18 @@
<url>https://github.com/Parquet/parquet-mr</url>
<properties>
+ <hadoop.scope>provided</hadoop.scope>
</properties>
+ <profiles>
+ <profile>
+ <id>local</id>
+ <properties>
+ <hadoop.scope>compile</hadoop.scope>
+ </properties>
+ </profile>
+ </profiles>
+
<dependencies>
<dependency>
<groupId>com.twitter</groupId>
@@ -32,6 +42,7 @@
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
+ <scope>${hadoop.scope}</scope>
</dependency>
<dependency>
<groupId>commons-cli</groupId>
@@ -47,28 +58,40 @@
<build>
<plugins>
+ <!--We do not turn on semver checking for parquet-tools, since it is not considered an API-->
<plugin>
- <artifactId>maven-enforcer-plugin</artifactId>
- </plugin>
- <plugin>
- <artifactId>maven-assembly-plugin</artifactId>
+ <groupId>org.apache.maven.plugins</groupId>
+ <artifactId>maven-jar-plugin</artifactId>
<configuration>
- <descriptors>
- <descriptor>src/main/assembly/assembly.xml</descriptor>
- </descriptors>
+ <archive>
+ <manifest>
+ <mainClass>parquet.tools.Main</mainClass>
+ </manifest>
+ </archive>
</configuration>
+ </plugin>
+
+ <plugin>
+ <groupId>org.apache.maven.plugins</groupId>
+ <artifactId>maven-shade-plugin</artifactId>
<executions>
<execution>
- <id>make-assembly</id>
<phase>package</phase>
<goals>
- <goal>single</goal>
+ <goal>shade</goal>
</goals>
+ <configuration>
+ <minimizeJar>false</minimizeJar>
+ <artifactSet>
+ <includes>
+ <include>*</include>
+ </includes>
+ </artifactSet>
+ </configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
-
</project>
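The `maven-jar-plugin` `<mainClass>` setting writes a `Main-Class: parquet.tools.Main` entry into the jar manifest, which is what allows `java -jar ./parquet-tools-<VERSION>.jar <command>` to start without naming the class (`hadoop jar` also reads this attribute when it is present). A JDK-only sketch for checking the entry in a built jar; `ManifestMainClass` and `mainClassOf` are hypothetical names:

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical utility: reads the Main-Class attribute from a jar's manifest.
final class ManifestMainClass {

  // Returns the Main-Class value, or null if the jar has no manifest
  // or the manifest has no Main-Class entry.
  static String mainClassOf(String jarPath) throws IOException {
    try (JarFile jar = new JarFile(jarPath)) {
      Manifest manifest = jar.getManifest();
      if (manifest == null) {
        return null;
      }
      return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }
  }

  public static void main(String[] args) throws IOException {
    // e.g. target/parquet-tools-<VERSION>.jar; should print parquet.tools.Main
    // if the manifest entry survived shading.
    System.out.println(mainClassOf(args[0]));
  }
}
```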
http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/blob/31fb4dfe/parquet-tools/src/main/java/parquet/tools/Main.java
----------------------------------------------------------------------
diff --git a/parquet-tools/src/main/java/parquet/tools/Main.java b/parquet-tools/src/main/java/parquet/tools/Main.java
index e644f3c..e72c999 100644
--- a/parquet-tools/src/main/java/parquet/tools/Main.java
+++ b/parquet-tools/src/main/java/parquet/tools/Main.java
@@ -165,21 +165,20 @@ public class Main {
Main.out = System.out;
Main.err = System.err;
- System.setOut(new PrintStream(new OutputStream() {
- @Override public void write(int b) throws IOException { }
- @Override public void write(byte[] b) throws IOException { }
- @Override public void write(byte[] b, int off, int len) throws IOException { }
- @Override public void flush() throws IOException { }
- @Override public void close() throws IOException { }
- }));
-
- System.setErr(new PrintStream(new OutputStream() {
- @Override public void write(int b) throws IOException { }
- @Override public void write(byte[] b) throws IOException { }
- @Override public void write(byte[] b, int off, int len) throws IOException { }
- @Override public void flush() throws IOException { }
- @Override public void close() throws IOException { }
- }));
+ PrintStream voidStream = new PrintStream(new OutputStream() {
+ @Override
+ public void write(int b) throws IOException {}
+ @Override
+ public void write(byte[] b) throws IOException {}
+ @Override
+ public void write(byte[] b, int off, int len) throws IOException {}
+ @Override
+ public void flush() throws IOException {}
+ @Override
+ public void close() throws IOException {}
+ });
+ System.setOut(voidStream);
+ System.setErr(voidStream);
if (args.length == 0) {
die("No command specified", true, null, null);
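The refactored block saves the real streams in `Main.out` and `Main.err`, then installs a single discarding `PrintStream` as both `System.out` and `System.err`, presumably so that log output from libraries does not mix with the tool's own output. A self-contained sketch of the same pattern follows; the `runQuietly` helper, which also restores the original streams afterwards, is a hypothetical addition, and the hunk does not show whether `Main` does this:

```java
import java.io.OutputStream;
import java.io.PrintStream;

final class QuietStreams {

  // A PrintStream that silently drops every byte written to it,
  // like the anonymous OutputStream in the diff.
  static PrintStream voidStream() {
    return new PrintStream(new OutputStream() {
      @Override public void write(int b) {}
      @Override public void write(byte[] b, int off, int len) {}
    });
  }

  // Hypothetical helper: runs the task with System.out and System.err silenced,
  // then restores the original streams even if the task throws.
  static void runQuietly(Runnable task) {
    PrintStream savedOut = System.out;
    PrintStream savedErr = System.err;
    PrintStream sink = voidStream();
    System.setOut(sink);
    System.setErr(sink);
    try {
      task.run();
    } finally {
      System.setOut(savedOut);
      System.setErr(savedErr);
    }
  }

  public static void main(String[] args) {
    PrintStream realOut = System.out;  // kept aside, like Main.out, for output that must be shown
    runQuietly(() -> {
      System.out.println("this line is discarded");
      realOut.println("this line still reaches the console");
    });
  }
}
```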
http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/blob/31fb4dfe/parquet-tools/src/main/java/parquet/tools/command/ShowMetaCommand.java
----------------------------------------------------------------------
diff --git a/parquet-tools/src/main/java/parquet/tools/command/ShowMetaCommand.java b/parquet-tools/src/main/java/parquet/tools/command/ShowMetaCommand.java
index 0e18456..106a67d 100644
--- a/parquet-tools/src/main/java/parquet/tools/command/ShowMetaCommand.java
+++ b/parquet-tools/src/main/java/parquet/tools/command/ShowMetaCommand.java
@@ -15,21 +15,23 @@
*/
package parquet.tools.command;
-import java.io.PrintWriter;
+import static parquet.format.converter.ParquetMetadataConverter.NO_FILTER;
import org.apache.commons.cli.CommandLine;
import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
+import parquet.hadoop.Footer;
import parquet.hadoop.ParquetFileReader;
import parquet.hadoop.metadata.ParquetMetadata;
import parquet.tools.util.MetadataUtils;
import parquet.tools.util.PrettyPrintWriter;
import parquet.tools.util.PrettyPrintWriter.WhiteSpaceHandler;
+import java.util.List;
+
public class ShowMetaCommand extends ArgsOnlyCommand {
- public static final String TABS = " ";
- public static final int BLOCK_BUFFER_SIZE = 64 * 1024;
public static final String[] USAGE = new String[] {
"<input>",
"where <input> is the parquet file to print to stdout"
@@ -52,16 +54,20 @@ public class ShowMetaCommand extends ArgsOnlyCommand {
String input = args[0];
Configuration conf = new Configuration();
- ParquetMetadata metaData = ParquetFileReader.readFooter(conf, new Path(input));
+ Path inputPath = new Path(input);
+ FileStatus inputFileStatus = inputPath.getFileSystem(conf).getFileStatus(inputPath);
+ List<Footer> footers = ParquetFileReader.readFooters(conf, inputFileStatus, false);
PrettyPrintWriter out = PrettyPrintWriter.stdoutPrettyPrinter()
.withAutoColumn()
- .withAutoCrop()
.withWhitespaceHandler(WhiteSpaceHandler.COLLAPSE_WHITESPACE)
.withColumnPadding(1)
.build();
- MetadataUtils.showDetails(out, metaData);
- out.flushColumns();
+ for (Footer f : footers) {
+ out.format("file: %s%n", f.getFile());
+ MetadataUtils.showDetails(out, f.getParquetMetadata());
+ out.flushColumns();
+ }
}
}
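With `readFooters(conf, inputFileStatus, false)` the `meta` command accepts either a single file or a folder, and the new loop prints a `file:` header before each footer's details. The sketch below imitates only that control flow with `java.nio.file`: it lists the visible files of a directory instead of reading a summary file, and the rule of skipping names that start with `_` or `.` is an assumption for illustration, not Parquet's selection logic. `MetaInputs`, `resolveInputs` and `printAll` are hypothetical:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

final class MetaInputs {

  // Hypothetical: a directory expands to its visible regular files, sorted by path;
  // anything else is treated as a single input file.
  static List<Path> resolveInputs(Path input) throws IOException {
    if (!Files.isDirectory(input)) {
      return Collections.singletonList(input);
    }
    List<Path> files = new ArrayList<>();
    try (Stream<Path> entries = Files.list(input)) {
      entries.filter(Files::isRegularFile)
             .filter(p -> {
               String name = p.getFileName().toString();
               // Assumption: skip summary files such as _metadata and hidden .crc files.
               return !name.startsWith("_") && !name.startsWith(".");
             })
             .sorted()
             .forEach(files::add);
    }
    return files;
  }

  // Mirrors the new loop: one "file:" header per input, then that file's details.
  static void printAll(PrintWriter out, List<Path> files) {
    for (Path f : files) {
      out.format("file: %s%n", f);
      // Placeholder for MetadataUtils.showDetails(out, footer metadata).
      out.format("  (details for %s)%n", f.getFileName());
    }
    out.flush();
  }

  public static void main(String[] args) throws IOException {
    printAll(new PrintWriter(System.out), resolveInputs(Paths.get(args[0])));
  }
}
```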
http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/blob/31fb4dfe/parquet-tools/src/main/java/parquet/tools/command/ShowSchemaCommand.java
----------------------------------------------------------------------
diff --git a/parquet-tools/src/main/java/parquet/tools/command/ShowSchemaCommand.java b/parquet-tools/src/main/java/parquet/tools/command/ShowSchemaCommand.java
index 1a8963c..c5c412d 100644
--- a/parquet-tools/src/main/java/parquet/tools/command/ShowSchemaCommand.java
+++ b/parquet-tools/src/main/java/parquet/tools/command/ShowSchemaCommand.java
@@ -32,8 +32,6 @@ import parquet.tools.util.MetadataUtils;
import parquet.tools.util.PrettyPrintWriter;
public class ShowSchemaCommand extends ArgsOnlyCommand {
- public static final DecimalFormat FRACTIONAL = new DecimalFormat("#,##0.##");
- public static final DecimalFormat WHOLE = new DecimalFormat("#,##0");
public static final String[] USAGE = new String[] {
"<input>",
"where <input> is the parquet file containing the schema to show"
http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/blob/31fb4dfe/parquet-tools/src/main/java/parquet/tools/util/MetadataUtils.java
----------------------------------------------------------------------
diff --git a/parquet-tools/src/main/java/parquet/tools/util/MetadataUtils.java b/parquet-tools/src/main/java/parquet/tools/util/MetadataUtils.java
index e4285b0..494fc8b 100644
--- a/parquet-tools/src/main/java/parquet/tools/util/MetadataUtils.java
+++ b/parquet-tools/src/main/java/parquet/tools/util/MetadataUtils.java
@@ -79,8 +79,9 @@ public class MetadataUtils {
private static void showDetails(PrettyPrintWriter out, BlockMetaData meta, Long num) {
long rows = meta.getRowCount();
long tbs = meta.getTotalByteSize();
+ long offset = meta.getStartingPos();
- out.format("row group%s: RC:%d TS:%d%n", (num == null ? "" : " " + num), rows, tbs);
+ out.format("row group%s: RC:%d TS:%d OFFSET:%d%n", (num == null ? "" : " " + num), rows, tbs, offset);
out.rule('-');
showDetails(out, meta.getColumns());
}
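Each row-group header now also reports `OFFSET`, the value of `BlockMetaData.getStartingPos()`, alongside the row count (`RC`) and total byte size (`TS`). A JDK-only sketch of the same format string, using a hypothetical `rowGroupLine` helper and made-up numbers (an offset of 4 is where data would begin right after the 4-byte `PAR1` magic at the start of a file):

```java
final class RowGroupLine {

  // Same format string as MetadataUtils.showDetails; when num is null the
  // row group index is omitted from the header.
  static String rowGroupLine(Long num, long rows, long totalByteSize, long offset) {
    return String.format("row group%s: RC:%d TS:%d OFFSET:%d%n",
        (num == null ? "" : " " + num), rows, totalByteSize, offset);
  }

  public static void main(String[] args) {
    // prints "row group 1: RC:100 TS:2048 OFFSET:4"
    System.out.print(rowGroupLine(1L, 100L, 2048L, 4L));
  }
}
```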