Posted to commits@parquet.apache.org by zi...@apache.org on 2018/06/04 15:35:52 UTC

[parquet-mr] branch master updated: PARQUET-1311: Update README.md (#487)

This is an automated email from the ASF dual-hosted git repository.

zivanfi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-mr.git


The following commit(s) were added to refs/heads/master by this push:
     new aed9097  PARQUET-1311: Update README.md (#487)
aed9097 is described below

commit aed9097640c7adffe1151b32e86b5efc3702c657
Author: nandorKollar <na...@users.noreply.github.com>
AuthorDate: Mon Jun 4 17:35:47 2018 +0200

    PARQUET-1311: Update README.md (#487)
    
    The parquet-mr documentation was out of date:
    - it pointed to broken URLs
    - it instructed readers to install an old Thrift version
    - it stated the current version as 1.8.1, although 1.10.0 has already been released
---
 README.md     | 86 ++++++++++++++++++++++++++++-------------------------------
 dev/README.md |  4 +--
 2 files changed, 43 insertions(+), 47 deletions(-)

diff --git a/README.md b/README.md
index f084f50..4b6b96a 100644
--- a/README.md
+++ b/README.md
@@ -20,9 +20,9 @@
 Parquet MR [![Build Status](https://travis-ci.org/apache/parquet-mr.svg?branch=master)](http://travis-ci.org/apache/parquet-mr)
 ======
 
-Parquet-MR contains the java implementation of the [Parquet format](https://github.com/apache/parquet-format). 
+Parquet-MR contains the java implementation of the [Parquet format](https://github.com/apache/parquet-format).
 Parquet is a columnar storage format for Hadoop; it provides efficient storage and encoding of data.
-Parquet uses the [record shredding and assembly algorithm](https://github.com/Parquet/parquet-mr/wiki/The-striping-and-assembly-algorithms-from-the-Dremel-paper) described in the Dremel paper to represent nested structures.
+Parquet uses the [record shredding and assembly algorithm](https://github.com/julienledem/redelm/wiki/The-striping-and-assembly-algorithms-from-the-Dremel-paper) described in the Dremel paper to represent nested structures.
 
 You can find some details about the format and intended use cases in our [Hadoop Summit 2013 presentation](http://www.slideshare.net/julienledem/parquet-hadoop-summit-2013)
 
@@ -49,11 +49,11 @@ sudo ldconfig
 To build and install the thrift compiler, run:
 
 ```
-wget -nv http://archive.apache.org/dist/thrift/0.7.0/thrift-0.7.0.tar.gz
-tar xzf thrift-0.7.0.tar.gz
-cd thrift-0.7.0
+wget -nv http://archive.apache.org/dist/thrift/0.9.3/thrift-0.9.3.tar.gz
+tar xzf thrift-0.9.3.tar.gz
+cd thrift-0.9.3
 chmod +x ./configure
-./configure --disable-gen-erl --disable-gen-hs --without-ruby --without-haskell --without-erlang
+./configure --disable-gen-erl --disable-gen-hs --without-ruby --without-haskell --without-erlang --without-php --without-nodejs
 sudo make install
 ```
 
@@ -67,31 +67,29 @@ LC_ALL=C mvn clean install
 
 ## Features
 
-Parquet is a very active project, and new features are being added quickly; below is the state as of June 2013.
-
-
-<table>
-  <tr><th>Feature</th><th>In trunk</th><th>In dev</th><th>Planned</th><th>Expected release</th></tr>
-  <tr><td>Type-specific encoding</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
-  <tr><td>Hive integration</td><td>YES (<a href ="https://github.com/Parquet/parquet-mr/pull/28">28</a>)</td><td></td></td><td></td><td>1.0</td></tr>
-  <tr><td>Pig integration</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
-  <tr><td>Cascading integration</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
-  <tr><td>Crunch integration</td><td>YES (<a href ="https://issues.apache.org/jira/browse/CRUNCH-277">CRUNCH-277</a>)</td><td></td></td><td></td><td>1.0</td></tr>
-  <tr><td>Impala integration</td><td>YES (non-nested)</td><td></td></td><td></td><td>1.0</td></tr>
-  <tr><td>Java Map/Reduce API</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
-  <tr><td>Native Avro support</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
-  <tr><td>Native Thrift support</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
-  <tr><td>Complex structure support</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
-  <tr><td>Future-proofed versioning</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
-  <tr><td>RLE</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
-  <tr><td>Bit Packing</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
-  <tr><td>Adaptive dictionary encoding</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
-  <tr><td>Predicate pushdown</td><td>YES (<a href ="https://github.com/Parquet/parquet-mr/pull/68">68</a>)</td><td></td></td><td></td><td>1.0</td></tr>
-  <tr><td>Column stats</td><td>YES</td><td></td></td><td></td><td>2.0</td></tr>  
-  <tr><td>Delta encoding</td><td>YES</td><td></td></td><td></td><td>2.0</td></tr>
-  <tr><td>Native Protocol Buffers support</td><td>YES</td><td></td><td></td><td>1.0</td></tr>
-  <tr><td>Index pages</td><td></td><td></td></td><td>YES</td><td>2.0</td></tr>
-</table>
+Parquet is a very active project, and new features are being added quickly. Here are a few features:
+
+
+* Type-specific encoding
+* Hive integration
+* Pig integration
+* Cascading integration
+* Crunch integration
+* Apache Arrow integration
+* Apache Scrooge integration
+* Impala integration (non-nested)
+* Java Map/Reduce API
+* Native Avro support
+* Native Thrift support
+* Native Protocol Buffers support
+* Complex structure support
+* Run-length encoding (RLE)
+* Bit Packing
+* Adaptive dictionary encoding
+* Predicate pushdown
+* Column stats
+* Delta encoding
+* Index pages
 
 ## Map/Reduce integration
 
@@ -138,46 +136,44 @@ Hive integration is provided via the [parquet-hive](https://github.com/apache/pa
 
 ## Build
 
-to run the unit tests:
-mvn test
+To run the unit tests: `mvn test`
 
-to build the jars:
-mvn package
+To build the jars: `mvn package`
 
 The build runs in [Travis CI](http://travis-ci.org/apache/parquet-mr):
 [![Build Status](https://travis-ci.org/apache/parquet-mr.svg?branch=master)](http://travis-ci.org/apache/parquet-mr)
 
 ## Add Parquet as a dependency in Maven
-The current release is version `1.8.1`
+The current release is version `1.10.0`
 
 ```xml
   <dependencies>
     <dependency>
       <groupId>org.apache.parquet</groupId>
       <artifactId>parquet-common</artifactId>
-      <version>1.8.1</version>
+      <version>1.10.0</version>
     </dependency>
     <dependency>
       <groupId>org.apache.parquet</groupId>
       <artifactId>parquet-encoding</artifactId>
-      <version>1.8.1</version>
+      <version>1.10.0</version>
     </dependency>
     <dependency>
       <groupId>org.apache.parquet</groupId>
       <artifactId>parquet-column</artifactId>
-      <version>1.8.1</version>
+      <version>1.10.0</version>
     </dependency>
     <dependency>
       <groupId>org.apache.parquet</groupId>
       <artifactId>parquet-hadoop</artifactId>
-      <version>1.8.1</version>
+      <version>1.10.0</version>
     </dependency>
   </dependencies>
 ```
 
 ### How To Contribute
 
-We prefer to receive contributions in the form of GitHub pull requests. Please send pull requests against the [github.com/apache/parquet-mr](https://github.com/apache/parquet-mr) repository. If you've previously forked Parquet from its old location, you will need to add a remote or update your origin remote to https://github.com/apache/parquet-mr.git
+We prefer to receive contributions in the form of GitHub pull requests. Please send pull requests against the [parquet-mr](https://github.com/apache/parquet-mr) Git repository. If you've previously forked Parquet from its old location, you will need to add a remote or update your origin remote to https://github.com/apache/parquet-mr.git
 
 If you are looking for some ideas on what to contribute, check out jira issues for this project labeled ["pick-me-up"](https://issues.apache.org/jira/browse/PARQUET-5?jql=project%20%3D%20PARQUET%20and%20labels%20%3D%20pick-me-up%20and%20status%20%3D%20open).
 Comment on the issue and/or contact [dev@parquet.apache.org](http://mail-archives.apache.org/mod_mbox/parquet-dev/) with your questions and ideas.
@@ -189,8 +185,8 @@ To contribute a patch:
   1. Break your work into small, single-purpose patches if possible. It’s much harder to merge in a large change with a lot of disjoint features.
   2. Create a JIRA for your patch on the [Parquet Project JIRA](https://issues.apache.org/jira/browse/PARQUET).
   3. Submit the patch as a GitHub pull request against the master branch. For a tutorial, see the GitHub guides on forking a repo and sending a pull request. Prefix your pull request name with the JIRA name (ex: https://github.com/apache/parquet-mr/pull/240).
-  4. Make sure that your code passes the unit tests. You can run the tests with `mvn test` in the root directory. 
-  5. Add new unit tests for your code. 
+  4. Make sure that your code passes the unit tests. You can run the tests with `mvn test` in the root directory.
+  5. Add new unit tests for your code.
 
 We tend to do fairly close readings of pull requests, and you may get a lot of comments. Some common issues that are not code structure related, but still important:
   * Use 2 spaces for whitespace. Not tabs, not 4 spaces. The number of the spacing shall be 2.
@@ -212,11 +208,11 @@ We hold ourselves and the Parquet developer community to two codes of conduct:
   2. [The Twitter OSS Code of Conduct](https://github.com/twitter/code-of-conduct/blob/master/code-of-conduct.md)
 
 ## Discussions
-* Mailing list: [dev@parquet.apache.org](http://mail-archives.apache.org/mod_mbox/parquet-dev/) 
+* Mailing list: [dev@parquet.apache.org](http://mail-archives.apache.org/mod_mbox/parquet-dev/)
 * Bug trackter: [jira](https://issues.apache.org/jira/browse/PARQUET)
 * Discussions also take place in github pull requests
 
 ## License
 
 Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0
-See also: 
+See also:
diff --git a/dev/README.md b/dev/README.md
index 8fe30e0..b984b11 100644
--- a/dev/README.md
+++ b/dev/README.md
@@ -27,7 +27,7 @@ Merging a pull request requires being a committer on the project.
 have an apache and apache-github remote setup
 ```
 git remote add apache-github https://github.com/apache/parquet-mr.git
-git remote add apache https://git-wip-us.apache.org/repos/asf/parquet-mr.git
+git remote add apache https://gitbox.apache.org/repos/asf?p=parquet-mr.git
 ```
 run the following command
 ```
@@ -50,7 +50,7 @@ source	repo/branch
 target	master
 url	https://api.github.com/repos/apache/parquet-mr/pulls/X
 
-Proceed with merging pull request #3? (y/n): 
+Proceed with merging pull request #3? (y/n):
 ```
 If this looks good, type y and hit enter.
 ```
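
A note on the `apache` remote added in the dev/README.md hunk above: the new URL uses a `?p=parquet-mr.git` query string, while the header of this email gives the repository as `https://gitbox.apache.org/repos/asf/parquet-mr.git`. The query form looks like a gitweb browsing address, so it may be worth confirming which form git can actually fetch from. The following is only a sketch: `check_remote` is a hypothetical helper, not part of parquet-mr, and it assumes `git` is on the PATH.

```shell
# Both URLs are copied from this email; which of them git accepts as a
# remote is an assumption to verify, not something the commit states.
path_form="https://gitbox.apache.org/repos/asf/parquet-mr.git"
query_form="https://gitbox.apache.org/repos/asf?p=parquet-mr.git"

# check_remote is a hypothetical helper: it prints "ok" when
# `git ls-remote` can list refs at the URL and "unreachable" otherwise.
# GIT_TERMINAL_PROMPT=0 keeps git from waiting on a credential prompt.
check_remote() {
  if GIT_TERMINAL_PROMPT=0 git ls-remote --heads "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "unreachable: $1"
  fi
}
```

With network access, running `check_remote "$path_form"` and `check_remote "$query_form"` would show which URL is usable as the `apache` remote. An "unreachable" result can also come from a network or authentication problem, so it is a prompt to look closer rather than proof that the URL is wrong.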
