Posted to dev@giraph.apache.org by Maja Kabiljo <ma...@fb.com> on 2013/04/21 19:40:26 UTC

Review Request: GIRAPH-648: Allow IO formats to add parameters to Configuration

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10690/
-----------------------------------------------------------

Review request for giraph.


Description
-------

Currently we rely heavily on some runners (HCatGiraphRunner and HiveGiraphRunner) to prepare the Configuration before the application starts, and we have no way of using hcat/hive io without these runners. It would be better and more flexible if io formats added what the underlying io needs to the Configuration themselves.

Unfortunately this is not as straightforward as it sounds, because methods from io formats and readers/writers/OutputCommitter take JobContext or TaskAttemptContext as an argument, and in some cases those hold a copy of the Configuration, not the original. So I added a way to track which parameters were added to GiraphConfiguration, and wrapped all io-related calls to append those parameters to the JobContext/TaskAttemptContext before passing control to the actual io formats.
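
A minimal sketch of the idea described above, not the code from the patch: it uses plain Java stand-ins instead of the Hadoop and Giraph classes so it can run on its own. Every name in it (SimpleConf, TrackingConf, ContextSketch, InputFormatSketch, WrappedInputFormatSketch, and the key "example.input.table") is hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Runs the sketch: the same context is read directly, then through the wrapper. */
class TrackedParametersDemo {
  public static void main(String[] args) {
    TrackingConf conf = new TrackingConf();
    conf.set("job.name", "demo");

    // The context copies the configuration before the io format adds anything.
    ContextSketch context = new ContextSketch(conf);

    // From here on, every parameter that is set is also recorded.
    conf.startTracking();
    conf.set("example.input.table", "edges"); // hypothetical key

    InputFormatSketch format =
        ctx -> "table=" + ctx.getConfiguration().get("example.input.table");

    // Direct call: the copy never saw the new parameter.
    System.out.println(format.describeInput(context));
    // Wrapped call: tracked parameters are appended to the copy first.
    System.out.println(
        new WrappedInputFormatSketch(format, conf).describeInput(context));
  }
}

/** Hypothetical stand-in for a Configuration: a mutable string map. */
class SimpleConf {
  private final Map<String, String> values = new HashMap<>();

  SimpleConf() { }

  /** Copy constructor, mimicking a context that holds its own copy. */
  SimpleConf(SimpleConf other) {
    values.putAll(other.values);
  }

  void set(String key, String value) {
    values.put(key, value);
  }

  String get(String key) {
    return values.get(key);
  }
}

/** Configuration that remembers which parameters were set after tracking began. */
class TrackingConf extends SimpleConf {
  private final Map<String, String> added = new LinkedHashMap<>();
  private boolean tracking;

  void startTracking() {
    tracking = true;
  }

  @Override
  void set(String key, String value) {
    super.set(key, value);
    if (tracking) {
      added.put(key, value);
    }
  }

  /** Appends every tracked parameter to another configuration. */
  void copyAddedTo(SimpleConf target) {
    for (Map.Entry<String, String> entry : added.entrySet()) {
      target.set(entry.getKey(), entry.getValue());
    }
  }
}

/** Hypothetical stand-in for JobContext/TaskAttemptContext holding a copy. */
class ContextSketch {
  private final SimpleConf conf;

  ContextSketch(SimpleConf source) {
    this.conf = new SimpleConf(source);
  }

  SimpleConf getConfiguration() {
    return conf;
  }
}

/** Hypothetical stand-in for an io format method that takes a context. */
interface InputFormatSketch {
  String describeInput(ContextSketch context);
}

/** Wrapper that patches the context before delegating to the real format. */
class WrappedInputFormatSketch implements InputFormatSketch {
  private final InputFormatSketch original;
  private final TrackingConf conf;

  WrappedInputFormatSketch(InputFormatSketch original, TrackingConf conf) {
    this.original = original;
    this.conf = conf;
  }

  @Override
  public String describeInput(ContextSketch context) {
    conf.copyAddedTo(context.getConfiguration());
    return original.describeInput(context);
  }
}
```

Running the demo prints "table=null" for the direct call and "table=edges" for the wrapped one, which is the gap the wrapping is meant to close: the format itself stays unaware of whether its context holds the original Configuration or a stale copy.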

I cleaned up HiveGiraphRunner and moved all control to its io formats; I can do the same for HCatalog in a separate patch.

This will also help us do GIRAPH-639 in a cleaner way, and it will actually be possible to mix different kinds of input formats (hcat, hive, hbase, or whatever).


This addresses bug GIRAPH-648.
    https://issues.apache.org/jira/browse/GIRAPH-648


Diffs
-----

  giraph-core/src/main/java/org/apache/giraph/bsp/BspOutputFormat.java 574895c 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 7f9e38e 
  giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java 8dfe546 
  giraph-core/src/main/java/org/apache/giraph/io/EdgeInputFormat.java 43cc7be 
  giraph-core/src/main/java/org/apache/giraph/io/VertexInputFormat.java b3f234f 
  giraph-core/src/main/java/org/apache/giraph/io/VertexOutputFormat.java 71eb665 
  giraph-core/src/main/java/org/apache/giraph/io/internal/WrappedEdgeInputFormat.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/io/internal/WrappedVertexInputFormat.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/io/internal/WrappedVertexOutputFormat.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/io/internal/package-info.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/io/superstep_output/MultiThreadedSuperstepOutput.java af086e1 
  giraph-core/src/main/java/org/apache/giraph/io/superstep_output/SynchronizedSuperstepOutput.java 2a7af29 
  giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java d01dbb4 
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java 037cdfc 
  giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java afb636b 
  giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallable.java c426032 
  giraph-examples/src/test/java/org/apache/giraph/TestBspBasic.java e034b2f 
  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java 6e40b7f 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/GiraphHiveConstants.java f8363b1 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java 892d443 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java c482cf0 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java 097aeef 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java 45c9ca3 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexWriter.java 0215428 

Diff: https://reviews.apache.org/r/10690/diff/


Testing
-------

mvn clean verify
Real application run with hive io


Thanks,

Maja Kabiljo


Re: Review Request: GIRAPH-648: Allow IO formats to add parameters to Configuration

Posted by Nitay Joffe <ni...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10690/#review19593
-----------------------------------------------------------

Ship it!

- Nitay Joffe

