You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@giraph.apache.org by Maja Kabiljo <ma...@fb.com> on 2013/04/17 00:42:55 UTC

Review Request: GIRAPH-639: Add support for multiple Vertex/Edge inputs

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10564/
-----------------------------------------------------------

Review request for giraph.


Description
-------

For now, I did this only for Edge input, once I get some feedback I'll do the exactly same thing for vertex input.
Also, I added direct support only to HiveGiraphRunner, we can extend it later to others as well.


This addresses bug GIRAPH-639.
    https://issues.apache.org/jira/browse/GIRAPH-639


Diffs
-----

  giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java 178c96f 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphClasses.java 4a0e8f7 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 7f9e38e 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 6a5949e 
  giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java 8dfe546 
  giraph-core/src/main/java/org/apache/giraph/io/InputFormatWithIndex.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/job/GiraphConfigurationValidator.java cc6b126 
  giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java d01dbb4 
  giraph-core/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 029cb5d 
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java 037cdfc 
  giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java afb636b 
  giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallableFactory.java 4a1705b 
  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java 6e40b7f 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/GiraphHiveConstants.java f8363b1 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java 892d443 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java c482cf0 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java 09476cd 

Diff: https://reviews.apache.org/r/10564/diff/


Testing
-------

mvn clean verify
Run application with two edge input tables - verified results.


Thanks,

Maja Kabiljo


Re: Review Request: GIRAPH-639: Add support for multiple Vertex/Edge inputs

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10564/#review19954
-----------------------------------------------------------


This is looking much cleaner now.


giraph-core/src/main/java/org/apache/giraph/io/formats/multi/EdgeInputFormatDescription.java
<https://reviews.apache.org/r/10564/#comment41184>

    I would specify in the javadoc that this is only relevant to multiple inputs (I know it's in "multi", but it would make it clearer).



giraph-core/src/main/java/org/apache/giraph/io/formats/multi/EdgeInputFormatDescription.java
<https://reviews.apache.org/r/10564/#comment41187>

    "ret" is not a great name for a variable.
    How about "edgeInputFormats"?



giraph-core/src/main/java/org/apache/giraph/io/formats/multi/InputFormatDescription.java
<https://reviews.apache.org/r/10564/#comment41185>

    Same as for EdgeInputFormatDescription.



giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java
<https://reviews.apache.org/r/10564/#comment41188>

    I think this multitude of comma-separated options, where order is relevant, is not that robust.
    I'd prefer a representation with tuples (format, table, filter) akin to what you did with the multi-input description in giraph-core. It doesn't have to be one JSON object. You could make it so that an arbitrary number of inputs can be specified. Example:
    
    -edgeInput (MyFormat, my_table, my_partition)
    -edgeInput (MyOtherFormat, my_other_table, my_other_partition)
    ...


- Alessandro Presta


On April 30, 2013, 7:05 a.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10564/
> -----------------------------------------------------------
> 
> (Updated April 30, 2013, 7:05 a.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> For now, I did this only for Edge input, once I get some feedback I'll do the exactly same thing for vertex input.
> Also, I added direct support only to HiveGiraphRunner, we can extend it later to others as well.
> 
> 
> This addresses bug GIRAPH-639.
>     https://issues.apache.org/jira/browse/GIRAPH-639
> 
> 
> Diffs
> -----
> 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 795cd0c 
>   giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java f5a926f 
>   giraph-core/src/main/java/org/apache/giraph/io/EdgeInputFormat.java 2aac1f0 
>   giraph-core/src/main/java/org/apache/giraph/io/GiraphInputFormat.java 13efc93 
>   giraph-core/src/main/java/org/apache/giraph/io/VertexInputFormat.java c4d7fe2 
>   giraph-core/src/main/java/org/apache/giraph/io/formats/multi/EdgeInputFormatDescription.java PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/io/formats/multi/InputFormatDescription.java PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/io/formats/multi/MultiEdgeInputFormat.java PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/io/formats/multi/package-info.java PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/io/internal/WrappedEdgeInputFormat.java 9b14727 
>   giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java 2c1a679 
>   giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java 01937ab 
>   giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java de1e774 
>   giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallableFactory.java 4a1705b 
>   giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsCallable.java a3a9ab7 
>   giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallable.java 224856e 
>   giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallableFactory.java 4eff3b8 
>   giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java 8d67c1d 
> 
> Diff: https://reviews.apache.org/r/10564/diff/
> 
> 
> Testing
> -------
> 
> mvn clean verify
> Run application with two edge input tables - verified results.
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>


Re: Review Request: GIRAPH-639: Add support for multiple Vertex/Edge inputs

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10564/
-----------------------------------------------------------

(Updated April 30, 2013, 7:05 a.m.)


Review request for giraph.


Changes
-------

Updated per offline discussion with Alessandro - instead of having infrastructure handle several input formats, we can create special input format which wraps several input formats. Some things from the infrastructure needed to be changed in order to make this work, like moving the read/write InputSplit functionality to input formats. The regular use case is now unchanged, HiveGiraphRunner has direct support for several input formats.


Description
-------

For now, I did this only for Edge input, once I get some feedback I'll do the exactly same thing for vertex input.
Also, I added direct support only to HiveGiraphRunner, we can extend it later to others as well.


This addresses bug GIRAPH-639.
    https://issues.apache.org/jira/browse/GIRAPH-639


Diffs (updated)
-----

  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 795cd0c 
  giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java f5a926f 
  giraph-core/src/main/java/org/apache/giraph/io/EdgeInputFormat.java 2aac1f0 
  giraph-core/src/main/java/org/apache/giraph/io/GiraphInputFormat.java 13efc93 
  giraph-core/src/main/java/org/apache/giraph/io/VertexInputFormat.java c4d7fe2 
  giraph-core/src/main/java/org/apache/giraph/io/formats/multi/EdgeInputFormatDescription.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/io/formats/multi/InputFormatDescription.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/io/formats/multi/MultiEdgeInputFormat.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/io/formats/multi/package-info.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/io/internal/WrappedEdgeInputFormat.java 9b14727 
  giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java 2c1a679 
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java 01937ab 
  giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java de1e774 
  giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallableFactory.java 4a1705b 
  giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsCallable.java a3a9ab7 
  giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallable.java 224856e 
  giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallableFactory.java 4eff3b8 
  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java 8d67c1d 

Diff: https://reviews.apache.org/r/10564/diff/


Testing
-------

mvn clean verify
Run application with two edge input tables - verified results.


Thanks,

Maja Kabiljo


Re: Review Request: GIRAPH-639: Add support for multiple Vertex/Edge inputs

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10564/
-----------------------------------------------------------

(Updated April 25, 2013, 12:27 a.m.)


Review request for giraph.


Changes
-------

I revisited my approach - instead of each input format having to know its index and having configuration parameters set based on that index, I am putting all input format specific settings inside of one Configuration parameter, in JSON format, like
giraph.edgeInputFormatDescriptions=[["EIF1",{"p":"v1"}],["EIF2",{"p":"v2","q":"v"}]]
This would mean that we are using two edge input formats - EIF1 and EIF2. What happens in the infrastructure is that when EIF1 is created we also create a copy of Configuration in which we set p=v1, 
and for EIF2 p=v2 and q=v. This means that input formats can just call conf.get("p") and they will be getting the parameters they need - they don't need to know anything about existence of other input formats or their index.

Again I did only edge input part, and added support only in HiveGiraphRunner, hoping for constructive feedback before proceeding :-)


Description
-------

For now, I did this only for Edge input, once I get some feedback I'll do the exactly same thing for vertex input.
Also, I added direct support only to HiveGiraphRunner, we can extend it later to others as well.


This addresses bug GIRAPH-639.
    https://issues.apache.org/jira/browse/GIRAPH-639


Diffs (updated)
-----

  giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java 178c96f 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphClasses.java 4a0e8f7 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 795cd0c 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 6a5949e 
  giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java f5a926f 
  giraph-core/src/main/java/org/apache/giraph/io/internal/EdgeInputFormatDescription.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/io/internal/InputFormatDescription.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/job/GiraphConfigurationValidator.java cc6b126 
  giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java 2c1a679 
  giraph-core/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 029cb5d 
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java 01937ab 
  giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java de1e774 
  giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallableFactory.java 4a1705b 
  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java 8d67c1d 

Diff: https://reviews.apache.org/r/10564/diff/


Testing
-------

mvn clean verify
Run application with two edge input tables - verified results.


Thanks,

Maja Kabiljo


Re: Review Request: GIRAPH-639: Add support for multiple Vertex/Edge inputs

Posted by Maja Kabiljo <ma...@fb.com>.

> On April 17, 2013, 5:11 p.m., Alessandro Presta wrote:
> > giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java, line 664
> > <https://reviews.apache.org/r/10564/diff/1-2/?file=281712#file281712line664>
> >
> >     Isn't the VertexInputFormat also always the same? Looks like we should only be logging the HiveToVertex class and input description.
> >     Also, for uniformity with edge input formats, you may want to prepend the line "Vertex input format:".

As I said initially, I did everything just for edge input to get some feedback, once we agree on design I'll go ahead and make changes for vertex input too.


> On April 17, 2013, 5:11 p.m., Alessandro Presta wrote:
> > giraph-hive/src/main/java/org/apache/giraph/hive/common/GiraphHiveConstants.java, line 47
> > <https://reviews.apache.org/r/10564/diff/1/?file=281713#file281713line47>
> >
> >     Shouldn't this be "classes" too?

This is going to be used as:
giraph.hive.to.edge.class.0
giraph.hive.to.edge.class.1
and so on, so I think this name is fine.


- Maja


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10564/#review19322
-----------------------------------------------------------


On April 17, 2013, 4:19 a.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10564/
> -----------------------------------------------------------
> 
> (Updated April 17, 2013, 4:19 a.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> For now, I did this only for Edge input, once I get some feedback I'll do the exactly same thing for vertex input.
> Also, I added direct support only to HiveGiraphRunner, we can extend it later to others as well.
> 
> 
> This addresses bug GIRAPH-639.
>     https://issues.apache.org/jira/browse/GIRAPH-639
> 
> 
> Diffs
> -----
> 
>   giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java 178c96f 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphClasses.java 4a0e8f7 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 7f9e38e 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 6a5949e 
>   giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java 8dfe546 
>   giraph-core/src/main/java/org/apache/giraph/io/InputFormatWithIndex.java PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/job/GiraphConfigurationValidator.java cc6b126 
>   giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java d01dbb4 
>   giraph-core/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 029cb5d 
>   giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java 037cdfc 
>   giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java afb636b 
>   giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallableFactory.java 4a1705b 
>   giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java 6e40b7f 
>   giraph-hive/src/main/java/org/apache/giraph/hive/common/GiraphHiveConstants.java f8363b1 
>   giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java 892d443 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java c482cf0 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java 09476cd 
> 
> Diff: https://reviews.apache.org/r/10564/diff/
> 
> 
> Testing
> -------
> 
> mvn clean verify
> Run application with two edge input tables - verified results.
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>


Re: Review Request: GIRAPH-639: Add support for multiple Vertex/Edge inputs

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10564/#review19322
-----------------------------------------------------------



giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java
<https://reviews.apache.org/r/10564/#comment39999>

    Isn't the VertexInputFormat also always the same? Looks like we should only be logging the HiveToVertex class and input description.
    Also, for uniformity with edge input formats, you may want to prepend the line "Vertex input format:".



giraph-hive/src/main/java/org/apache/giraph/hive/common/GiraphHiveConstants.java
<https://reviews.apache.org/r/10564/#comment40001>

    Shouldn't this be "classes" too?


- Alessandro Presta


On April 17, 2013, 4:19 a.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10564/
> -----------------------------------------------------------
> 
> (Updated April 17, 2013, 4:19 a.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> For now, I did this only for Edge input, once I get some feedback I'll do the exactly same thing for vertex input.
> Also, I added direct support only to HiveGiraphRunner, we can extend it later to others as well.
> 
> 
> This addresses bug GIRAPH-639.
>     https://issues.apache.org/jira/browse/GIRAPH-639
> 
> 
> Diffs
> -----
> 
>   giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java 178c96f 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphClasses.java 4a0e8f7 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 7f9e38e 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 6a5949e 
>   giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java 8dfe546 
>   giraph-core/src/main/java/org/apache/giraph/io/InputFormatWithIndex.java PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/job/GiraphConfigurationValidator.java cc6b126 
>   giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java d01dbb4 
>   giraph-core/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 029cb5d 
>   giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java 037cdfc 
>   giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java afb636b 
>   giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallableFactory.java 4a1705b 
>   giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java 6e40b7f 
>   giraph-hive/src/main/java/org/apache/giraph/hive/common/GiraphHiveConstants.java f8363b1 
>   giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java 892d443 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java c482cf0 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java 09476cd 
> 
> Diff: https://reviews.apache.org/r/10564/diff/
> 
> 
> Testing
> -------
> 
> mvn clean verify
> Run application with two edge input tables - verified results.
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>


Re: Review Request: GIRAPH-639: Add support for multiple Vertex/Edge inputs

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10564/
-----------------------------------------------------------

(Updated April 17, 2013, 4:19 a.m.)


Review request for giraph.


Changes
-------

Alessandro's comments


Description
-------

For now, I did this only for Edge input, once I get some feedback I'll do the exactly same thing for vertex input.
Also, I added direct support only to HiveGiraphRunner, we can extend it later to others as well.


This addresses bug GIRAPH-639.
    https://issues.apache.org/jira/browse/GIRAPH-639


Diffs (updated)
-----

  giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java 178c96f 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphClasses.java 4a0e8f7 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 7f9e38e 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 6a5949e 
  giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java 8dfe546 
  giraph-core/src/main/java/org/apache/giraph/io/InputFormatWithIndex.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/job/GiraphConfigurationValidator.java cc6b126 
  giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java d01dbb4 
  giraph-core/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 029cb5d 
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java 037cdfc 
  giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java afb636b 
  giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallableFactory.java 4a1705b 
  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java 6e40b7f 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/GiraphHiveConstants.java f8363b1 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java 892d443 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java c482cf0 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java 09476cd 

Diff: https://reviews.apache.org/r/10564/diff/


Testing
-------

mvn clean verify
Run application with two edge input tables - verified results.


Thanks,

Maja Kabiljo


Re: Review Request: GIRAPH-639: Add support for multiple Vertex/Edge inputs

Posted by Maja Kabiljo <ma...@fb.com>.

> On April 17, 2013, 12:37 a.m., Alessandro Presta wrote:
> > Can you add a test case with multiple edge input formats?

We have no tests in giraph-hive, and that's the only place where I added a way to set multiple edge inputs. I am not sure it makes sense to add method to InternalVertexRunner which accepts several String[], one for each edge input (and then for vertex input to).


> On April 17, 2013, 12:37 a.m., Alessandro Presta wrote:
> > giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java, line 687
> > <https://reviews.apache.org/r/10564/diff/1/?file=281712#file281712line687>
> >
> >     This is going to print "-hiveToEdgeClass=XXX" for multiple classes. I think this is confusing, seems like the same option is overridden multiple times. Probably better to rename the option to "-hiveToEdgeClasses" and print the comma-separated list.

I changed the logging a bit, now it will print this for each edge input:

Edge input format:
   hiveToEdgeClass=org.apache.giraph.hive.SomeInputFormat
   edgeInputTable=table_name
   edgeInputFilter="ds='2013-01-01'"

The whole HiveGiraphRunner is a bit messy, I think adding something like BenchmarkOption stuff I added some time ago could make it much cleaner, but I'll leave that for a separate issue.


> On April 17, 2013, 12:37 a.m., Alessandro Presta wrote:
> > giraph-hive/src/main/java/org/apache/giraph/hive/common/GiraphHiveConstants.java, line 86
> > <https://reviews.apache.org/r/10564/diff/1/?file=281713#file281713line86>
> >
> >     Is there a way you can map EdgeInputFormat classes to HiveToEdge classes without going through the index?
> >     The index seems a bit low-level, and also forces Giraph to maintain the ordering.

Can you please give some details about the design which you have in mind?


- Maja


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10564/#review19285
-----------------------------------------------------------


On April 17, 2013, 4:19 a.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10564/
> -----------------------------------------------------------
> 
> (Updated April 17, 2013, 4:19 a.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> For now, I did this only for Edge input, once I get some feedback I'll do the exactly same thing for vertex input.
> Also, I added direct support only to HiveGiraphRunner, we can extend it later to others as well.
> 
> 
> This addresses bug GIRAPH-639.
>     https://issues.apache.org/jira/browse/GIRAPH-639
> 
> 
> Diffs
> -----
> 
>   giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java 178c96f 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphClasses.java 4a0e8f7 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 7f9e38e 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 6a5949e 
>   giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java 8dfe546 
>   giraph-core/src/main/java/org/apache/giraph/io/InputFormatWithIndex.java PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/job/GiraphConfigurationValidator.java cc6b126 
>   giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java d01dbb4 
>   giraph-core/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 029cb5d 
>   giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java 037cdfc 
>   giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java afb636b 
>   giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallableFactory.java 4a1705b 
>   giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java 6e40b7f 
>   giraph-hive/src/main/java/org/apache/giraph/hive/common/GiraphHiveConstants.java f8363b1 
>   giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java 892d443 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java c482cf0 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java 09476cd 
> 
> Diff: https://reviews.apache.org/r/10564/diff/
> 
> 
> Testing
> -------
> 
> mvn clean verify
> Run application with two edge input tables - verified results.
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>


Re: Review Request: GIRAPH-639: Add support for multiple Vertex/Edge inputs

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10564/#review19285
-----------------------------------------------------------


Can you add a test case with multiple edge input formats?


giraph-core/src/main/java/org/apache/giraph/conf/GiraphClasses.java
<https://reviews.apache.org/r/10564/#comment39951>

    class->classes



giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java
<https://reviews.apache.org/r/10564/#comment39958>

    If this can hold multiple classes, it should be called -hiveToEdgeClasses.



giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java
<https://reviews.apache.org/r/10564/#comment39956>

    If this can hold multiple tables, it should be called -edgeInputTables.



giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java
<https://reviews.apache.org/r/10564/#comment39957>

    Same here.



giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java
<https://reviews.apache.org/r/10564/#comment39959>

    This is going to print "-hiveToEdgeClass=XXX" for multiple classes. I think this is confusing, seems like the same option is overridden multiple times. Probably better to rename the option to "-hiveToEdgeClasses" and print the comma-separated list.



giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java
<https://reviews.apache.org/r/10564/#comment39960>

    Why do we print both the HiveToEdge class and the EdgeInputFormat class? Isn't the latter always HiveEdgeInputFormat?



giraph-hive/src/main/java/org/apache/giraph/hive/common/GiraphHiveConstants.java
<https://reviews.apache.org/r/10564/#comment39961>

    Is there a way you can map EdgeInputFormat classes to HiveToEdge classes without going through the index?
    The index seems a bit low-level, and also forces Giraph to maintain the ordering.


- Alessandro Presta


On April 16, 2013, 10:42 p.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10564/
> -----------------------------------------------------------
> 
> (Updated April 16, 2013, 10:42 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> For now, I did this only for Edge input, once I get some feedback I'll do the exactly same thing for vertex input.
> Also, I added direct support only to HiveGiraphRunner, we can extend it later to others as well.
> 
> 
> This addresses bug GIRAPH-639.
>     https://issues.apache.org/jira/browse/GIRAPH-639
> 
> 
> Diffs
> -----
> 
>   giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java 178c96f 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphClasses.java 4a0e8f7 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 7f9e38e 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 6a5949e 
>   giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java 8dfe546 
>   giraph-core/src/main/java/org/apache/giraph/io/InputFormatWithIndex.java PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/job/GiraphConfigurationValidator.java cc6b126 
>   giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java d01dbb4 
>   giraph-core/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 029cb5d 
>   giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java 037cdfc 
>   giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java afb636b 
>   giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallableFactory.java 4a1705b 
>   giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java 6e40b7f 
>   giraph-hive/src/main/java/org/apache/giraph/hive/common/GiraphHiveConstants.java f8363b1 
>   giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java 892d443 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java c482cf0 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java 09476cd 
> 
> Diff: https://reviews.apache.org/r/10564/diff/
> 
> 
> Testing
> -------
> 
> mvn clean verify
> Run application with two edge input tables - verified results.
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>