Posted to dev@giraph.apache.org by Alessandro Presta <al...@fb.com> on 2012/11/13 04:46:14 UTC

Review Request: Edge-based input from HCatalog

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8034/
-----------------------------------------------------------

Review request for giraph.


Description
-------

Implemented Edge/VertexValue input formats for HCatalog.
Unfortunately, I had to copy some functionality from HCatalog because of access (visibility) restrictions on its classes, and add our own GiraphHCatInputFormat (much like with GiraphFileInputFormat).
Also, I had to make HiveGiraphRunner a little less type-safe, because HCatalogVertexValueInputFormat is not a subclass of HCatalogVertexInputFormat. If this is an issue, I can add an empty interface to address it.
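
For illustration only, the empty-interface option mentioned above could look roughly like the sketch below. Every name in it (HCatalogVertexInput, the two *Stub classes, RunnerStub) is a hypothetical stand-in, not a class from the diff; it only shows how a shared marker interface lets a runner keep a bounded Class<? extends ...> field instead of a raw Class.

```java
public class MarkerInterfaceSketch {
  /** Hypothetical empty (marker) interface shared by both input formats. */
  interface HCatalogVertexInput { }

  /** Stand-in for HCatalogVertexInputFormat (assumption: simplified stub). */
  static class VertexInputFormatStub implements HCatalogVertexInput { }

  /**
   * Stand-in for HCatalogVertexValueInputFormat. It is deliberately NOT a
   * subclass of VertexInputFormatStub, mirroring the situation described.
   */
  static class VertexValueInputFormatStub implements HCatalogVertexInput { }

  /** Stand-in for the runner; the bounded wildcard restores type safety. */
  static class RunnerStub {
    private Class<? extends HCatalogVertexInput> vertexInputFormatClass;

    void setVertexInputFormatClass(
        Class<? extends HCatalogVertexInput> cls) {
      this.vertexInputFormatClass = cls;
    }

    String describe() {
      return vertexInputFormatClass.getSimpleName();
    }
  }

  public static void main(String[] args) {
    RunnerStub runner = new RunnerStub();
    // Both unrelated classes are accepted through the marker interface.
    runner.setVertexInputFormatClass(VertexInputFormatStub.class);
    System.out.println(runner.describe());
    runner.setVertexInputFormatClass(VertexValueInputFormatStub.class);
    System.out.println(runner.describe());
    // runner.setVertexInputFormatClass(String.class); // would not compile
  }
}
```

The trade-off is one extra (empty) type in exchange for the compiler rejecting unrelated classes at the call site.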


This addresses bug GIRAPH-405.
    https://issues.apache.org/jira/browse/GIRAPH-405


Diffs
-----

  /trunk/giraph-formats-contrib/pom.xml 1406239 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/GiraphHCatInputFormat.java PRE-CREATION 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HCatalogEdgeInputFormat.java PRE-CREATION 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HCatalogVertexInputFormat.java 1406239 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HCatalogVertexOutputFormat.java 1406239 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HCatalogVertexValueInputFormat.java PRE-CREATION 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java 1406239 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/hcatalog/mapreduce/HCatUtils.java PRE-CREATION 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/hcatalog/mapreduce/package-info.java PRE-CREATION 
  /trunk/pom.xml 1406239 

Diff: https://reviews.apache.org/r/8034/diff/


Testing
-------

- mvn verify
- tested on a real application that runs on top of Hive


Thanks,

Alessandro Presta


Re: Review Request: Edge-based input from HCatalog

Posted by Nitay Joffe <ni...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8034/#review13587
-----------------------------------------------------------



/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/GiraphHCatInputFormat.java
<https://reviews.apache.org/r/8034/#comment29156>

    Yeah, it's really unfortunate we have to do this, but hopefully it will get cleaner soon as things progress.



/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/GiraphHCatInputFormat.java
<https://reviews.apache.org/r/8034/#comment29154>

    Why do you need both this and the method above? Just keep the Configuration one.



/trunk/giraph-formats-contrib/src/main/java/org/apache/hcatalog/mapreduce/HCatUtils.java
<https://reviews.apache.org/r/8034/#comment29158>

    Can we mark all the parts of this diff where you copy code with TODOs and file JIRAs with HCatalog to make the appropriate changes? We should track that these are copied over and work towards removing them, as opposed to letting them pollute our code.



/trunk/pom.xml
<https://reviews.apache.org/r/8034/#comment29153>

    I think I do this in one of my diffs; if not, can you define a property like hive.version and use it everywhere here?


- Nitay Joffe


Re: Review Request: Edge-based input from HCatalog

Posted by Nitay Joffe <ni...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8034/#review13608
-----------------------------------------------------------


+1

- Nitay Joffe


Re: Review Request: Edge-based input from HCatalog

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8034/
-----------------------------------------------------------

(Updated Nov. 20, 2012, 12:11 a.m.)


Review request for giraph.


Changes
-------

Avery's comments.


Description
-------

Edge-based input from HCatalog


This addresses bug GIRAPH-405.
    https://issues.apache.org/jira/browse/GIRAPH-405


Diffs (updated)
-----

  /trunk/giraph-formats-contrib/pom.xml 1410684 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/GiraphHCatInputFormat.java PRE-CREATION 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HCatalogEdgeInputFormat.java PRE-CREATION 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HCatalogVertexInputFormat.java 1410684 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HCatalogVertexOutputFormat.java 1410684 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HCatalogVertexValueInputFormat.java PRE-CREATION 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java 1410684 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/hcatalog/mapreduce/HCatUtils.java PRE-CREATION 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/hcatalog/mapreduce/package-info.java PRE-CREATION 
  /trunk/pom.xml 1410684 

Diff: https://reviews.apache.org/r/8034/diff/


Testing
-------

- mvn verify
- tested on a real application that runs on top of Hive


Thanks,

Alessandro Presta


Re: Review Request: Edge-based input from HCatalog

Posted by Alessandro Presta <al...@fb.com>.

> On Nov. 19, 2012, 11:08 p.m., Avery Ching wrote:
> > /trunk/pom.xml, lines 791-800
> > <https://reviews.apache.org/r/8034/diff/1/?file=188999#file188999line791>
> >
> >     Just to verify, this doesn't include the Hive jars in giraph, only in giraph-formats-contrib?

Yes, this just declares the package versions. Only giraph-formats-contrib has Hive as a dependency.


- Alessandro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8034/#review13580
-----------------------------------------------------------


Re: Review Request: Edge-based input from HCatalog

Posted by Nitay Joffe <ni...@apache.org>.

> On Nov. 19, 2012, 11:08 p.m., Avery Ching wrote:
> > /trunk/pom.xml, lines 791-800
> > <https://reviews.apache.org/r/8034/diff/1/?file=188999#file188999line791>
> >
> >     Just to verify, this doesn't include the Hive jars in giraph, only in giraph-formats-contrib?
> 
> Alessandro Presta wrote:
>     Yes, this just declares the package versions. Only giraph-formats-contrib has Hive as a dependency.

FYI, you can always run mvn dependency:tree, which gives a nice dump of exactly what's going in.


- Nitay


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8034/#review13580
-----------------------------------------------------------


Re: Review Request: Edge-based input from HCatalog

Posted by Avery Ching <av...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8034/#review13580
-----------------------------------------------------------


A couple of extra comments in addition to Nitay's. Looks good though.


/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/GiraphHCatInputFormat.java
<https://reviews.apache.org/r/8034/#comment29144>

    Perhaps remove (malewicz)?



/trunk/giraph-formats-contrib/src/main/java/org/apache/hcatalog/mapreduce/HCatUtils.java
<https://reviews.apache.org/r/8034/#comment29170>

    missing space



/trunk/pom.xml
<https://reviews.apache.org/r/8034/#comment29143>

    Just to verify, this doesn't include the Hive jars in giraph, only in giraph-formats-contrib?


- Avery Ching


Re: Review Request: Edge-based input from HCatalog

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8034/
-----------------------------------------------------------

(Updated Nov. 19, 2012, 10:28 p.m.)


Review request for giraph.


Changes
-------

Nitay's comments.


Description (updated)
-------

Edge-based input from HCatalog


This addresses bug GIRAPH-405.
    https://issues.apache.org/jira/browse/GIRAPH-405


Diffs (updated)
-----

  /trunk/giraph-formats-contrib/pom.xml 1410684 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/GiraphHCatInputFormat.java PRE-CREATION 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HCatalogEdgeInputFormat.java PRE-CREATION 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HCatalogVertexInputFormat.java 1410684 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HCatalogVertexOutputFormat.java 1410684 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HCatalogVertexValueInputFormat.java PRE-CREATION 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java 1410684 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/hcatalog/mapreduce/HCatUtils.java PRE-CREATION 
  /trunk/giraph-formats-contrib/src/main/java/org/apache/hcatalog/mapreduce/package-info.java PRE-CREATION 
  /trunk/pom.xml 1410684 

Diff: https://reviews.apache.org/r/8034/diff/


Testing
-------

- mvn verify
- tested on a real application that runs on top of Hive


Thanks,

Alessandro Presta