Posted to dev@giraph.apache.org by Nitay Joffe <ni...@apache.org> on 2013/02/15 19:59:47 UTC

Re: Review Request: GIRAPH-453: Pure Hive I/O (nitay)

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8611/
-----------------------------------------------------------

(Updated Feb. 15, 2013, 6:59 p.m.)


Review request for giraph.


Description
-------

For now this is only the Input side of things. One particular thing I added was the concept of "profiles", allowing for easily reading from multiple tables. This should remove a lot of the cruft around the GiraphHCat* classes.

Note in the diff I separated the code so that there would be a Giraph-unrelated Hive-only portion (under package org.apache.hadoop.hive). Things under this package (and its children) do not touch any Giraph code, and so can be contributed as an IOFormat back to Hive itself.

Also note the new (I think improved) interface: Users do not need to actually implement an XInputFormat anymore. They just create a class that implements the HiveVertexCreator interface, plug that in, and use HiveVertexInputFormat. Should make user code much cleaner.
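
To illustrate the shape of this, here is a rough sketch only. Every name in it (RowRecord, SimpleVertex, RowToVertex, GenericVertexReader, IdValueRowToVertex) is made up for the example and is not a class or signature from the diff. The idea it shows: the reader is written once on the library side, and user code shrinks to a single row-to-vertex conversion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// All names here are hypothetical and exist only for this sketch.
final class ConverterSketch {

  // Stand-in for one Hive row: look up a column value by name.
  interface RowRecord {
    Object get(String column);
  }

  // Minimal vertex holder.
  static final class SimpleVertex {
    final long id;
    final double value;

    SimpleVertex(long id, double value) {
      this.id = id;
      this.value = value;
    }
  }

  // The only piece the user writes: how one row becomes one vertex.
  interface RowToVertex {
    SimpleVertex convert(RowRecord row);
  }

  // Written once on the library side; drives whatever converter is plugged in.
  static final class GenericVertexReader {
    private final RowToVertex converter;

    GenericVertexReader(RowToVertex converter) {
      this.converter = converter;
    }

    List<SimpleVertex> readAll(List<Map<String, Object>> rows) {
      List<SimpleVertex> vertices = new ArrayList<>();
      for (Map<String, Object> row : rows) {
        // Map::get satisfies RowRecord, so each map acts as a row.
        vertices.add(converter.convert(row::get));
      }
      return vertices;
    }
  }

  // Example user code: reads the "id" and "value" columns.
  static final class IdValueRowToVertex implements RowToVertex {
    @Override
    public SimpleVertex convert(RowRecord row) {
      long id = ((Number) row.get("id")).longValue();
      double value = ((Number) row.get("value")).doubleValue();
      return new SimpleVertex(id, value);
    }
  }

  public static void main(String[] args) {
    Map<String, Object> row = new HashMap<>();
    row.put("id", 7L);
    row.put("value", 2.5);
    List<Map<String, Object>> rows = new ArrayList<>();
    rows.add(row);

    GenericVertexReader reader = new GenericVertexReader(new IdValueRowToVertex());
    for (SimpleVertex v : reader.readAll(rows)) {
      System.out.println(v.id + " -> " + v.value);
    }
  }
}
```

The point of the split is that the input format, reader and split handling never need to be copied per job; only the small converter class changes.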


This addresses bug GIRAPH-453.
    https://issues.apache.org/jira/browse/GIRAPH-453


Diffs (updated)
-----

  giraph-accumulo/pom.xml cb9fbc02e6fc8adcb0ec41e0c6aeff75b1ef3f06 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java 89ef87fea7a370354156fb7be02ef4249e0a6111 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 9e129efebe39c42bab9d59b3246055b79cdbdfa3 
  giraph-core/src/main/java/org/apache/giraph/utils/ConfigurationUtils.java PRE-CREATION 
  giraph-hbase/pom.xml 7bbbd98c0b3db6878aee4be21eecd821448da7ef 
  giraph-hcatalog/pom.xml 4a8227295ca426cf273527cdf3c700d25c256ac2 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HCatGiraphRunner.java PRE-CREATION 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java fbcef720d3caa944af70a859996aac40a2f67558 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveUtils.java c1f76f1a46d1fc9af489a916256884520c138cb4 
  giraph-hive/pom.xml PRE-CREATION 
  giraph-hive/src/main/assembly/compile.xml PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveToEdge.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveToVertex.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexWriter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/VertexToHive.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveReadableRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchema.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemaAware.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemas.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveWritableRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiTableSchema.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Classes.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/FileSystems.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HadoopUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveMetastores.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Inspectors.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/ProgressReporter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/SerDes.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Writables.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiInputSplit.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiRecordReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputConf.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputInfo.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputPartition.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputSplitData.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/NoOpInputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/BenchmarkArgs.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/CounterRatioGauge.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/InputBenchmark.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/MetricsObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiOutputCommitter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiRecordWriter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/NoOpOutputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputConf.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputInfo.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveInputDescription.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveOutputDescription.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/package-info.java PRE-CREATION 
  pom.xml f6e9302d694dab9a075de11ad00e6dcfc878e400 

Diff: https://reviews.apache.org/r/8611/diff/


Testing
-------

Ran on some production jobs and verified results were exactly the same.

In terms of performance this is on par with our current HCatalog stuff. I ran a few jobs and noticed at most a few seconds of difference between the input supersteps. Sometimes it was less, so I think the difference is mostly noise.


Thanks,

Nitay Joffe


Re: Review Request: GIRAPH-453: Pure Hive I/O (nitay)

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8611/#review16868
-----------------------------------------------------------

Ship it!


Forgot to say, I'm +1 on this :-)

- Maja Kabiljo


On Feb. 21, 2013, 6:17 p.m., Nitay Joffe wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8611/
> -----------------------------------------------------------
> 
> (Updated Feb. 21, 2013, 6:17 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> One particular thing I added was the concept of "profiles", allowing for easily reading / writing from multiple tables. This should remove a lot of the cruft around the GiraphHCat* classes.
> 
> Note in the diff I separated the code so that there would be a Giraph-unrelated Hive-only portion (under package org.apache.hadoop.hive). Things under this package (and its children) do not touch any Giraph code, and so can be contributed as an IOFormat back to Hive itself.
> 
> Also note the new (I think improved) interface: Users do not need to actually implement an XInputFormat anymore. They just create a class that implements the HiveToVertex (HiveToEdge, VertexToHive) interface, plug that in, and use HiveVertexInputFormat. Should make user code much cleaner.
> 
> 
> This addresses bug GIRAPH-453.
>     https://issues.apache.org/jira/browse/GIRAPH-453
> 
> 
> Diffs
> -----
> 
>   giraph-accumulo/pom.xml cb9fbc02e6fc8adcb0ec41e0c6aeff75b1ef3f06 
>   giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java 89ef87fea7a370354156fb7be02ef4249e0a6111 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java ddeaeb769b548eb1002ccf8c18ffe048eb096f8d 
>   giraph-hbase/pom.xml 7bbbd98c0b3db6878aee4be21eecd821448da7ef 
>   giraph-hcatalog/pom.xml 019f02083012704a997ffe715cefe3adeb153dd9 
>   giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HCatGiraphRunner.java PRE-CREATION 
>   giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java 313bab04c50ed6be7143254de80e36a4ba291516 
>   giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveUtils.java c1f76f1a46d1fc9af489a916256884520c138cb4 
>   giraph-hive/pom.xml PRE-CREATION 
>   giraph-hive/src/main/assembly/compile.xml PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/common/package-info.java PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveToEdge.java PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/package-info.java PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/package-info.java PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveToVertex.java PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexReader.java PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/package-info.java PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexWriter.java PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/output/VertexToHive.java PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/output/package-info.java PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/package-info.java PRE-CREATION 
>   pom.xml c075762cddd7a698c92aaad4017cd74915160e41 
> 
> Diff: https://reviews.apache.org/r/8611/diff/
> 
> 
> Testing
> -------
> 
> Ran on some production jobs and verified results were exactly the same.
> 
> Here's a comparison of performance on real workloads ("base" is hcatalog, "mine" is hive):
> https://gist.github.com/nitay/880d8fb20d2ac86015d4/raw/6b297fcb287bf8d3dc8175bad217aa86544b4f18/high+school
> 
> Basically we see a slight improvement, which is expected because I haven't done much in terms of performance yet.
> There are a few performance improvement ideas coming; this is just the first working version.
> 
> 
> Thanks,
> 
> Nitay Joffe
> 
>


Re: Review Request: GIRAPH-453: Pure Hive I/O (nitay)

Posted by Nitay Joffe <ni...@apache.org>.

> On Feb. 21, 2013, 6:45 p.m., Maja Kabiljo wrote:
> > This is a lot of great work, Nitay, thanks! I really like that the user doesn't have to extend the whole Input/Output format anymore; that was a lot of code duplication every time.
> > 
> > Is it possible to provide some examples/tests for this?

Opened https://issues.apache.org/jira/browse/GIRAPH-534 so that we create examples / tests.


> On Feb. 21, 2013, 6:45 p.m., Maja Kabiljo wrote:
> > giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java, lines 29-31
> > <https://reviews.apache.org/r/8611/diff/6/?file=260732#file260732line29>
> >
> >     What is this for? (in some other places too)

It is to allow multiple tables at the same time. To do that you need some namespacing for the Configuration keys, and these profiles are my way of doing it. I have a cleaner solution in mind that I will put in another diff, which should clean up some of these.
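
Roughly, the namespacing idea looks like the sketch below. This is an illustration under stated assumptions, not the code in HiveProfiles: ProfileConf, the "sketch.hive." prefix and the key layout are all hypothetical, and a plain Map stands in for a Hadoop Configuration so the example is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-profile key namespacing; not the HiveProfiles code.
final class ProfileSketch {

  // Prefixes every key with a profile name so that settings for several
  // tables can live side by side in one flat key/value configuration.
  static final class ProfileConf {
    private final Map<String, String> conf; // stand-in for a Hadoop Configuration
    private final String profile;

    ProfileConf(Map<String, String> conf, String profile) {
      this.conf = conf;
      this.profile = profile;
    }

    // Key layout is an assumption of this sketch: "sketch.hive.<profile>.<name>".
    private String key(String name) {
      return "sketch.hive." + profile + "." + name;
    }

    void set(String name, String value) {
      conf.put(key(name), value);
    }

    String get(String name) {
      return conf.get(key(name));
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    ProfileConf vertices = new ProfileConf(conf, "vertex_input");
    ProfileConf edges = new ProfileConf(conf, "edge_input");

    // Same logical setting, two tables, no collision in the shared map.
    vertices.set("table", "users");
    edges.set("table", "friendships");

    System.out.println(vertices.get("table"));
    System.out.println(edges.get("table"));
  }
}
```

Each reader or writer is handed its profile name and only ever sees its own slice of the keys, which is what lets a vertex table and an edge table be configured in the same job.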


> On Feb. 21, 2013, 6:45 p.m., Maja Kabiljo wrote:
> > giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java, lines 35-36
> > <https://reviews.apache.org/r/8611/diff/6/?file=260732#file260732line35>
> >
> >     Out of curiosity - why do we do this? (why isn't it private)

Sometimes I want to allow inheritance, but in this case there is no need; private it is.


> On Feb. 21, 2013, 6:45 p.m., Maja Kabiljo wrote:
> > giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java, line 154
> > <https://reviews.apache.org/r/8611/diff/6/?file=260735#file260735line154>
> >
> >     Could we have an option to reuse edge objects here?

Good call
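
For reference, one way such an option could look is sketched below. This is only an assumption about the approach, not what HiveEdgeReader does: MutableEdge, EdgeRowIterator and the reuse flag are hypothetical names, and rows are plain long[] pairs of {source, target}.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of an "reuse edge objects" option; not the HiveEdgeReader code.
final class EdgeReuseSketch {

  // Mutable holder so that one instance can be refilled for every row.
  static final class MutableEdge {
    long source;
    long target;

    void set(long source, long target) {
      this.source = source;
      this.target = target;
    }
  }

  // Turns {source, target} rows into edges, optionally reusing one object.
  static final class EdgeRowIterator implements Iterator<MutableEdge> {
    private final Iterator<long[]> rows;
    private final boolean reuse;
    private final MutableEdge shared = new MutableEdge();

    EdgeRowIterator(Iterator<long[]> rows, boolean reuse) {
      this.rows = rows;
      this.reuse = reuse;
    }

    @Override
    public boolean hasNext() {
      return rows.hasNext();
    }

    @Override
    public MutableEdge next() {
      long[] row = rows.next();
      // With reuse on, the same instance is refilled; otherwise allocate per row.
      MutableEdge edge = reuse ? shared : new MutableEdge();
      edge.set(row[0], row[1]);
      return edge;
    }
  }

  public static void main(String[] args) {
    List<long[]> rows = Arrays.asList(new long[] {1, 2}, new long[] {3, 4});
    EdgeRowIterator it = new EdgeRowIterator(rows.iterator(), true);
    while (it.hasNext()) {
      MutableEdge e = it.next();
      System.out.println(e.source + " -> " + e.target);
    }
  }
}
```

The trade-off is the usual one: reuse avoids an allocation per edge, but the caller must copy out anything it needs before calling next() again, so it has to be opt-in.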


- Nitay


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8611/#review16867
-----------------------------------------------------------




Re: Review Request: GIRAPH-453: Pure Hive I/O (nitay)

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8611/#review16867
-----------------------------------------------------------


This is a lot of great work, Nitay, thanks! I really like that the user doesn't have to extend the whole Input/Output format anymore; that was a lot of code duplication every time.

Is it possible to provide some examples/tests for this?


giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java
<https://reviews.apache.org/r/8611/#comment35797>

    What is this for? (in some other places too)



giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java
<https://reviews.apache.org/r/8611/#comment35794>

    Out of curiosity - why do we do this? (why isn't it private)



giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java
<https://reviews.apache.org/r/8611/#comment35800>

    Could we have an option to reuse edge objects here?


- Maja Kabiljo




Re: Review Request: GIRAPH-453: Pure Hive I/O (nitay)

Posted by Nitay Joffe <ni...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8611/
-----------------------------------------------------------

(Updated Feb. 21, 2013, 6:17 p.m.)


Review request for giraph.


Changes
-------

Use the Hive I/O library from outside the Giraph tree.


Description
-------

One particular thing I added was the concept of "profiles", allowing for easily reading / writing from multiple tables. This should remove a lot of the cruft around the GiraphHCat* classes.

Note in the diff I separated the code so that there would be a Giraph-unrelated Hive-only portion (under package org.apache.hadoop.hive). Things under this package (and its children) do not touch any Giraph code, and so can be contributed as an IOFormat back to Hive itself.

Also note the new (I think improved) interface: Users do not need to actually implement an XInputFormat anymore. They just create a class that implements the HiveToVertex (HiveToEdge, VertexToHive) interface, plug that in, and use HiveVertexInputFormat. Should make user code much cleaner.


This addresses bug GIRAPH-453.
    https://issues.apache.org/jira/browse/GIRAPH-453


Diffs (updated)
-----

  giraph-accumulo/pom.xml cb9fbc02e6fc8adcb0ec41e0c6aeff75b1ef3f06 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java 89ef87fea7a370354156fb7be02ef4249e0a6111 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java ddeaeb769b548eb1002ccf8c18ffe048eb096f8d 
  giraph-hbase/pom.xml 7bbbd98c0b3db6878aee4be21eecd821448da7ef 
  giraph-hcatalog/pom.xml 019f02083012704a997ffe715cefe3adeb153dd9 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HCatGiraphRunner.java PRE-CREATION 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java 313bab04c50ed6be7143254de80e36a4ba291516 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveUtils.java c1f76f1a46d1fc9af489a916256884520c138cb4 
  giraph-hive/pom.xml PRE-CREATION 
  giraph-hive/src/main/assembly/compile.xml PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveToEdge.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveToVertex.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexWriter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/VertexToHive.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/package-info.java PRE-CREATION 
  pom.xml c075762cddd7a698c92aaad4017cd74915160e41 

Diff: https://reviews.apache.org/r/8611/diff/


Testing
-------

Ran on some production jobs and verified results were exactly the same.

Here's a comparison of performance on real workloads ("base" is hcatalog, "mine" is hive):
https://gist.github.com/nitay/880d8fb20d2ac86015d4/raw/6b297fcb287bf8d3dc8175bad217aa86544b4f18/high+school

Basically we see a slight improvement, which is expected because I haven't done much in terms of performance yet.
There are a few performance improvement ideas coming; this is just the first working version.


Thanks,

Nitay Joffe


Re: Review Request: GIRAPH-453: Pure Hive I/O (nitay)

Posted by Nitay Joffe <ni...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8611/
-----------------------------------------------------------

(Updated Feb. 20, 2013, 7:59 p.m.)


Review request for giraph.


Changes
-------

Added javadocs. Passes mvn install now.


Description
-------

One particular thing I added was the concept of "profiles", allowing for easily reading / writing from multiple tables. This should remove a lot of the cruft around the GiraphHCat* classes.

Note in the diff I separated the code so that there would be a Giraph-unrelated Hive-only portion (under package org.apache.hadoop.hive). Things under this package (and its children) do not touch any Giraph code, and so can be contributed as an IOFormat back to Hive itself.

Also note the new (I think improved) interface: Users do not need to actually implement an XInputFormat anymore. They just create a class that implements the HiveToVertex (HiveToEdge, VertexToHive) interface, plug that in, and use HiveVertexInputFormat. Should make user code much cleaner.


This addresses bug GIRAPH-453.
    https://issues.apache.org/jira/browse/GIRAPH-453


Diffs (updated)
-----

  giraph-accumulo/pom.xml cb9fbc02e6fc8adcb0ec41e0c6aeff75b1ef3f06 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java 89ef87fea7a370354156fb7be02ef4249e0a6111 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java ddeaeb769b548eb1002ccf8c18ffe048eb096f8d 
  giraph-hbase/pom.xml 7bbbd98c0b3db6878aee4be21eecd821448da7ef 
  giraph-hcatalog/pom.xml 019f02083012704a997ffe715cefe3adeb153dd9 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HCatGiraphRunner.java PRE-CREATION 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java 313bab04c50ed6be7143254de80e36a4ba291516 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveUtils.java c1f76f1a46d1fc9af489a916256884520c138cb4 
  giraph-hive/pom.xml PRE-CREATION 
  giraph-hive/src/main/assembly/compile.xml PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveToEdge.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveToVertex.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexWriter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/VertexToHive.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveReadableRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchema.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemaAware.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemas.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveWritableRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiTableSchema.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Classes.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/FileSystems.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HadoopUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveMetastores.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Inspectors.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/ProgressReporter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/SerDes.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Writables.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiInputSplit.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiRecordReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputConf.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputInfo.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputPartition.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputSplitData.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/NoOpInputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/BenchmarkArgs.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/CounterRatioGauge.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/InputBenchmark.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/MetricsObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiOutputCommitter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiRecordWriter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/NoOpOutputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputConf.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputInfo.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveInputDescription.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveOutputDescription.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/package-info.java PRE-CREATION 
  pom.xml c075762cddd7a698c92aaad4017cd74915160e41 

Diff: https://reviews.apache.org/r/8611/diff/


Testing
-------

Ran on some production jobs and verified results were exactly the same.

Here's a comparison of performance on real workloads ("base" is hcatalog, "mine" is hive):
https://gist.github.com/nitay/880d8fb20d2ac86015d4/raw/6b297fcb287bf8d3dc8175bad217aa86544b4f18/high+school

Basically we see a slight improvement, which is expected because I haven't done much in terms of performance yet.
There are a few performance improvement ideas coming; this is just the first working version.


Thanks,

Nitay Joffe


Re: Review Request: GIRAPH-453: Pure Hive I/O (nitay)

Posted by Nitay Joffe <ni...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8611/
-----------------------------------------------------------

(Updated Feb. 19, 2013, 11:40 p.m.)


Review request for giraph.


Changes
-------

rebased


Description
-------

One particular thing I added was the concept of "profiles", which makes it easy to read from and write to multiple tables. This should remove a lot of the cruft around the GiraphHCat* classes.

Note that in the diff I separated the code so that there is a Giraph-unrelated, Hive-only portion (under package org.apache.hadoop.hive). Things under this package (and its children) do not touch any Giraph code, and so can be contributed as an IOFormat back to Hive itself.

Also note the new (I think improved) interface: users no longer need to implement an XInputFormat. They just create a class that implements the HiveToVertex (HiveToEdge, VertexToHive) interface, plug that in, and use HiveVertexInputFormat. This should make user code much cleaner.


This addresses bug GIRAPH-453.
    https://issues.apache.org/jira/browse/GIRAPH-453


Diffs (updated)
-----

  giraph-accumulo/pom.xml cb9fbc02e6fc8adcb0ec41e0c6aeff75b1ef3f06 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java 89ef87fea7a370354156fb7be02ef4249e0a6111 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java ddeaeb769b548eb1002ccf8c18ffe048eb096f8d 
  giraph-hbase/pom.xml 7bbbd98c0b3db6878aee4be21eecd821448da7ef 
  giraph-hcatalog/pom.xml 019f02083012704a997ffe715cefe3adeb153dd9 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HCatGiraphRunner.java PRE-CREATION 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java fbcef720d3caa944af70a859996aac40a2f67558 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveUtils.java c1f76f1a46d1fc9af489a916256884520c138cb4 
  giraph-hive/pom.xml PRE-CREATION 
  giraph-hive/src/main/assembly/compile.xml PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveToEdge.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveToVertex.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexWriter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/VertexToHive.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveReadableRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchema.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemaAware.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemas.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveWritableRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiTableSchema.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Classes.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/FileSystems.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HadoopUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveMetastores.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Inspectors.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/ProgressReporter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/SerDes.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Writables.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiInputSplit.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiRecordReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputConf.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputInfo.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputPartition.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputSplitData.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/NoOpInputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/BenchmarkArgs.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/CounterRatioGauge.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/InputBenchmark.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/MetricsObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiOutputCommitter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiRecordWriter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/NoOpOutputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputConf.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputInfo.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveInputDescription.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveOutputDescription.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/package-info.java PRE-CREATION 
  pom.xml c075762cddd7a698c92aaad4017cd74915160e41 

Diff: https://reviews.apache.org/r/8611/diff/


Testing
-------

Ran on some production jobs and verified results were exactly the same.

Here's a comparison of performance on real workloads ("base" is hcatalog, "mine" is hive):
https://gist.github.com/nitay/880d8fb20d2ac86015d4/raw/6b297fcb287bf8d3dc8175bad217aa86544b4f18/high+school

Basically we see a slight improvement, which is expected because I haven't done much in terms of performance yet.
There are a few performance improvement ideas coming; this is just the first working version.


Thanks,

Nitay Joffe


Re: Review Request: GIRAPH-453: Pure Hive I/O (nitay)

Posted by Nitay Joffe <ni...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8611/
-----------------------------------------------------------

(Updated Feb. 15, 2013, 9:45 p.m.)


Review request for giraph.


Description
-------

One particular thing I added was the concept of "profiles", which makes it easy to read from and write to multiple tables. This should remove a lot of the cruft around the GiraphHCat* classes.

Note that in the diff I separated the code so that there is a Giraph-unrelated, Hive-only portion (under package org.apache.hadoop.hive). Things under this package (and its children) do not touch any Giraph code, and so can be contributed as an IOFormat back to Hive itself.

Also note the new (I think improved) interface: users no longer need to implement an XInputFormat. They just create a class that implements the HiveToVertex (HiveToEdge, VertexToHive) interface, plug that in, and use HiveVertexInputFormat. This should make user code much cleaner.


This addresses bug GIRAPH-453.
    https://issues.apache.org/jira/browse/GIRAPH-453


Diffs
-----

  giraph-accumulo/pom.xml cb9fbc02e6fc8adcb0ec41e0c6aeff75b1ef3f06 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java 89ef87fea7a370354156fb7be02ef4249e0a6111 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 9e129efebe39c42bab9d59b3246055b79cdbdfa3 
  giraph-core/src/main/java/org/apache/giraph/utils/ConfigurationUtils.java PRE-CREATION 
  giraph-hbase/pom.xml 7bbbd98c0b3db6878aee4be21eecd821448da7ef 
  giraph-hcatalog/pom.xml 4a8227295ca426cf273527cdf3c700d25c256ac2 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HCatGiraphRunner.java PRE-CREATION 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java fbcef720d3caa944af70a859996aac40a2f67558 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveUtils.java c1f76f1a46d1fc9af489a916256884520c138cb4 
  giraph-hive/pom.xml PRE-CREATION 
  giraph-hive/src/main/assembly/compile.xml PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveToEdge.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveToVertex.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexWriter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/VertexToHive.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveReadableRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchema.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemaAware.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemas.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveWritableRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiTableSchema.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Classes.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/FileSystems.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HadoopUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveMetastores.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Inspectors.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/ProgressReporter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/SerDes.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Writables.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiInputSplit.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiRecordReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputConf.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputInfo.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputPartition.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputSplitData.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/NoOpInputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/BenchmarkArgs.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/CounterRatioGauge.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/InputBenchmark.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/MetricsObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiOutputCommitter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiRecordWriter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/NoOpOutputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputConf.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputInfo.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveInputDescription.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveOutputDescription.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/package-info.java PRE-CREATION 
  pom.xml f6e9302d694dab9a075de11ad00e6dcfc878e400 

Diff: https://reviews.apache.org/r/8611/diff/


Testing (updated)
-------

Ran on some production jobs and verified results were exactly the same.

Here's a comparison of performance on real workloads ("base" is hcatalog, "mine" is hive):
https://gist.github.com/nitay/880d8fb20d2ac86015d4/raw/6b297fcb287bf8d3dc8175bad217aa86544b4f18/high+school

Basically we see a slight improvement, which is expected because I haven't done much in terms of performance yet.
There are a few performance improvement ideas coming; this is just the first working version.


Thanks,

Nitay Joffe


Re: Review Request: GIRAPH-453: Pure Hive I/O (nitay)

Posted by Nitay Joffe <ni...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8611/
-----------------------------------------------------------

(Updated Feb. 15, 2013, 7:15 p.m.)


Review request for giraph.


Description
-------

One particular thing I added was the concept of "profiles", which makes it easy to read from and write to multiple tables. This should remove a lot of the cruft around the GiraphHCat* classes.

Note that in the diff I separated the code so that there is a Giraph-unrelated, Hive-only portion (under package org.apache.hadoop.hive). Things under this package (and its children) do not touch any Giraph code, and so can be contributed as an IOFormat back to Hive itself.

Also note the new (I think improved) interface: users no longer need to implement an XInputFormat. They just create a class that implements the HiveToVertex (HiveToEdge, VertexToHive) interface, plug that in, and use HiveVertexInputFormat. This should make user code much cleaner.


This addresses bug GIRAPH-453.
    https://issues.apache.org/jira/browse/GIRAPH-453


Diffs
-----

  giraph-accumulo/pom.xml cb9fbc02e6fc8adcb0ec41e0c6aeff75b1ef3f06 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java 89ef87fea7a370354156fb7be02ef4249e0a6111 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 9e129efebe39c42bab9d59b3246055b79cdbdfa3 
  giraph-core/src/main/java/org/apache/giraph/utils/ConfigurationUtils.java PRE-CREATION 
  giraph-hbase/pom.xml 7bbbd98c0b3db6878aee4be21eecd821448da7ef 
  giraph-hcatalog/pom.xml 4a8227295ca426cf273527cdf3c700d25c256ac2 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HCatGiraphRunner.java PRE-CREATION 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java fbcef720d3caa944af70a859996aac40a2f67558 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveUtils.java c1f76f1a46d1fc9af489a916256884520c138cb4 
  giraph-hive/pom.xml PRE-CREATION 
  giraph-hive/src/main/assembly/compile.xml PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveToEdge.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveToVertex.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexWriter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/VertexToHive.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveReadableRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchema.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemaAware.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemas.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveWritableRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiTableSchema.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Classes.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/FileSystems.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HadoopUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveMetastores.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Inspectors.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/ProgressReporter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/SerDes.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Writables.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiInputSplit.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiRecordReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputConf.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputInfo.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputPartition.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputSplitData.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/NoOpInputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/BenchmarkArgs.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/CounterRatioGauge.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/InputBenchmark.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/MetricsObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiOutputCommitter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiRecordWriter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/NoOpOutputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputConf.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputInfo.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveInputDescription.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveOutputDescription.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/package-info.java PRE-CREATION 
  pom.xml f6e9302d694dab9a075de11ad00e6dcfc878e400 

Diff: https://reviews.apache.org/r/8611/diff/


Testing (updated)
-------

Ran on some production jobs and verified results were exactly the same.

Here are some performance comparisons on real workloads ("base" is hcatalog, "mine" is hive):
https://gist.github.com/nitay/b34c8397b7aa1821f858/raw/b5a960891ed0e45e4f7423758471231fc88d7614/current_city
https://gist.github.com/nitay/5bc7f9da50c9b4b4dba2/raw/0dd899e78fbb04ef8c990073fbc1c862db8d5b5b/college
https://gist.github.com/nitay/569cc1a37694de458a74/raw/ca8df93a804f9236b20d251a0dcd6cc97e205008/high_school

We see that even before significant performance improvements, this already speeds up input time. Some of the jobs allocate memory so quickly that they trigger full GCs, which kill performance, but I expect that has more to do with tuning GC to match the faster loading. There is also an increase in physical memory use, which I will investigate.

Also, a few performance improvement ideas are coming; this is just the first working version.


Thanks,

Nitay Joffe


Re: Review Request: GIRAPH-453: Pure Hive I/O (nitay)

Posted by Nitay Joffe <ni...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8611/
-----------------------------------------------------------

(Updated Feb. 15, 2013, 7:05 p.m.)


Review request for giraph.


Description
-------

One particular thing I added was the concept of "profiles", which makes it easy to read from and write to multiple tables. This should remove a lot of the cruft around the GiraphHCat* classes.

Note in the diff I separated the code so that there would be a Giraph-unrelated Hive-only portion (under package org.apache.hadoop.hive). Things under this package (and its children) do not touch any Giraph code, and so can be contributed as an IOFormat back to Hive itself.

Also note the new (I think improved) interface: Users do not need to actually implement an XInputFormat anymore. They just create a class that implements the HiveToVertex (HiveToEdge, VertexToHive) interface, plug that in, and use HiveVertexInputFormat. This should make user code much cleaner.
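
To illustrate the shape of that interface, here is a minimal, self-contained sketch of the pattern: the input format owns the read loop, and the user supplies only a row-to-vertex converter. Every name in it (Row, RowToVertex, SimpleVertex, readAll, arrayRow) is a hypothetical stand-in invented for this example; none of them are the actual signatures of HiveToVertex, HiveReadableRecord, or HiveVertexInputFormat in the diff. It assumes Java 8 or later for the lambdas.

```java
import java.util.ArrayList;
import java.util.List;

public class RowConverterSketch {
  /** Hypothetical stand-in for a readable Hive row (not the real HiveReadableRecord). */
  interface Row {
    Object get(int column);
  }

  /** Hypothetical stand-in for the user-supplied converter (the role HiveToVertex plays). */
  interface RowToVertex<V> {
    V convert(Row row);
  }

  /** Minimal vertex holder, for illustration only. */
  static final class SimpleVertex {
    final long id;
    final double value;

    SimpleVertex(long id, double value) {
      this.id = id;
      this.value = value;
    }
  }

  /** Builds an in-memory row backed by the given column values. */
  static Row arrayRow(Object... values) {
    return column -> values[column];
  }

  /** Generic read loop: the format iterates over rows, the user code only converts them. */
  static <V> List<V> readAll(List<Row> rows, RowToVertex<V> converter) {
    List<V> out = new ArrayList<>();
    for (Row row : rows) {
      out.add(converter.convert(row));
    }
    return out;
  }

  public static void main(String[] args) {
    List<Row> rows = new ArrayList<>();
    rows.add(arrayRow(1L, 0.5));
    rows.add(arrayRow(2L, 1.5));

    // The only piece a user would write: map column 0 to the id, column 1 to the value.
    RowToVertex<SimpleVertex> converter =
        row -> new SimpleVertex((Long) row.get(0), (Double) row.get(1));

    for (SimpleVertex v : readAll(rows, converter)) {
      System.out.println(v.id + " " + v.value);
    }
  }
}
```

The point of the split is that user code never touches splits, record readers, or the InputFormat machinery; it only describes how one row becomes one vertex.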


This addresses bug GIRAPH-453.
    https://issues.apache.org/jira/browse/GIRAPH-453


Diffs
-----

  giraph-accumulo/pom.xml cb9fbc02e6fc8adcb0ec41e0c6aeff75b1ef3f06 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java 89ef87fea7a370354156fb7be02ef4249e0a6111 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 9e129efebe39c42bab9d59b3246055b79cdbdfa3 
  giraph-core/src/main/java/org/apache/giraph/utils/ConfigurationUtils.java PRE-CREATION 
  giraph-hbase/pom.xml 7bbbd98c0b3db6878aee4be21eecd821448da7ef 
  giraph-hcatalog/pom.xml 4a8227295ca426cf273527cdf3c700d25c256ac2 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HCatGiraphRunner.java PRE-CREATION 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java fbcef720d3caa944af70a859996aac40a2f67558 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveUtils.java c1f76f1a46d1fc9af489a916256884520c138cb4 
  giraph-hive/pom.xml PRE-CREATION 
  giraph-hive/src/main/assembly/compile.xml PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveToEdge.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveToVertex.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexWriter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/VertexToHive.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveReadableRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchema.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemaAware.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemas.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveWritableRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiTableSchema.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Classes.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/FileSystems.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HadoopUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveMetastores.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Inspectors.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/ProgressReporter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/SerDes.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Writables.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiInputSplit.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiRecordReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputConf.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputInfo.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputPartition.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputSplitData.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/NoOpInputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/BenchmarkArgs.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/CounterRatioGauge.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/InputBenchmark.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/MetricsObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiOutputCommitter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiRecordWriter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/NoOpOutputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputConf.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputInfo.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveInputDescription.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveOutputDescription.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/package-info.java PRE-CREATION 
  pom.xml f6e9302d694dab9a075de11ad00e6dcfc878e400 

Diff: https://reviews.apache.org/r/8611/diff/


Testing (updated)
-------

Ran on some production jobs and verified results were exactly the same.

Here are some performance comparisons ("base" is hcatalog, "mine" is hive):
https://gist.github.com/nitay/b34c8397b7aa1821f858/raw/b5a960891ed0e45e4f7423758471231fc88d7614/current_city

Also, more performance improvements are coming; this is just the first working version.


Thanks,

Nitay Joffe


Re: Review Request: GIRAPH-453: Pure Hive I/O (nitay)

Posted by Nitay Joffe <ni...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8611/
-----------------------------------------------------------

(Updated Feb. 15, 2013, 7:05 p.m.)


Review request for giraph.


Description
-------

One particular thing I added was the concept of "profiles", which makes it easy to read from and write to multiple tables. This should remove a lot of the cruft around the GiraphHCat* classes.

Note in the diff I separated the code so that there would be a Giraph-unrelated Hive-only portion (under package org.apache.hadoop.hive). Things under this package (and its children) do not touch any Giraph code, and so can be contributed as an IOFormat back to Hive itself.

Also note the new (I think improved) interface: Users do not need to actually implement an XInputFormat anymore. They just create a class that implements the HiveToVertex (HiveToEdge, VertexToHive) interface, plug that in, and use HiveVertexInputFormat. This should make user code much cleaner.


This addresses bug GIRAPH-453.
    https://issues.apache.org/jira/browse/GIRAPH-453


Diffs
-----

  giraph-accumulo/pom.xml cb9fbc02e6fc8adcb0ec41e0c6aeff75b1ef3f06 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java 89ef87fea7a370354156fb7be02ef4249e0a6111 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 9e129efebe39c42bab9d59b3246055b79cdbdfa3 
  giraph-core/src/main/java/org/apache/giraph/utils/ConfigurationUtils.java PRE-CREATION 
  giraph-hbase/pom.xml 7bbbd98c0b3db6878aee4be21eecd821448da7ef 
  giraph-hcatalog/pom.xml 4a8227295ca426cf273527cdf3c700d25c256ac2 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HCatGiraphRunner.java PRE-CREATION 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java fbcef720d3caa944af70a859996aac40a2f67558 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveUtils.java c1f76f1a46d1fc9af489a916256884520c138cb4 
  giraph-hive/pom.xml PRE-CREATION 
  giraph-hive/src/main/assembly/compile.xml PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveToEdge.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveToVertex.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexWriter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/VertexToHive.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveReadableRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchema.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemaAware.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemas.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveWritableRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiTableSchema.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Classes.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/FileSystems.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HadoopUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveMetastores.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Inspectors.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/ProgressReporter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/SerDes.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Writables.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiInputSplit.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiRecordReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputConf.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputInfo.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputPartition.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputSplitData.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/NoOpInputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/BenchmarkArgs.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/CounterRatioGauge.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/InputBenchmark.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/MetricsObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiOutputCommitter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiRecordWriter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/NoOpOutputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputConf.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputInfo.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveInputDescription.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveOutputDescription.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/package-info.java PRE-CREATION 
  pom.xml f6e9302d694dab9a075de11ad00e6dcfc878e400 

Diff: https://reviews.apache.org/r/8611/diff/


Testing (updated)
-------

Ran on some production jobs and verified results were exactly the same.

Here are some performance comparisons ("base" is hcatalog, "mine" is hive):
https://gist.github.com/nitay/b34c8397b7aa1821f858

Also, more performance improvements are coming; this is just the first working version.


Thanks,

Nitay Joffe


Re: Review Request: GIRAPH-453: Pure Hive I/O (nitay)

Posted by Nitay Joffe <ni...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8611/
-----------------------------------------------------------

(Updated Feb. 15, 2013, 7:01 p.m.)


Review request for giraph.


Description (updated)
-------

One particular thing I added was the concept of "profiles", which makes it easy to read from and write to multiple tables. This should remove a lot of the cruft around the GiraphHCat* classes.

Note in the diff I separated the code so that there would be a Giraph-unrelated Hive-only portion (under package org.apache.hadoop.hive). Things under this package (and its children) do not touch any Giraph code, and so can be contributed as an IOFormat back to Hive itself.

Also note the new (I think improved) interface: Users do not need to actually implement an XInputFormat anymore. They just create a class that implements the HiveToVertex (HiveToEdge, VertexToHive) interface, plug that in, and use HiveVertexInputFormat. This should make user code much cleaner.


This addresses bug GIRAPH-453.
    https://issues.apache.org/jira/browse/GIRAPH-453


Diffs
-----

  giraph-accumulo/pom.xml cb9fbc02e6fc8adcb0ec41e0c6aeff75b1ef3f06 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java 89ef87fea7a370354156fb7be02ef4249e0a6111 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 9e129efebe39c42bab9d59b3246055b79cdbdfa3 
  giraph-core/src/main/java/org/apache/giraph/utils/ConfigurationUtils.java PRE-CREATION 
  giraph-hbase/pom.xml 7bbbd98c0b3db6878aee4be21eecd821448da7ef 
  giraph-hcatalog/pom.xml 4a8227295ca426cf273527cdf3c700d25c256ac2 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HCatGiraphRunner.java PRE-CREATION 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java fbcef720d3caa944af70a859996aac40a2f67558 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveUtils.java c1f76f1a46d1fc9af489a916256884520c138cb4 
  giraph-hive/pom.xml PRE-CREATION 
  giraph-hive/src/main/assembly/compile.xml PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveToEdge.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveToVertex.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexWriter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/VertexToHive.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveReadableRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchema.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemaAware.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemas.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveWritableRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiTableSchema.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Classes.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/FileSystems.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HadoopUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveMetastores.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Inspectors.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/ProgressReporter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/SerDes.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Writables.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiInputSplit.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiRecordReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputConf.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputInfo.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputPartition.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputSplitData.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/NoOpInputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/BenchmarkArgs.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/CounterRatioGauge.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/InputBenchmark.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/MetricsObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiOutputCommitter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiRecordWriter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/NoOpOutputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputConf.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputInfo.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveInputDescription.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveOutputDescription.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/package-info.java PRE-CREATION 
  pom.xml f6e9302d694dab9a075de11ad00e6dcfc878e400 

Diff: https://reviews.apache.org/r/8611/diff/


Testing
-------

Ran on some production jobs and verified results were exactly the same.

In terms of performance, this is on par with our current HCatalog code. I ran a few jobs and noticed at most a few seconds of difference between the input supersteps. Sometimes it was less, so I think the difference is mostly noise.


Thanks,

Nitay Joffe


Re: Review Request: GIRAPH-453: Pure Hive I/O (nitay)

Posted by Nitay Joffe <ni...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8611/
-----------------------------------------------------------

(Updated Feb. 15, 2013, 6:59 p.m.)


Review request for giraph.


Description
-------

For now this is only the Input side of things. One particular thing I added was the concept of "profiles", which makes it easy to read from multiple tables. This should remove a lot of the cruft around the GiraphHCat* classes.

Note in the diff I separated the code so that there would be a Giraph-unrelated Hive-only portion (under package org.apache.hadoop.hive). Things under this package (and its children) do not touch any Giraph code, and so can be contributed as an IOFormat back to Hive itself.

Also note the new (I think improved) interface: Users do not need to actually implement an XInputFormat anymore. They just create a class that implements the HiveVertexCreator interface, plug that in, and use HiveVertexInputFormat. This should make user code much cleaner.


This addresses bug GIRAPH-453.
    https://issues.apache.org/jira/browse/GIRAPH-453


Diffs
-----

  giraph-accumulo/pom.xml cb9fbc02e6fc8adcb0ec41e0c6aeff75b1ef3f06 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java 89ef87fea7a370354156fb7be02ef4249e0a6111 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 9e129efebe39c42bab9d59b3246055b79cdbdfa3 
  giraph-core/src/main/java/org/apache/giraph/utils/ConfigurationUtils.java PRE-CREATION 
  giraph-hbase/pom.xml 7bbbd98c0b3db6878aee4be21eecd821448da7ef 
  giraph-hcatalog/pom.xml 4a8227295ca426cf273527cdf3c700d25c256ac2 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HCatGiraphRunner.java PRE-CREATION 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java fbcef720d3caa944af70a859996aac40a2f67558 
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveUtils.java c1f76f1a46d1fc9af489a916256884520c138cb4 
  giraph-hive/pom.xml PRE-CREATION 
  giraph-hive/src/main/assembly/compile.xml PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveToEdge.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveToVertex.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexWriter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/VertexToHive.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/giraph/hive/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveReadableRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchema.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemaAware.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemas.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveWritableRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiRecord.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiTableSchema.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Classes.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/FileSystems.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HadoopUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveMetastores.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveUtils.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Inspectors.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/ProgressReporter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/SerDes.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Writables.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiInputSplit.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiRecordReader.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputConf.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputInfo.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputPartition.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputSplitData.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/NoOpInputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/BenchmarkArgs.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/CounterRatioGauge.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/InputBenchmark.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/MetricsObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiOutputCommitter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiRecordWriter.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/NoOpOutputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputConf.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputInfo.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveInputDescription.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputFormat.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputObserver.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveOutputDescription.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/package-info.java PRE-CREATION 
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/package-info.java PRE-CREATION 
  pom.xml f6e9302d694dab9a075de11ad00e6dcfc878e400 

Diff: https://reviews.apache.org/r/8611/diff/


Testing
-------

Ran on some production jobs and verified results were exactly the same.

In terms of performance this is on par with our current HCatalog stuff. I ran a few jobs and noticed at most a few seconds of difference between the input supersteps. Sometimes it was less, so I think the difference is mostly noise.


Thanks,

Nitay Joffe