Posted to dev@datafu.apache.org by Matthew Hayes <ma...@gmail.com> on 2015/05/23 22:15:36 UTC

Review Request 34636: DATAFU-58 Update to Hadoop 2.7.0 and Pig 0.14.0 and fix all tests

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34636/
-----------------------------------------------------------

Review request for DataFu.


Repository: datafu


Description
-------

This updates DataFu to use Hadoop 2.7.0 and Pig 0.14.0.  I've fixed all the issues that I could find, and all the unit tests pass.  Some of the issues were already addressed by Daniel Dai's earlier patch, but I found some more problems.

Summary of issues:

* Hourglass tests failed because we were pulling in avro-tools, which contains Hadoop classes and therefore conflicts.  I removed the dependency.  I'm not sure why we had this.  I also added a handy script to help with this analysis, which uses jarfish.
* Hourglass used .toString() for many Path instances.  This caused a problem because the resulting string is prefixed with "file:" and therefore isn't a valid path that you can pass to File.  The general fix is to use getName() instead, which returns the simple name and is usually suitable.
* Some Pig unit tests made assumptions about tuple ordering in bags.  I applied a sort to the actual and expected bags to ensure a consistent comparison.
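
To illustrate the last two items, here is a minimal sketch using only the JDK.  It is not the code from the patch: java.net.URI stands in for a qualified Hadoop Path, plain lists stand in for Pig bags, and the class and method names (ReviewFixSketch, localPath, simpleName, sameIgnoringOrder) are hypothetical.

```java
import java.io.File;
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ReviewFixSketch {

    // Assumption: a qualified local Path renders as "file:/...", modeled here as a URI string.
    // URI.getPath() drops the scheme, leaving a string that java.io.File can interpret.
    public static String localPath(String qualified) {
        return URI.create(qualified).getPath();
    }

    // The simple name (last path component), analogous to what getName() returns.
    public static String simpleName(String qualified) {
        return new File(localPath(qualified)).getName();
    }

    // Sort a copy so the caller's list is left untouched.
    public static <T extends Comparable<T>> List<T> sortedCopy(List<T> bag) {
        List<T> copy = new ArrayList<>(bag);
        Collections.sort(copy);
        return copy;
    }

    // Order-insensitive comparison: sort both sides, then compare element by element.
    public static <T extends Comparable<T>> boolean sameIgnoringOrder(List<T> expected, List<T> actual) {
        return sortedCopy(expected).equals(sortedCopy(actual));
    }

    public static void main(String[] args) {
        String qualified = "file:/tmp/output/part-00000.avro";
        System.out.println(localPath(qualified));   // /tmp/output/part-00000.avro
        System.out.println(simpleName(qualified));  // part-00000.avro

        // Same elements, different order: equal only after sorting.
        List<String> expected = Arrays.asList("(a,1)", "(b,2)", "(c,3)");
        List<String> actual = Arrays.asList("(c,3)", "(a,1)", "(b,2)");
        System.out.println(expected.equals(actual));              // false
        System.out.println(sameIgnoringOrder(expected, actual));  // true
    }
}
```

Sorting both sides keeps duplicates significant, which a set-based comparison would not.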


Diffs
-----

  README.md 8e1b67d 
  build-plugin/src/main/java/org/adrianwalker/multilinestring/MultilineProcessor.java 9abdba5 
  datafu-hourglass/.gitignore 942515e 
  datafu-hourglass/build.gradle 75a2876 
  datafu-hourglass/find_dupes.rb PRE-CREATION 
  datafu-hourglass/src/main/java/datafu/hourglass/fs/PathUtils.java c270c7b 
  datafu-hourglass/src/main/java/datafu/hourglass/mapreduce/DistributedCacheHelper.java 62975d1 
  datafu-hourglass/src/test/java/datafu/hourglass/demo/Examples.java 039822c 
  datafu-hourglass/src/test/java/datafu/hourglass/test/PartitionCollapsingExecutionPlannerTests.java d68ea83 
  datafu-hourglass/src/test/java/datafu/hourglass/test/PartitionCollapsingJoinTest.java 02aa342 
  datafu-hourglass/src/test/java/datafu/hourglass/test/PartitionCollapsingTests.java fff1cfd 
  datafu-hourglass/src/test/java/datafu/hourglass/test/PartitionPreservingCollapsingIntegrationTests.java a8f020b 
  datafu-hourglass/src/test/java/datafu/hourglass/test/PartitionPreservingJoinTests.java c41fd39 
  datafu-hourglass/src/test/java/datafu/hourglass/test/PartitionPreservingTests.java acae96c 
  datafu-hourglass/src/test/java/datafu/hourglass/test/TestAvroJob.java b428003 
  datafu-hourglass/src/test/java/datafu/hourglass/test/TestBase.java bc52977 
  datafu-pig/build.gradle ea385d2 
  datafu-pig/src/test/java/datafu/test/pig/bags/BagTests.java 9bcc384 
  gradle/dependency-versions.gradle 3b0835f 

Diff: https://reviews.apache.org/r/34636/diff/


Testing
-------

./gradlew test


Thanks,

Matthew Hayes