Posted to dev@giraph.apache.org by Roman Shaposhnik <ro...@shaposhnik.org> on 2013/11/19 09:02:58 UTC

hadoop1 vs. hadoop2

Hi!

I posted a proposed patch for GIRAPH-794 but
I also wanted to follow up with a couple of things,
since I can't quite make up my mind about them.

1. hadoop_1 profile

It basically works out of the box and is made
the default profile in the patch. It will be the one
from which the default Maven artifacts for Giraph
1.1.0 are built and deployed to the Maven repo.
The only caveat is that it still needs munging,
gated on these flags:
   HADOOP_NON_JOBCONTEXT_IS_INTERFACE
   HADOOP_1_SECURITY
   HADOOP_1_SECRET_MANAGER
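
For context, here is a hypothetical illustration (not actual Giraph
code) of what munge-guarded source looks like. munge is a comment-based
preprocessor: blocks wrapped in /*if[FLAG]*/ ... /*end[FLAG]*/ are kept
or commented out depending on which flags the active profile defines.
The class and string values below are invented for the example; only
the flag name comes from the list above.

```java
// Hypothetical sketch of a munge-guarded source file.
public class MungeExample {
    public static String tokenKind() {
        String kind = "delegation-token";
        /*if[HADOOP_1_SECRET_MANAGER]*/
        // This block survives only in builds that define the
        // HADOOP_1_SECRET_MANAGER flag; otherwise munge comments it out.
        kind = "hadoop1-" + kind;
        /*end[HADOOP_1_SECRET_MANAGER]*/
        return kind;
    }

    public static void main(String[] args) {
        System.out.println(tokenKind());
    }
}
```

Because the directives live inside ordinary Java comments, the file
still compiles unmodified, which is exactly why it is tempting -- and
exactly why keeping the published artifacts munge-free matters.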
At this point I'd really like to keep both of the profiles
that we are going to push to the Maven repo munge-free.
One idea here is to split giraph-core into
two submodules:
    giraph-core-hadoop1
    giraph-core-hadoop2
with an honest shim layer that lets us
paper over the differences between
the two.

This would have the great benefit of keeping
our two main profiles completely munge-free,
which is a pretty big deal.
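
A minimal sketch of what such a shim could look like. All names here
are invented for illustration, and the real Hadoop call is replaced by
a placeholder so the sketch is self-contained: giraph-core would depend
only on the interface, while giraph-core-hadoop1 and
giraph-core-hadoop2 would each provide an implementation compiled
against the matching Hadoop version.

```java
// Hypothetical shim interface that giraph-core would code against.
interface TaskContextShim {
    String getJobName();
}

// What a hadoop1-side implementation might look like. A real one
// would delegate to the hadoop1 mapreduce JobContext class; the
// literal below stands in for that call.
class Hadoop1TaskContextShim implements TaskContextShim {
    @Override
    public String getJobName() {
        return "job-from-hadoop1";
    }
}

public class ShimSketch {
    public static void main(String[] args) {
        TaskContextShim shim = new Hadoop1TaskContextShim();
        System.out.println(shim.getJobName());
    }
}
```

The point of the indirection is that API differences (e.g. a type
being a class in hadoop1 but an interface in hadoop2) are absorbed
inside the per-version submodule instead of being munged in place.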

2. The hadoop2 profile at this point builds and
can be deployed, but it has two fundamental
limitations when it comes to Giraph I/O formats:
we can't run tests on them, and frankly we
can't even submit jobs against hadoop2 clusters.

The problem here is that all the Hadoop
ecosystem projects we depend on were
built against hadoop1.

The only sources of jars built against
hadoop2 are the commercial Hadoop
distros (such as CDH4/5), so perhaps
we can use those at least for testing.


Thanks,
Roman.