Posted to dev@pig.apache.org by Chao Wang <ch...@yahoo-inc.com> on 2009/11/12 00:26:56 UTC

FYI - forking TFile off Hadoop into Zebra

Hi all,

In JIRA PIG-1077, we, the Zebra team, plan to use Hadoop TFile's support
for splitting by record sequence number to provide record (row)-based
input splits in Zebra.

Along the way, we also plan to resolve a dependency issue: Zebra's
record-based split needs TFile split support in Hadoop to work. Because
of this dependency, Zebra has to keep its own copy of the Hadoop jar in
svn in order to build. Furthermore, Zebra currently sits inside Pig in
svn, and Pig maintains its own copy of the Hadoop jar in its lib
directory, which makes things even messier. Finally, Zebra is new,
changes often, and needs new revisions quickly, while Hadoop and Pig are
more mature and move more slowly, so they can't make new releases for
Zebra all the time.

After carefully thinking this through, we plan to fork the TFile code
off Hadoop and port it into Zebra's own code base. This will greatly
simplify Zebra's build process and also let it make quick revisions.

Last, we would like to point out that this is a short-term solution for
Zebra, and we plan to:
1) port all changes to Zebra TFile back into Hadoop TFile.
2) in the long run, have a single unified solution for this.

For more information, please see
https://issues.apache.org/jira/browse/PIG-1077

We welcome your feedback on this.

Regards,

Chao

Re: FYI - forking TFile off Hadoop into Zebra

Posted by Alan Gates <ga...@yahoo-inc.com>.

On Nov 11, 2009, at 4:13 PM, Ashutosh Chauhan wrote:

> On Wed, Nov 11, 2009 at 18:26, Chao Wang <ch...@yahoo-inc.com> wrote:
>
>> Last, we would like to point out that this is a short term solution for
>> Zebra and we plan to:
>> 1) port all changes to Zebra TFile back into Hadoop TFile.
>> 2) in the long run have a single unified solution for this.
>
> Just for clarity, in long run as Zebra stabilizes and Pig adopts
> hadoop-0.22, Zebra will get rid of this fork?

I think the promise is they'll get rid of the fork at some point, not  
necessarily at 0.22 though.

Alan.

>
> Ashutosh

Re: FYI - forking TFile off Hadoop into Zebra

Posted by Ashutosh Chauhan <as...@gmail.com>.

On Wed, Nov 11, 2009 at 18:26, Chao Wang <ch...@yahoo-inc.com> wrote:


> Last, we would like to point out that this is a short term solution for
> Zebra and we plan to:
> 1) port all changes to Zebra TFile back into Hadoop TFile.
> 2) in the long run have a single unified solution for this.

Just for clarity, in the long run, as Zebra stabilizes and Pig adopts
hadoop-0.22, Zebra will get rid of this fork?

Ashutosh