Posted to user@tez.apache.org by Gopal V <go...@apache.org> on 2014/06/27 06:44:00 UTC

IFile discussions on JIRA

Hi Tsuyoshi,

Your suggestion from TEZ-945 has resulted in some work by Rajesh to test 
out some theories about storage of intermediate data. We have posted our 
findings as a prototype & benchmark in

https://issues.apache.org/jira/browse/TEZ-1228

These are about changes that can be made to the file format without any 
user-facing API changes.

So far, two comments on this format are still awaiting answers.

Rajesh & I have put some thought into this format, drawing on our 
observations of the intermediate data generated between vertices in 
TPC-H and TPC-DS queries.

I have accounted for Pig's requirements, to the best of my knowledge, 
while building this.

There are follow-up tasks to this format shift that improve how 
OnFileSortedOutput collects data from the pre-sort layers, in order to 
add support for vectorized data.

To pipeline all of those into one cycle of API additions, I would like 
as much input on the format as possible from anyone intending to use the 
sorted output phase of Tez.

Cheers,
Gopal

Re: IFile discussions on JIRA

Posted by Tsuyoshi OZAWA <oz...@gmail.com>.
Hi Gopal,

Thank you for sharing the information! I glanced over the design doc,
and I'm very interested in the work! I'll read the design document
more thoroughly and post feedback on JIRA in a few days.

Thanks,
- Tsuyoshi
