Posted to user@tez.apache.org by Gopal V <go...@apache.org> on 2014/06/27 06:44:00 UTC
IFile discussions on JIRA
Hi Tsuyoshi,
Your suggestion from TEZ-945 has resulted in some work by Rajesh to test
out some theories about storage of intermediate data. We have our
findings posted as a prototype & benchmark in
https://issues.apache.org/jira/browse/TEZ-1228
These are about changes that can be made to the file format without any
user-facing API changes.
So far, two comments on this format are still waiting for an answer.
Rajesh & I have put a bit of thought into this format, drawing on our
observations of the intermediate data generated between vertices in
TPC-H and TPC-DS queries.
I have applied my best understanding of Pig's requirements while
building this.
There are follow-up tasks to this format change that will improve how
OnFileSortedOutput collects data from the pre-sort layers, in order to
add support for vectorized data.
To pipeline all those into one cycle of API additions, I want as much
input on the format as possible from anyone intending to use the sorted
output phase of Tez.
Cheers,
Gopal
Re: IFile discussions on JIRA
Posted by Tsuyoshi OZAWA <oz...@gmail.com>.
Hi Gopal,
Thank you for sharing the information! I glanced over the design doc,
and I'm very interested in the work! I'll read the design document
more closely and post my feedback on JIRA in a few days.
Thanks,
- Tsuyoshi