Posted to dev@parquet.apache.org by Sergio Pena <se...@cloudera.com> on 2015/06/27 01:01:27 UTC

Question regarding the use of TaskAttemptContext on ParquetOutputFormat

Hi,

I see that the ParquetRecordWriterWrapper constructor builds a TaskAttemptID
and a TaskAttemptContext from it, which is then passed to the
getRecordWriter(TaskAttemptContext taskAttemptContext, Path file) method of
ParquetOutputFormat. But that method only extracts the Configuration and
CompressionCodecName objects from the context and passes them on to another
getRecordWriter overload.
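To make the delegation described above concrete, here is a minimal sketch using simplified stand-in types, not the real Hadoop or Parquet classes. The `Context` class, the string-map configuration and the string "writer" are hypothetical; the `parquet.compression` key mirrors the property ParquetOutputFormat reads, but the real codec lookup has more fallbacks than shown here.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified model of the two getRecordWriter paths.
// None of these types are the real Hadoop or Parquet classes.
final class WriterDelegationSketch {

    // Stand-in for CompressionCodecName (only a few values modeled).
    enum Codec { UNCOMPRESSED, SNAPPY, GZIP }

    // Stand-in for a TaskAttemptContext: here it only carries a configuration
    // (modeled as a string map) and an attempt id string.
    static final class Context {
        final Map<String, String> conf;
        final String attemptId;

        Context(Map<String, String> conf, String attemptId) {
            this.conf = conf;
            this.attemptId = attemptId;
        }
    }

    // Cheap path: the writer only needs the configuration, path and codec.
    // Returns a description string instead of a real writer.
    static String getRecordWriter(Map<String, String> conf, String file, Codec codec) {
        return "writer[" + file + ", " + codec + "]";
    }

    // Context path: pulls the configuration and codec out of the context,
    // then delegates to the cheap path. The attempt id is never used.
    static String getRecordWriter(Context context, String file) {
        String name = context.conf.getOrDefault("parquet.compression", "UNCOMPRESSED");
        Codec codec = Codec.valueOf(name.toUpperCase(Locale.ROOT));
        return getRecordWriter(context.conf, file, codec);
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of("parquet.compression", "snappy");
        // Both calls describe the same writer; the first builds a context it barely uses.
        System.out.println(getRecordWriter(new Context(conf, "attempt_x"), "/tmp/part-0.parquet"));
        System.out.println(getRecordWriter(conf, "/tmp/part-0.parquet", Codec.SNAPPY));
    }
}
```

In this model the context-based call adds nothing the caller could not supply directly, which is the point the question turns on.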

My question is: since the context built from the TaskAttemptID only carries
the Configuration taken from the JobConf parameter of
ParquetRecordWriterWrapper, and the codec name can be retrieved from the
JobConf or Properties objects, is there another reason to use a TaskAttemptID
here?
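A minimal sketch of resolving the codec name straight from the table `java.util.Properties` and the job configuration, with no task attempt context involved. The configuration is modeled as a plain string map, and the lookup order (table properties, then job configuration, then UNCOMPRESSED) is an assumption for illustration, not necessarily what Hive or Parquet does.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: resolve the codec without building a task attempt context.
// Assumed precedence: table properties, then job configuration, then UNCOMPRESSED.
final class CodecLookupSketch {

    // Stand-in for CompressionCodecName (only a few values modeled).
    enum Codec { UNCOMPRESSED, SNAPPY, GZIP }

    static final String COMPRESSION_KEY = "parquet.compression";

    static Codec resolveCodec(Properties tableProperties, Map<String, String> jobConf) {
        String name = tableProperties.getProperty(COMPRESSION_KEY);
        if (name == null) {
            name = jobConf.get(COMPRESSION_KEY);
        }
        if (name == null) {
            return Codec.UNCOMPRESSED;
        }
        // Codec names are matched case-insensitively in this sketch.
        return Codec.valueOf(name.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Properties table = new Properties();
        table.setProperty(COMPRESSION_KEY, "snappy");
        System.out.println(resolveCodec(table, Map.of(COMPRESSION_KEY, "gzip")));
    }
}
```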

While running some Java profiling tests, I noticed that
ContextUtil.newTaskAttemptContext() takes some time to initialize, and we
could save that time by calling the other getRecordWriter overload directly.

- Sergio

Re: Question regarding the use of TaskAttemptContext on ParquetOutputFormat

Posted by Ryan Blue <bl...@cloudera.com>.
I thought the wrapper was there to translate from the mapred API used by Hive
to the mapreduce API that Parquet implements. If there is a cheaper way to do
that, I think it would be a good change.
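Part of that mapred-to-mapreduce translation is recovering a task attempt id. Under the old API it is typically only available as a string in the job configuration (for example the legacy `mapred.task.id` property), so it has to be parsed before a new-API context can be built. The sketch below is a hypothetical, simplified parser for the usual `attempt_<jobtracker>_<job>_<type>_<task>_<attempt>` format; it is not Hadoop's TaskAttemptID implementation.

```java
import java.util.Optional;

// Hypothetical sketch of the bridging step: turn an attempt id string into
// structured fields. Illustrative only; not Hadoop's TaskAttemptID parser.
final class AttemptIdSketch {

    static final class AttemptId {
        final String jobTrackerId;
        final int jobId;
        final char taskType; // e.g. 'm' for map, 'r' for reduce
        final int taskId;
        final int attempt;

        AttemptId(String jobTrackerId, int jobId, char taskType, int taskId, int attempt) {
            this.jobTrackerId = jobTrackerId;
            this.jobId = jobId;
            this.taskType = taskType;
            this.taskId = taskId;
            this.attempt = attempt;
        }
    }

    // Parses ids shaped like "attempt_200707121733_0003_m_000005_0".
    // Returns empty for null or malformed input instead of throwing.
    static Optional<AttemptId> parse(String s) {
        if (s == null) {
            return Optional.empty();
        }
        String[] parts = s.split("_");
        if (parts.length != 6 || !parts[0].equals("attempt") || parts[3].length() != 1) {
            return Optional.empty();
        }
        try {
            return Optional.of(new AttemptId(parts[1], Integer.parseInt(parts[2]),
                    parts[3].charAt(0), Integer.parseInt(parts[4]), Integer.parseInt(parts[5])));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // In a real task the string would come from the job configuration.
        parse("attempt_200707121733_0003_m_000005_0")
                .ifPresent(id -> System.out.println(id.taskType + " task " + id.taskId + ", attempt " + id.attempt));
    }
}
```

If, as the question suggests, the writer never reads the id, this parsing and the context construction that follows it are work the direct overload would skip.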

rb

-- 
Ryan Blue
Software Engineer
Cloudera, Inc.