Posted to mapreduce-user@hadoop.apache.org by Ranjithkumar Gampa <gr...@gmail.com> on 2012/10/02 02:56:09 UTC

Re: context.write() Vs FSDataOutputStream.writeBytes()

Hello all,

Has anybody looked into the topic below? Please share your views.

Thanks
Ranjith

On Fri, Sep 28, 2012 at 1:57 PM, Ranjithkumar Gampa <gr...@gmail.com> wrote:

> Hi,
>
> We are using FSDataOutputStream.writeBytes() from our map/reduce tasks to
> write directly to the Hive table path, instead of context.write(). This is
> working fine, and so far we have seen no problems with this approach.
> We make sure the file names are distinct by appending the taskAttemptId to
> them, and we set speculative execution to 'false' so that duplicate
> map/reduce attempts won't work on the same data and write inconsistent
> output to HDFS. We chose this approach for the reasons below; please let
> us know if there are any disadvantages to it.
>
> 1) To avoid cleanup of the _SUCCESS and _LOG files created by the
> mapper/reducer output, which Hive may not like.
> 2) To write some records from the mappers that don't need to participate
> in the reducer logic, thereby saving some sort and shuffle work. We are
> exploring MultipleOutputs, but I think the point above still needs to be
> taken care of.
> 3) We have some special characters in the data, on which we do string
> manipulation using the 'ISO-8859-1' encoding. Using the Text class with
> context.write() does not preserve these characters, due to the default
> UTF-8 encoding it uses.
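On point 3, the underlying issue is that Text stores its contents as UTF-8 bytes, so single-byte ISO-8859-1 data that is not valid UTF-8 gets substituted rather than preserved. A small self-contained sketch of just the charset behavior (plain Java, no Hadoop needed):

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    public static void main(String[] args) {
        // A byte that is valid ISO-8859-1 ('é' = 0xE9) but is not a
        // complete, valid UTF-8 sequence on its own.
        byte[] latin1 = new byte[] { (byte) 0xE9 };

        // Decoding the same byte with the two charsets gives different results:
        String asLatin1 = new String(latin1, StandardCharsets.ISO_8859_1);
        String asUtf8   = new String(latin1, StandardCharsets.UTF_8);

        System.out.println(asLatin1.equals("\u00e9"));  // true: byte preserved
        System.out.println(asUtf8.equals("\ufffd"));    // true: replacement char

        // Round-tripping through UTF-8 (as Text does) also changes the bytes:
        byte[] roundTrip = asLatin1.getBytes(StandardCharsets.UTF_8);
        System.out.println(roundTrip.length);           // 2 bytes, not 1
    }
}
```

This is why writing the raw ISO-8859-1 bytes to the stream yourself preserves them, while routing the data through a UTF-8-based Text value does not.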
>
> Please share if my understanding is incorrect or if there are other ways
> of taking care of the above three points; I am happy to hear and learn.
> Our project uses a mix of Hadoop MR and Hive.
>
> Thanks in advance.
>
> Regards,
> Ranjith
>
>