Posted to dev@parquet.apache.org by anvesh reddy <an...@gmail.com> on 2016/10/17 12:55:46 UTC

Parquet writing to buffer or byte stream

Hello,

Hope you're doing great!

I have a Java application that converts JSON messages to Parquet format.
Eventually, the Parquet messages need to be stored in S3. Because Parquet
supports storing messages to S3 only through an access key and secret
access key, which does not work for me (we use only IAM roles), *I want to
store the Parquet messages in a buffer or byte stream and then move them
to S3.*

In the examples I have seen, Parquet writes messages to a file. *Does
Parquet support writing to a buffer or byte stream?* (If so, any code
examples would be much appreciated.)

Please reply at your earliest convenience.

Thanks,
Anvesh

Re: Parquet writing to buffer or byte stream

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Anvesh,

We use Parquet with IAM roles by setting aws.iam.role.arn in our Hadoop
Configuration. We use EmrFileSystem rather than S3A, so your configuration
may be slightly different, but Parquet works just fine with an S3 file
system. You shouldn't need to write to a buffer first because of
authentication. But, you may want to write to the local file system first
to avoid an expensive copy when you commit data.
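
For readers who still want an in-memory target, here is a minimal sketch of
the one property such a stream needs: it must report its current byte
position, because a Parquet file's footer records the offsets of the data
written before it. The class below is hypothetical and uses only the JDK; it
is not wired to any Parquet API. Newer Parquet Java releases appear to expose
a similar idea through `PositionOutputStream.getPos()`, but check that against
the version you use.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Parquet): an in-memory output stream
// that tracks how many bytes have been written so far.
class PositionTrackingBuffer extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private long position = 0;

    @Override
    public void write(int b) {
        buffer.write(b);
        position++;
    }

    @Override
    public void write(byte[] data, int offset, int length) {
        buffer.write(data, offset, length);
        position += length;
    }

    // Current byte offset, i.e. the number of bytes written so far.
    long getPos() {
        return position;
    }

    // Copy of everything written, ready to hand to an uploader.
    byte[] toByteArray() {
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        PositionTrackingBuffer out = new PositionTrackingBuffer();
        // "PAR1" is the magic number at the start of a Parquet file.
        byte[] magic = "PAR1".getBytes(StandardCharsets.US_ASCII);
        out.write(magic, 0, magic.length);
        System.out.println("position after magic bytes: " + out.getPos());
    }
}
```

Holding a whole file in memory only suits small files, which is one more
reason to prefer the local-disk approach described in this reply.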

The normal FileOutputCommitter uses a rename to move files into the final
location to commit them when tasks or jobs complete. That doesn't work well
with S3 because S3 has no native rename: a rename is implemented as a copy,
which actually copies all of the bytes in a file. When we write to S3, we
write a file to local disk and then use multi-part upload to commit it.
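
As a rough illustration of the local-file-then-multi-part-upload approach,
the sketch below only plans how a finished local file would be split into
parts; it makes no AWS SDK calls, and the class and method names are
hypothetical. It assumes the documented S3 limits of a 5 MiB minimum part
size (for every part except the last) and at most 10,000 parts per upload.
Each part is sent once, straight to the final key, so no rename or second
copy is needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner: splits a local file of a given length into
// multi-part upload ranges. It does not talk to S3.
class MultipartPlan {
    // S3 requires every part except the last to be at least 5 MiB.
    static final long MIN_PART_SIZE = 5L * 1024 * 1024;
    // S3 allows at most 10,000 parts per multi-part upload.
    static final int MAX_PARTS = 10_000;

    // One part of the upload: 1-based part number, byte offset, length.
    static final class Part {
        final int number;
        final long offset;
        final long length;

        Part(int number, long offset, long length) {
            this.number = number;
            this.offset = offset;
            this.length = length;
        }
    }

    static List<Part> plan(long fileLength, long partSize) {
        if (fileLength < 0) {
            throw new IllegalArgumentException("fileLength must be >= 0");
        }
        if (partSize < MIN_PART_SIZE) {
            throw new IllegalArgumentException("partSize is below the 5 MiB minimum");
        }
        List<Part> parts = new ArrayList<>();
        long offset = 0;
        int number = 1;
        while (offset < fileLength) {
            // The last part may be shorter than partSize.
            long length = Math.min(partSize, fileLength - offset);
            parts.add(new Part(number++, offset, length));
            offset += length;
        }
        if (parts.size() > MAX_PARTS) {
            throw new IllegalArgumentException("too many parts; use a larger partSize");
        }
        return parts;
    }

    public static void main(String[] args) {
        long fileLength = 12L * 1024 * 1024; // e.g. a 12 MiB local Parquet file
        for (Part p : plan(fileLength, MIN_PART_SIZE)) {
            System.out.println("part " + p.number + ": offset=" + p.offset
                    + " length=" + p.length);
        }
    }
}
```

In a real job, each planned range would be read from the local file and sent
as one upload part, and the upload would be completed only when the task or
job commits.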

rb




-- 
Ryan Blue
Software Engineer
Netflix