Posted to user@spark.apache.org by Ankur Jain <an...@yash.com> on 2016/06/09 11:51:52 UTC

Saving Parquet files to S3

Hello Team,

I want to write Parquet files to AWS S3, but I want each output file to be about 1 GB in size.
Can someone please guide me on how I can achieve the same?

I am using AWS EMR with Spark 1.6.1.

Thanks,
Ankur

Re: Saving Parquet files to S3

Posted by Bijay Kumar Pathak <bk...@mtu.edu>.
Hi Ankur,

I also tried setting properties to target a Parquet file size of 256 MB. I am
using PySpark; below is how I set the properties, but it's not working for me.
How did you set the property?


spark_context._jsc.hadoopConfiguration().setInt("dfs.blocksize", 268435456)
spark_context._jsc.hadoopConfiguration().setInt("parquet.block.size", 268435)
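
For reference, 256 MB is 268435456 bytes, so the `parquet.block.size` value
above looks truncated. A minimal sketch with both values spelled out
consistently (assuming the same `spark_context`):

MB = 1024 * 1024  # 1 MB in bytes
spark_context._jsc.hadoopConfiguration().setInt("dfs.blocksize", 256 * MB)
spark_context._jsc.hadoopConfiguration().setInt("parquet.block.size", 256 * MB)  # 268435456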

Thanks,
Bijay

On Fri, Jun 10, 2016 at 5:24 AM, Ankur Jain <an...@yash.com> wrote:

> Thanks maropu.. It worked…

RE: Saving Parquet files to S3

Posted by Ankur Jain <an...@yash.com>.
Thanks maropu.. It worked…

From: Takeshi Yamamuro [mailto:linguin.m.s@gmail.com]
Sent: 10 June 2016 11:47 AM
To: Ankur Jain
Cc: user@spark.apache.org
Subject: Re: Saving Parquet files to S3

Hi,

You'd be better off setting `parquet.block.size`.

// maropu


Re: Saving Parquet files to S3

Posted by Takeshi Yamamuro <li...@gmail.com>.
Hi,

You'd be better off setting `parquet.block.size`.

// maropu
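
A minimal PySpark sketch of that, assuming an existing SparkContext `sc` and
DataFrame `df` (the S3 path is a placeholder):

# parquet.block.size controls the Parquet row group size in bytes;
# actual file sizes also depend on how the data is partitioned.
sc._jsc.hadoopConfiguration().setInt("parquet.block.size", 1073741824)  # 1 GB
df.write.parquet("s3://your-bucket/output/")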


-- 
---
Takeshi Yamamuro

Re: Saving Parquet files to S3

Posted by Daniel Siegmann <da...@teamaol.com>.
I don't believe there's any way to output files of a specific size. What you
can do is partition your data into a number of partitions such that the
amount of data each contains is around 1 GB.
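
A rough PySpark sketch of that approach (the total size and path here are
assumptions; estimate your own data volume however you can):

# Pick a partition count that targets roughly 1 GB per output file.
total_bytes = 50 * 1024 ** 3   # e.g. ~50 GB of input data (hypothetical)
target_bytes = 1024 ** 3       # ~1 GB per file
num_partitions = max(1, int(total_bytes / target_bytes))

df.repartition(num_partitions).write.parquet("s3://your-bucket/output/")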
