Posted to user@spark.apache.org by frakass <ca...@free.fr> on 2022/01/14 06:17:13 UTC

about memory size for loading file

Hello list

Suppose I have a file whose size is 10GB. The total RAM of the cluster
is 24GB across three nodes, so each node has only 8GB.
If I load this file into Spark as an RDD via the sc.textFile interface,
will this operation run into an "out of memory" issue?

Thank you.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: about memory size for loading file

Posted by Chester Gan <c....@gmail.com>.
The default HashPartitioner should distribute data evenly across the
partitions. See
https://medium.com/datalex/on-spark-performance-and-partitioning-strategies-72992bbbf150

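If you want to see the spread for yourself, here is a minimal sketch
(Scala shell; it assumes a SparkContext named sc and a placeholder path
/data/big.txt, neither taken from the original question):

    val rdd = sc.textFile("/data/big.txt")

    // Count how many records land in each partition.
    val countsPerPartition = rdd
      .mapPartitionsWithIndex { (idx, it) => Iterator((idx, it.size)) }
      .collect()

    countsPerPartition.foreach { case (idx, n) =>
      println(s"partition $idx: $n lines")
    }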

On Fri, Jan 14, 2022 at 2:51 PM frakass <ca...@free.fr> wrote:

> For this case I have 3 partitions, each processing 3.333 GB of data, am I right?
>
>
> On 2022/1/14 2:20, Sonal Goyal wrote:
> > No, it should not. The file would be partitioned and read across each
> > node.
> >
> > On Fri, 14 Jan 2022 at 11:48 AM, frakass <capitnfrakass@free.fr> wrote:
> >
> >     Hello list
> >
> >     Suppose I have a file whose size is 10GB. The total RAM of the
> >     cluster is 24GB across three nodes, so each node has only 8GB.
> >     If I load this file into Spark as an RDD via the sc.textFile
> >     interface, will this operation run into an "out of memory" issue?
> >
> >     Thank you.
> >
> >     ---------------------------------------------------------------------
> >     To unsubscribe e-mail: user-unsubscribe@spark.apache.org
> >
> > --
> > Cheers,
> > Sonal
> > https://github.com/zinggAI/zingg
> >
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>

Re: about memory size for loading file

Posted by frakass <ca...@free.fr>.
For this case I have 3 partitions, each processing 3.333 GB of data, am I right?
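
I could also check the actual count from the shell; a quick sketch in
Scala (the path is just a placeholder):

    val rdd = sc.textFile("/data/big.txt")
    // For a plain file read, the partition count follows the Hadoop
    // input splits (roughly one per block), not the number of nodes.
    println(rdd.getNumPartitions)

    // textFile also accepts a minPartitions hint as a second argument:
    val rdd3 = sc.textFile("/data/big.txt", 3)
    println(rdd3.getNumPartitions)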


On 2022/1/14 2:20, Sonal Goyal wrote:
> No, it should not. The file would be partitioned and read across each node.
> 
> On Fri, 14 Jan 2022 at 11:48 AM, frakass <capitnfrakass@free.fr> wrote:
> 
>     Hello list
> 
>     Suppose I have a file whose size is 10GB. The total RAM of the
>     cluster is 24GB across three nodes, so each node has only 8GB.
>     If I load this file into Spark as an RDD via the sc.textFile interface,
>     will this operation run into an "out of memory" issue?
> 
>     Thank you.
> 
>     ---------------------------------------------------------------------
>     To unsubscribe e-mail: user-unsubscribe@spark.apache.org
> 
> -- 
> Cheers,
> Sonal
> https://github.com/zinggAI/zingg
> 

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: about memory size for loading file

Posted by Sonal Goyal <so...@gmail.com>.
No, it should not. The file would be partitioned and read across each node.
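
A minimal sketch of why (Scala shell; the path and variable names are
placeholders, not taken from the thread):

    import org.apache.spark.storage.StorageLevel

    // textFile is lazy: nothing is read until an action runs, and each
    // task only streams through its own split of the file.
    val rdd = sc.textFile("/data/big.txt")
    println(rdd.count())   // scans the splits; the whole 10GB never sits in RAM at once

    // If the RDD must be reused, MEMORY_AND_DISK spills partitions that
    // do not fit in memory to disk instead of failing.
    val cached = rdd.persist(StorageLevel.MEMORY_AND_DISK)
    println(cached.count())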

On Fri, 14 Jan 2022 at 11:48 AM, frakass <ca...@free.fr> wrote:

> Hello list
>
> Suppose I have a file whose size is 10GB. The total RAM of the
> cluster is 24GB across three nodes, so each node has only 8GB.
> If I load this file into Spark as an RDD via the sc.textFile interface,
> will this operation run into an "out of memory" issue?
>
> Thank you.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
--
Cheers,
Sonal
https://github.com/zinggAI/zingg