Posted to user@spark.apache.org by frakass <ca...@free.fr> on 2022/01/14 06:17:13 UTC
about memory size for loading file
Hello list
Suppose I have a file that is 10 GB in size. The total RAM of the
cluster is 24 GB across three nodes, so each node has only 8 GB.
If I load this file into Spark as an RDD via the sc.textFile interface,
will this operation run into an "out of memory" issue?
Thank you.
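For reference, a minimal sketch of the operation in question, assuming a
Scala spark-shell with a SparkContext named sc (the file path is
hypothetical):

    // textFile is lazy: this only records the lineage; nothing is read yet
    val lines = sc.textFile("hdfs:///data/big_file.txt")

    // An action streams each partition through the executors task by task,
    // so the full 10 GB never needs to fit in memory at once
    println(lines.count())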
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: about memory size for loading file
Posted by Chester Gan <c....@gmail.com>.
The default HashPartitioner should distribute data evenly across the
partitions; see
https://medium.com/datalex/on-spark-performance-and-partitioning-strategies-72992bbbf150
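One way to see how evenly the data actually lands is to count the records
in each partition. A sketch for a spark-shell (the path is hypothetical);
note that for sc.textFile the initial partitions come from the input
splits, while HashPartitioner only comes into play when pair RDDs are
shuffled:

    val rdd = sc.textFile("hdfs:///data/big_file.txt")

    // Tag each partition with its index and count its lines to gauge skew
    val perPartition = rdd
      .mapPartitionsWithIndex((i, it) => Iterator((i, it.size)))
      .collect()
    perPartition.foreach { case (i, n) => println(s"partition $i: $n lines") }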
On Fri, Jan 14, 2022 at 2:51 PM frakass <ca...@free.fr> wrote:
> In this case I have 3 partitions, each processing about 3.33 GB of data, am I right?
>
>
> On 2022/1/14 2:20, Sonal Goyal wrote:
> > No, it should not. The file would be partitioned and read across the
> > nodes.
> >
> > On Fri, 14 Jan 2022 at 11:48 AM, frakass <capitnfrakass@free.fr> wrote:
> >
> > Hello list
> >
> > Suppose I have a file that is 10 GB in size. The total RAM of the
> > cluster is 24 GB across three nodes, so each node has only 8 GB.
> > If I load this file into Spark as an RDD via the sc.textFile
> > interface, will this operation run into an "out of memory" issue?
> >
> > Thank you.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe e-mail: user-unsubscribe@spark.apache.org
> >
> > --
> > Cheers,
> > Sonal
> > https://github.com/zinggAI/zingg
> >
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>
Re: about memory size for loading file
Posted by frakass <ca...@free.fr>.
In this case I have 3 partitions, each processing about 3.33 GB of data, am I right?
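A quick way to check that assumption (a sketch for a spark-shell; the path
is hypothetical): textFile normally creates one partition per input split,
not one per node, so a 10 GB file on HDFS with 128 MB blocks typically
yields around 80 partitions of roughly 128 MB each:

    val rdd = sc.textFile("hdfs:///data/big_file.txt")

    // Prints the actual partition count; typically ~80 here, not 3
    println(rdd.getNumPartitions)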
On 2022/1/14 2:20, Sonal Goyal wrote:
> No, it should not. The file would be partitioned and read across the nodes.
>
> On Fri, 14 Jan 2022 at 11:48 AM, frakass <capitnfrakass@free.fr> wrote:
>
> Hello list
>
> Suppose I have a file that is 10 GB in size. The total RAM of the
> cluster is 24 GB across three nodes, so each node has only 8 GB.
> If I load this file into Spark as an RDD via the sc.textFile interface,
> will this operation run into an "out of memory" issue?
>
> Thank you.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
> --
> Cheers,
> Sonal
> https://github.com/zinggAI/zingg
>
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: about memory size for loading file
Posted by Sonal Goyal <so...@gmail.com>.
No, it should not. The file would be partitioned and read across the nodes.
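Memory pressure would mainly show up if the RDD were cached. A sketch
(spark-shell, hypothetical path) using a storage level that spills to disk
rather than failing:

    import org.apache.spark.storage.StorageLevel

    val lines = sc.textFile("hdfs:///data/big_file.txt")

    // MEMORY_AND_DISK spills partitions that do not fit in RAM to local
    // disk, so caching 10 GB on a 24 GB cluster need not cause an OOM
    lines.persist(StorageLevel.MEMORY_AND_DISK)
    println(lines.count())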
On Fri, 14 Jan 2022 at 11:48 AM, frakass <ca...@free.fr> wrote:
> Hello list
>
> Suppose I have a file that is 10 GB in size. The total RAM of the
> cluster is 24 GB across three nodes, so each node has only 8 GB.
> If I load this file into Spark as an RDD via the sc.textFile interface,
> will this operation run into an "out of memory" issue?
>
> Thank you.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
--
Cheers,
Sonal
https://github.com/zinggAI/zingg