Posted to user@spark.apache.org by johnzeng <jo...@fossil.com> on 2016/07/01 02:03:58 UTC

Looking for help with a StackOverflowError in Spark

I am trying to load a 1 TB collection from MongoDB into a Spark cluster, but I
keep getting a StackOverflowError after the job runs for a while.

I have posted a question on stackoverflow.com and tried all the advice offered
there, but nothing works:

how to load large database into spark
<http://stackoverflow.com/questions/38096502/how-to-load-large-table-in-spark>  

I have tried:
1. Using persist with MEMORY_AND_DISK: same error after the same running time
(see the sketch after this list).
2. Adding more instances: same error after the same running time.
3. Running the script on another, much smaller collection: everything works,
so I think my code is correct.
4. Removing the reduce step: same error after the same running time.
5. Removing the map step: same error after the same running time.
6. Changing the SQL I used: it's faster, but the same error appears after a
shorter running time.
7. Retrieving "_id" instead of "u_at" and "c_at": same error after the same
running time.
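
For reference, a minimal sketch of the persist setting from item 1. This is
only a sketch under assumptions: the parallelized range is a stand-in for the
RDD loaded from MongoDB, since the real loading code is not shown here.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    object PersistSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("persist-sketch"))

        // Stand-in data; in the real job this would be the RDD loaded from MongoDB.
        val docs = sc.parallelize(1 to 1000000)

        // MEMORY_AND_DISK keeps what fits in memory and spills the rest to disk,
        // instead of dropping partitions and recomputing them as MEMORY_ONLY does.
        docs.persist(StorageLevel.MEMORY_AND_DISK)

        println(docs.count())
        sc.stop()
      }
    }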

Does anyone know how many resources I need to handle this 1 TB database? I
only retrieve two fields from it, and those fields are only about 1% of each
document (because each document contains an array of about 90+ embedded
documents).
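
For anyone reproducing this, a sketch of retrieving just those two fields by
pushing the projection down to MongoDB with the mongo-spark connector's
aggregation pipeline, so the array of 90+ embedded documents never leaves the
database. This assumes that connector; the URI, database, and collection names
below are placeholders, not my actual setup.

    import com.mongodb.spark.MongoSpark
    import org.apache.spark.{SparkConf, SparkContext}
    import org.bson.Document

    object ProjectionSketch {
      def main(args: Array[String]): Unit = {
        // Placeholder connection string; replace with the real host/db/collection.
        val conf = new SparkConf()
          .setAppName("projection-sketch")
          .set("spark.mongodb.input.uri", "mongodb://localhost:27017/mydb.mycoll")
        val sc = new SparkContext(conf)

        // The $project stage runs inside MongoDB, so only the two small fields
        // are shipped to Spark instead of full documents.
        val rdd = MongoSpark.load(sc)
          .withPipeline(Seq(Document.parse("{ $project: { u_at: 1, c_at: 1 } }")))

        println(rdd.count())
        sc.stop()
      }
    }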



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Looking-for-help-about-stackoverflow-in-spark-tp27255.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: Looking for help with a StackOverflowError in Spark

Posted by Chanh Le <gi...@gmail.com>.
Hi John,
I think it relates to the driver's memory more than to the other things you
tried.

Can you try increasing the driver's memory?
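
For example, something along these lines when submitting the job (the memory
size, class name, and jar name are placeholders, not a recommendation for your
cluster):

    spark-submit --driver-memory 8g --class YourMainClass your-job.jar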




> On Jul 1, 2016, at 9:03 AM, johnzeng <jo...@fossil.com> wrote:
> 
> I am trying to load a 1 TB collection from MongoDB into a Spark cluster, but
> I keep getting a StackOverflowError after the job runs for a while.


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org