Posted to user@spark.apache.org by agg212 <al...@brown.edu> on 2019/07/10 14:57:53 UTC

Problems running TPC-H on Raspberry Pi Cluster

We are trying to benchmark TPC-H (scale factor 1) on a 13-node Raspberry Pi
3B+ cluster (1 master, 12 workers). Each node has 1GB of RAM and a quad-core
processor, running Ubuntu Server 18.04. The cluster uses the Spark
standalone scheduler, and the *.tbl files generated by TPC-H’s dbgen tool
are stored in HDFS.

We are experiencing several failures when trying to run queries. Jobs fail
unpredictably, usually with one or more nodes shown as “DEAD/LOST” in the
web UI. It appears that one or more nodes “hang” during query execution and
become unreachable or time out.

We have included our configuration parameters as well as the driver program
below. Any recommendations would be greatly appreciated.

-------------------------------------------

-------------------------------------------



Driver:
-------------------------------------------




--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

Re: Problems running TPC-H on Raspberry Pi Cluster

Posted by agg212 <al...@brown.edu>.

Good to know. We will look into the Raspberry Pi 4 (with 4GB RAM).

In general, are there any tuning or configuration tips/tricks for very
memory-constrained deployments (e.g., 1-4GB RAM)?




Re: Problems running TPC-H on Raspberry Pi Cluster

Posted by Reynold Xin <rx...@databricks.com>.

I don't think Spark is meant to run with 1GB of memory on the entire system. The JVM loads almost 200MB of bytecode, and each page during query processing takes a minimum of 64MB.

It might work on the 4GB model of the Raspberry Pi 4.
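
To make the arithmetic above concrete, here is a rough back-of-the-envelope sketch in Python. The 1GB, 200MB, and 64MB figures come from this thread; the 300MB allowance for the OS, the HDFS DataNode, and the worker daemon is purely an assumption for illustration, and the function name is hypothetical. It also ignores that only part of the JVM heap is available for query execution, so the real headroom is likely smaller.

```python
# Rough per-node memory budget. Figures marked "thread" come from the
# messages above; anything marked ASSUMPTION is illustrative only.
JVM_BYTECODE_MB = 200   # thread: "almost 200MB of bytecode"
PAGE_MIN_MB = 64        # thread: minimum page size during query processing
OTHER_OVERHEAD_MB = 300 # ASSUMPTION: OS + HDFS DataNode + worker daemon
CORES_PER_NODE = 4      # thread: quad-core processor


def pages_available(total_mb, jvm_mb=JVM_BYTECODE_MB,
                    other_mb=OTHER_OVERHEAD_MB, page_mb=PAGE_MIN_MB):
    """Return how many minimum-size pages fit in what is left of RAM."""
    remaining_mb = max(total_mb - jvm_mb - other_mb, 0)
    return remaining_mb // page_mb


for label, total_mb in [("Pi 3B+ (1GB)", 1024), ("Pi 4 (4GB)", 4096)]:
    pages = pages_available(total_mb)
    print(f"{label}: {pages} pages total, "
          f"{pages / CORES_PER_NODE:.1f} per core")
```

Under these assumptions a 1GB node has room for only a handful of pages shared across four concurrent tasks, which is consistent with nodes hanging under memory pressure; the 4GB model leaves several times more headroom.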
