Posted to user@spark.apache.org by Aakash Basu <aa...@gmail.com> on 2018/09/06 10:05:53 UTC

XGBoost Not distributing on cluster having more than 1 worker

Hi,

We're trying to use the XGBoost package from DMLC. It runs successfully on
a standalone machine, but it gets stuck whenever there are 2 or more workers.

PFA:
Code Filename: test.py
Data: trainvorg.csv

Spark Submit command:

spark-submit --master spark://192.168.80.10:7077 \
  --jars "$SPARK_HOME/jars/*.jar" \
  --num-executors 2 --executor-cores 5 --executor-memory 10G \
  --driver-cores 5 --driver-memory 25G \
  --conf spark.sql.shuffle.partitions=100 \
  --conf spark.driver.maxResultSize=2G \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC" \
  --conf spark.default.parallelism=8 \
  --conf spark.scheduler.listenerbus.eventqueue.capacity=20000 \
  /appdata/test.py
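
For reference, a minimal PySpark sketch of distributed XGBoost training on
Spark is below. It is illustrative only, not the attached test.py; the
sparkxgb import path, the estimator name, and the parameter spellings are
assumptions and may differ from the wrapper actually in use.

    # Hypothetical sketch -- NOT the attached test.py.
    # Assumes a PySpark wrapper around XGBoost4J-Spark that exposes an
    # XGBoostClassifier-style estimator; import path and parameter names
    # below are assumptions.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from sparkxgb import XGBoostClassifier  # assumed wrapper module

    spark = SparkSession.builder.appName("xgboost-distributed-test").getOrCreate()

    # Read the training data (the label column name is a placeholder).
    df = spark.read.csv("/appdata/trainvorg.csv", header=True, inferSchema=True)

    # Assemble all non-label columns into a single vector column,
    # as Spark ML estimators expect.
    feature_cols = [c for c in df.columns if c != "label"]
    assembled = VectorAssembler(inputCols=feature_cols,
                                outputCol="features").transform(df)

    # numWorkers should not exceed the executors actually available to the
    # application; a mismatch is a common cause of distributed XGBoost
    # training hanging instead of failing outright.
    xgb = XGBoostClassifier(
        objective="binary:logistic",  # assumed objective
        numRound=100,                 # assumed boosting rounds
        numWorkers=2,                 # matches --num-executors 2 above
        featuresCol="features",
        labelCol="label",
    )

    model = xgb.fit(assembled)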

Issue being faced:

[image: Screen Shot 2018-09-04 at 5.34.31 PM.png]
Any help?

Thanks,
Aakash.

Re: XGBoost Not distributing on cluster having more than 1 worker

Posted by Aakash Basu <aa...@gmail.com>.
Hi all,

This is the error that is causing the retries and failures.
Can anyone help me understand why it happens and what the probable fix
for this is?

[image: Screen Shot 2018-09-06 at 4.40.31 PM.png]

Thanks,
Aakash.
