Posted to user@spark.apache.org by rok <ro...@gmail.com> on 2014/11/05 11:38:09 UTC

using LogisticRegressionWithSGD.train in Python crashes with "Broken pipe"

I have a dataset of ~200k labeled points whose features are
SparseVectors with ~20M dimensions. I take 5% of the data as a training set.

> model = LogisticRegressionWithSGD.train(training_set)

fails with 

ERROR:py4j.java_gateway:Error while sending or receiving.
Traceback (most recent call last):
  File "/cluster/home/roskarr/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 472, in send_command
    self.socket.sendall(command.encode('utf-8'))
  File "/cluster/home/roskarr/miniconda/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 32] Broken pipe

I'm at a loss as to where to begin to debug this... any suggestions? Thanks,

Rok



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/using-LogisticRegressionWithSGD-train-in-Python-crashes-with-Broken-pipe-tp18182.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: using LogisticRegressionWithSGD.train in Python crashes with "Broken pipe"

Posted by Davies Liu <da...@databricks.com>.
It seems that the JVM either failed to start or crashed silently.
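
If the JVM is dying silently, memory pressure is one thing worth ruling out. The sketch below is only a back-of-the-envelope estimate built from the figures in the original post (~200k points, a 5% sample, ~20M features). It is not a diagnosis. The helper name estimate_sizes is hypothetical, and both the 8 bytes per weight (float64) and the idea that a dense weight vector gets materialized at all are assumptions.

```python
def estimate_sizes(n_points, train_percent, n_features, bytes_per_weight=8):
    """Return (number of training points, size in MB of one dense weight vector).

    bytes_per_weight=8 assumes one float64 per feature (an assumption).
    """
    # Integer arithmetic avoids any float rounding in the sample size.
    n_train = n_points * train_percent // 100
    # Size of a single dense vector with one weight per feature, in decimal MB.
    dense_mb = n_features * bytes_per_weight / 1e6
    return n_train, dense_mb


# Figures taken from the original post.
n_train, dense_mb = estimate_sizes(200000, 5, 20000000)
print("training points: %d" % n_train)
print("dense weight vector: %.0f MB" % dense_mb)
```

Under those assumptions a single dense weight vector is about 160 MB, so it may be worth comparing that against the driver and executor memory settings.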

On Thu, Nov 13, 2014 at 6:06 AM, rok <ro...@gmail.com> wrote:
> Hi, I'm using Spark 1.1.0. There is no error on the executors -- it appears
> as if the job never gets properly dispatched -- the only message is the
> "Broken Pipe" message in the driver.


Re: using LogisticRegressionWithSGD.train in Python crashes with "Broken pipe"

Posted by rok <ro...@gmail.com>.
Hi, I'm using Spark 1.1.0. There is no error on the executors -- it appears
as if the job never gets properly dispatched -- the only message is the
"Broken Pipe" message in the driver. 





Re: using LogisticRegressionWithSGD.train in Python crashes with "Broken pipe"

Posted by Xiangrui Meng <me...@gmail.com>.
Which Spark version are you using? Could you check the web UI and attach
any error messages from the executors? -Xiangrui

On Wed, Nov 5, 2014 at 8:23 AM, rok <ro...@gmail.com> wrote:
> yes, the training set is fine, I've verified it.


Re: using LogisticRegressionWithSGD.train in Python crashes with "Broken pipe"

Posted by rok <ro...@gmail.com>.
Yes, the training set is fine; I've verified it.





Re: using LogisticRegressionWithSGD.train in Python crashes with "Broken pipe"

Posted by jamborta <ja...@gmail.com>.
Hi Rok, 

You could start debugging by collecting your training_set first and checking
that you get something back before passing it to the train method. Then step
through the train method line by line, including the serializer, and check
exactly where it fails.
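
A minimal sketch of that first check, assuming training_set is an ordinary PySpark RDD. The helper name probe_training_set is hypothetical. It uses take() and count() instead of a full collect(), so that only a few records come back to the driver while the whole RDD is still evaluated on the executors.

```python
def probe_training_set(rdd, n=5):
    """Force evaluation of an RDD-like object before handing it to train().

    Only take() and count() are called; both are standard RDD methods.
    """
    # Pull a handful of records back to the driver to confirm they deserialize.
    sample = rdd.take(n)
    # Force a full pass over the data so executor-side errors surface here.
    total = rdd.count()
    return sample, total
```

Calling sample, total = probe_training_set(training_set) before LogisticRegressionWithSGD.train(training_set) should show whether the data itself can be computed and shipped to the driver. If this step succeeds, the failure is more likely inside the train call.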

thanks,




