Posted to user@spark.apache.org by Tobi Bosede <an...@gmail.com> on 2016/07/11 23:12:43 UTC

chisqSelector in Python

Hi all,

There is no Python example for ChiSqSelector at the link below.
https://spark.apache.org/docs/1.4.1/mllib-feature-extraction.html#chisqselector

So I am converting the Scala code to Python. I "translated" the following
code

val discretizedData = data.map { lp =>
  LabeledPoint(lp.label, Vectors.dense(lp.features.toArray.map { x => x / 16 }))
}

as:
discretizedData = data.map(lambda lp: LabeledPoint(lp.label,
    Vectors.dense(np.array(lp.features).map(lambda x: x / 16))))

However, when I call selector.fit(discretizedData), I get the error below.
Any thoughts on the problem? Thanks.

Py4JJavaError: An error occurred while calling o2184.fitChiSqSelector.
: org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 158.0 failed 4 times, most recent failure: Lost task
0.3 in stage 158.0 (TID 3078, node032.hadoop.cls04):
java.net.SocketException: Connection reset
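
For reference, here is a minimal sketch of one way the Scala snippet could be written in Python. A NumPy array has no map method, so the Python translation above would likely fail inside the worker. Whether that is what produces the "Connection reset" shown here is an assumption, not something the trace confirms. The helper names discretize and to_discretized_point are hypothetical, and the second helper assumes PySpark is installed and that data is an RDD of pyspark.mllib LabeledPoint objects.

```python
def discretize(features, bin_width=16.0):
    """Divide every feature by bin_width, mirroring `x / 16` in the Scala code.

    A list comprehension stands in for Scala's `.map { x => x / 16 }`.
    float() keeps the division fractional even under Python 2, where
    int / int would otherwise truncate.
    """
    return [float(x) / bin_width for x in features]


def to_discretized_point(lp):
    """Hypothetical helper: rebuild a LabeledPoint with scaled features.

    Assumes PySpark is available; imports are local so that discretize()
    can be used and tested without a Spark installation.
    """
    from pyspark.mllib.linalg import Vectors
    from pyspark.mllib.regression import LabeledPoint

    # toArray() returns a NumPy array for both dense and sparse vectors.
    return LabeledPoint(lp.label, Vectors.dense(discretize(lp.features.toArray())))


# With an RDD of LabeledPoint named `data`, as in the post:
# discretizedData = data.map(to_discretized_point)
```

The division is kept in a plain function so it can be checked locally on a small list before it is run on the cluster.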