Posted to user@spark.apache.org by Simone <si...@gmail.com> on 2016/09/26 17:23:46 UTC

Pyspark ML - Unable to finish cross validation

Hello,

I am using PySpark to train a Logistic Regression model with cross validation in ML. For testing purposes my dataset is very small: no more than 50 records for training.
On the other hand, my "feature" column is very wide: 1500+ columns.

I am running on YARN with 3 executors, each with 4 GB of memory and 4 cores. I am caching the DataFrames.

Unfortunately, my process never finishes: it hangs during cross validation.

Any clues? 

Thanks guys

Simone