Posted to user@spark.apache.org by Irving Duran <ir...@gmail.com> on 2018/05/02 14:15:08 UTC

Re: ML Linear and Logistic Regression - Poor Performance

You may want to think about reducing the number of iterations.  Right now you
have maxIter set to 500.
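
As a rough sketch of why the iteration count matters more for the logistic
model: assuming a multiclass fit keeps one coefficient per (class, feature)
pair, each pass over the data touches about 100 times as many coefficients as
a single-output linear model. The feature and class counts are the ones from
your message; coefficient_count is a hypothetical helper, not part of Spark.

```python
def coefficient_count(num_features, num_outputs=1, fit_intercept=True):
    """Count the parameters of a dense linear model.

    Assumption: one weight per (output, feature) pair, plus one
    intercept per output when fit_intercept is True.
    """
    per_output = num_features + (1 if fit_intercept else 0)
    return per_output * num_outputs


# Numbers taken from the thread: 1000 features, ~100 classes.
linear_params = coefficient_count(1000)                      # single output
logistic_params = coefficient_count(1000, num_outputs=100)   # one row per class

print("linear regression parameters:  ", linear_params)
print("logistic regression parameters:", logistic_params)
print("ratio:", logistic_params / linear_params)
```

This only counts parameters; it says nothing about memory used by the data
itself or by the optimizer, so treat it as a back-of-the-envelope estimate.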

Thank You,

Irving Duran


On Fri, Apr 27, 2018 at 7:15 PM Thodoris Zois <zo...@ics.forth.gr> wrote:

> I am on CentOS 7 and I use Spark 2.3.0. My code is posted below.
> Logistic regression took 85 minutes and linear regression 127 seconds…
>
> My dataset, as I said, is 128 MB and contains 1000 features and ~100
> classes.
>
>
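> import time
>
> from pyspark.sql import SparkSession
> from pyspark.ml.feature import VectorAssembler
> from pyspark.ml.classification import LogisticRegression
>
> #SparkSession
> ss = SparkSession.builder.getOrCreate()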
>
>
> start = time.time()
>
> #Read data (`file` holds the path to the CSV; its value is not shown here)
> trainData = ss.read.format("csv").option("inferSchema","true").load(file)
>
> #Calculate Features
> assembler = VectorAssembler(inputCols=trainData.columns[1:],
>                             outputCol="features")
> trainData = assembler.transform(trainData)
>
> #Drop columns
> dropColumns = trainData.columns
> dropColumns = [e for e in dropColumns if e not in ('_c0', 'features')]
> trainData = trainData.drop(*dropColumns)
>
> #Rename column from _c0 to label
> trainData = trainData.withColumnRenamed("_c0", "label")
>
> #Logistic regression
> lr = LogisticRegression(maxIter=500, regParam=0.3, elasticNetParam=0.8)
> lrModel = lr.fit(trainData)
>
> #Output Coefficients
> print("Coefficients: " + str(lrModel.coefficientMatrix))
>
>
>
> - Thodoris
>
>
> On 27 Apr 2018, at 22:50, Irving Duran <ir...@gmail.com> wrote:
>
> Are you reformatting the data correctly for logistic regression (meaning 0s
> and 1s) before modeling?  Which OS and Spark version are you using?
>
> Thank You,
>
> Irving Duran
>
>
> On Fri, Apr 27, 2018 at 2:34 PM Thodoris Zois <zo...@ics.forth.gr> wrote:
>
>> Hello,
>>
>> I am running an experiment to test logistic and linear regression on
>> Spark using MLlib.
>>
>> My dataset is only 128 MB, and something weird happens. Linear regression
>> takes about 127 seconds with either 1 or 500 iterations. On the other hand,
>> logistic regression most of the time does not manage to finish even with
>> 1 iteration. I usually get a heap memory error.
>>
>> In both cases I use the default cores and memory for the driver, and I
>> spawn 1 executor with 1 core and 2 GB of memory.
>>
>> Apart from that, I get a warning about NativeBLAS. I searched the Internet
>> and found that I have to install libgfortran. Even after I did, the
>> warning remains.
>>
>> Any ideas for the above?
>>
>> Thank you,
>> - Thodoris