Posted to user@spark.apache.org by "Bassett, Kenneth" <kb...@textron.com.INVALID> on 2022/03/17 16:01:02 UTC

[Pyspark] [Linear Regression] Can't Fit Data

Hello,

I am having an issue with Linear Regression when trying to fit training data to the model. The code below used to work, but it recently stopped working. Spark is version 3.2.1.

from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

# Split data into train and test sets
train, test = data.randomSplit([0.9, 0.1])
y = 'Build_Rate'

# Perform regression with the training data
assembler = VectorAssembler(inputCols=feature_cols, outputCol="Features")
vtrain = assembler.transform(train).select('Features', y)
lin_reg = LinearRegression(regParam=0.0, elasticNetParam=0.0, solver='normal',
                           featuresCol='Features', labelCol=y)
model = lin_reg.fit(vtrain)  # <-- FAILS HERE

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 388.0 failed 4 times, most recent failure: Lost task 0.3 in stage 388.0 (TID 422) (10.139.64.4 executor 0): org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'MMM dd, yyyy hh:mm:ss aa' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html

The full traceback is attached.

The error is confusing me because there are no datetime columns in "train". "vtrain" is just "train" with the feature columns in dense vector form.
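Since Spark evaluates lazily, a failing date parse can live anywhere upstream in the lineage of "train" and only surface when fit() forces execution. A minimal diagnostic sketch (assuming the DataFrames above; the comments mark what each failure would suggest):

# Force each DataFrame in the lineage to materialize; the first
# action that fails shows where the date parsing actually happens.
data.count()     # fails -> the parse happens when 'data' is built
train.count()    # fails -> introduced by the split (unlikely)
vtrain.count()   # fails -> introduced by the assembler (unlikely)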
Does anyone know how to fix this error?

Thanks,
Ken Bassett
Data Scientist



1451 Marvin Griffin Rd.
Augusta, GA 30906
(m) (706) 469-0696
kbassett@textron.com



Re: [Pyspark] [Linear Regression] Can't Fit Data

Posted by Sean Owen <sr...@gmail.com>.
The error points you to the answer: somewhere in your code you are parsing
dates, and that date format is no longer valid / supported. These changes
are documented at the link in the error message.
It is not related to the regression itself.
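As a minimal sketch of the two fixes the message itself offers (the
DataFrame, column name, and pattern below are placeholders for whatever
your upstream code actually parses):

# Option 1: restore the pre-Spark-3.0 parser behavior globally.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

# Option 2: rewrite the pattern per the Spark 3 guide linked in the
# error; AM/PM there is a single 'a', so 'aa' no longer parses.
from pyspark.sql import functions as F
df = df.withColumn("ts", F.to_timestamp("date_str", "MMM dd, yyyy hh:mm:ss a"))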
