Posted to issues@spark.apache.org by "Martin Skauen (JIRA)" <ji...@apache.org> on 2019/03/27 11:11:00 UTC
[jira] [Created] (SPARK-27293) I am interested in finding out if
there is a bug in the implementation of RandomForests. The Issue is when
applying a seed and getting different results than other people from my
class when applying it to the same data.
Martin Skauen created SPARK-27293:
-------------------------------------
Summary: I am interested in finding out if there is a bug in the implementation of RandomForests. The Issue is when applying a seed and getting different results than other people from my class when applying it to the same data.
Key: SPARK-27293
URL: https://issues.apache.org/jira/browse/SPARK-27293
Project: Spark
Issue Type: Question
Components: PySpark
Affects Versions: 2.4.0
Reporter: Martin Skauen
I am calculating the RMSE metric like this:
{code:java}
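from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.regression import RandomForestRegressor

# Split the data 70/30 with a fixed seed.
trainingData, testData = data.randomSplit([0.7, 0.3], seed=313)
# Train a small random forest with the same seed.
rfr = RandomForestRegressor(labelCol="labels", featuresCol="features", maxDepth=5, numTrees=3, seed=313)
# Assumed step, missing from the snippet as posted: fit on the training split, predict on the test split.
predictions = rfr.fit(trainingData).transform(testData)
# Compute RMSE on the held-out test set.
evaluator = RegressionEvaluator(labelCol="labels", predictionCol="prediction", metricName="rmse")
rmse = evaluator.evaluate(predictions)
print("RMSE = %g" % rmse)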
{code}
I set the seed explicitly. With seed = 50, and with other seeds, I get exactly the same RMSE as the other people in my class. With seed = 313, I get a different value. What could be the issue here?
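
One possible cause worth ruling out, offered as an assumption rather than a confirmed diagnosis: a seeded, row-by-row sampler only reproduces the same split when the rows reach it in the same order, so differences in how the data was loaded or partitioned could matter even with identical seeds. The pure-Python sketch below illustrates that idea only. It is not Spark's `randomSplit` implementation, and `seeded_split` is a hypothetical helper written for this example.

```python
import random


def seeded_split(rows, weights, seed):
    """Hypothetical helper: draw one random number per row, in row order,
    and send the row to the first or second split accordingly."""
    rng = random.Random(seed)
    cut = weights[0] / sum(weights)
    first, second = [], []
    for row in rows:
        (first if rng.random() < cut else second).append(row)
    return first, second


rows = list(range(20))
reordered = list(reversed(rows))  # same rows, different arrival order

train_a, test_a = seeded_split(rows, [0.7, 0.3], 313)
train_b, test_b = seeded_split(rows, [0.7, 0.3], 313)
train_c, test_c = seeded_split(reordered, [0.7, 0.3], 313)

# Same seed and same row order: the split is reproduced exactly.
print("same order, same split:", train_a == train_b)
# Same seed but different row order: the split is not guaranteed to match.
print("reordered, same split:", sorted(train_a) == sorted(train_c))
```

If this applies, comparing `trainingData.count()` and a few sorted rows of each split with a classmate would show whether the splits themselves already differ before any model is trained.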