Posted to issues@spark.apache.org by "Martin Skauen (JIRA)" <ji...@apache.org> on 2019/03/27 11:11:00 UTC

[jira] [Created] (SPARK-27293) I am interested in finding out if there is a bug in the implementation of RandomForests. The Issue is when applying a seed and getting different results than other people from my class when applying it to the same data.

Martin Skauen created SPARK-27293:
-------------------------------------

             Summary: I am interested in finding out if there is a bug in the implementation of RandomForests. The Issue is when applying a seed and getting different results than other people from my class when applying it to the same data. 
                 Key: SPARK-27293
                 URL: https://issues.apache.org/jira/browse/SPARK-27293
             Project: Spark
          Issue Type: Question
          Components: PySpark
    Affects Versions: 2.4.0
            Reporter: Martin Skauen


I am calculating the RMSE metric like this:
{code:java}
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.regression import RandomForestRegressor

# Split the data 70/30 with a fixed seed.
(trainingData, testData) = data.randomSplit([0.7, 0.3], 313)
rfr = RandomForestRegressor(labelCol="labels", featuresCol="features", maxDepth=5, numTrees=3, seed=313)

# Assumed steps, not shown in the original report: fit the model and score the test set.
model = rfr.fit(trainingData)
predictions = model.transform(testData)

evaluator = RegressionEvaluator(labelCol="labels", predictionCol="prediction", metricName="rmse")
rmse = evaluator.evaluate(predictions)
print("RMSE = %g" % rmse)
{code}
I set the seed explicitly. With seed = 50, and with several other seeds, I get exactly the same RMSE as other people in my class. With seed = 313, I get a different value than they do, even though we apply it to the same data. What could be the issue here?
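
One possible explanation, offered as an assumption rather than a confirmed diagnosis: a seeded random split typically draws one number per row in the order the rows are visited, so the same seed applied to the same rows in a different order (for example, after a different partitioning) can put different rows into each split. The pure-Python sketch below illustrates only that general effect. `seeded_split` is a hypothetical helper and does not reproduce Spark's `randomSplit` implementation.

```python
import random


def seeded_split(rows, weight, seed):
    """Hypothetical helper: one seeded draw per row, in visiting order."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        # Each row goes to train with probability `weight`.
        (train if rng.random() < weight else test).append(row)
    return train, test


rows = list(range(1000))
reordered = list(reversed(rows))  # same data, different row order

train_a, test_a = seeded_split(rows, 0.7, seed=313)
train_b, test_b = seeded_split(reordered, 0.7, seed=313)

# Same seed and same order: the split is reproducible.
print(seeded_split(rows, 0.7, seed=313) == (train_a, test_a))

# Same seed, different order: the split sizes match,
# but the rows assigned to each side can differ.
print(len(train_a) == len(train_b))
print(sorted(train_a) == sorted(train_b))
```

If something like this is the cause, comparing the row counts and a few row identifiers of `trainingData` and `testData` across machines would show whether the splits already differ before the model is trained.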


