Posted to issues@spark.apache.org by "kondziolka9ld (Jira)" <ji...@apache.org> on 2021/03/18 19:49:00 UTC
[jira] [Created] (SPARK-34792) Restore previous behaviour of randomSplit from spark-2.4.7 in spark-3
kondziolka9ld created SPARK-34792:
-------------------------------------
Summary: Restore previous behaviour of randomSplit from spark-2.4.7 in spark-3
Key: SPARK-34792
URL: https://issues.apache.org/jira/browse/SPARK-34792
Project: Spark
Issue Type: Question
Components: Spark Core, SQL
Affects Versions: 3.0.1
Reporter: kondziolka9ld
Hi,
Please consider the following difference in the output of `randomSplit`, even though the same seed is used. Is it possible to restore the same behaviour as in spark-2.4.7?
{code:java}
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.7
/_/
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val Array(f, s) = Seq(1,2,3,4,5,6,7,8,9,10).toDF.randomSplit(Array(0.3, 0.7), 42)
f: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]
s: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]
scala> f.show
+-----+
|value|
+-----+
| 4|
+-----+
scala> s.show
+-----+
|value|
+-----+
| 1|
| 2|
| 3|
| 5|
| 6|
| 7|
| 8|
| 9|
| 10|
+-----+
{code}
while on spark-3:
{code:java}
scala> val Array(f, s) = Seq(1,2,3,4,5,6,7,8,9,10).toDF.randomSplit(Array(0.3, 0.7), 42)
f: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]
s: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: int]
scala> f.show
+-----+
|value|
+-----+
| 5|
| 10|
+-----+
scala> s.show
+-----+
|value|
+-----+
| 1|
| 2|
| 3|
| 4|
| 6|
| 7|
| 8|
| 9|
+-----+
{code}
I guess the implementation of the `sample` method changed between the two versions.
Is it possible to restore the previous behaviour?
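For what it's worth, a split that stays stable across Spark versions can be approximated by bucketing rows on a deterministic, seeded hash instead of relying on the sampler inside `randomSplit`. The `stableSplit` helper below is only an illustrative sketch in plain Scala (it is not an existing Spark API, and the name is mine); the same hash-and-bucket idea could be applied to a DataFrame column.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical workaround sketch: assign each element a bucket in [0, 1)
// derived from a deterministic, seeded hash. The split then depends only
// on the hash function, not on any library-internal sampler that may
// change between releases.
def stableSplit[A](xs: Seq[A], firstFraction: Double, seed: Int): (Seq[A], Seq[A]) = {
  val bucketed = xs.map { x =>
    val h = MurmurHash3.productHash(Tuple1(x), seed)
    // Shift the Int hash into [0, 2^32) and normalise to [0, 1).
    (x, (h.toLong - Int.MinValue.toLong).toDouble / (1L << 32).toDouble)
  }
  val (first, second) = bucketed.partition { case (_, bucket) => bucket < firstFraction }
  (first.map(_._1), second.map(_._1))
}

// Every element lands in exactly one split, deterministically per seed.
val (f, s) = stableSplit(Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 0.3, 42)
println(f.size + s.size)
```

The trade-off is that the first split's size only approximates `firstFraction * n` (as with `randomSplit` itself), but re-running with the same seed always reproduces the same assignment.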
Thanks in advance!