Posted to issues@spark.apache.org by "Jason Piper (JIRA)" <ji...@apache.org> on 2016/04/05 03:04:25 UTC
[jira] [Created] (SPARK-14393) monotonicallyIncreasingId not
monotonically increasing with downstream coalesce
Jason Piper created SPARK-14393:
-----------------------------------
Summary: monotonicallyIncreasingId not monotonically increasing with downstream coalesce
Key: SPARK-14393
URL: https://issues.apache.org/jira/browse/SPARK-14393
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.6.0
Reporter: Jason Piper
When monotonicallyIncreasingId is followed by a coalesce, every partition appears to use the same offset (0), so the generated IDs are not monotonically increasing.
See the examples below.
>>> sqlContext.range(10).select(monotonicallyIncreasingId()).show()
+---------------------------+
|monotonicallyincreasingid()|
+---------------------------+
|                25769803776|
|                51539607552|
|                77309411328|
|               103079215104|
|               128849018880|
|               163208757248|
|               188978561024|
|               214748364800|
|               240518168576|
|               266287972352|
+---------------------------+
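For reference, the IDs in this first (correct) output can be decoded by hand. The sketch below is plain Python, not Spark code. It assumes the bit layout described in the Spark API documentation for this function (partition index in the upper 31 bits, record number within the partition in the lower 33 bits). The helper name decode_id is made up for illustration.

```python
# Assumed layout (from the Spark API docs, not from this report):
# upper 31 bits = partition index, lower 33 bits = record number in that partition.
RECORD_BITS = 33
RECORD_MASK = (1 << RECORD_BITS) - 1


def decode_id(value):
    """Split a generated ID into (partition_index, record_number). Hypothetical helper."""
    return value >> RECORD_BITS, value & RECORD_MASK


# IDs copied from the output of the first example above.
ids = [
    25769803776, 51539607552, 77309411328, 103079215104, 128849018880,
    163208757248, 188978561024, 214748364800, 240518168576, 266287972352,
]

decoded = [decode_id(v) for v in ids]
for value, (partition, record) in zip(ids, decoded):
    print(value, "-> partition", partition, "record", record)
```

Under that assumption, each of the ten rows is record 0 of a different partition, which is why the values are far apart but still strictly increasing.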
>>> sqlContext.range(10).select(monotonicallyIncreasingId()).coalesce(1).show()
+---------------------------+
|monotonicallyincreasingid()|
+---------------------------+
|                          0|
|                          0|
|                          0|
|                          0|
|                          0|
|                          0|
|                          0|
|                          0|
|                          0|
|                          0|
+---------------------------+
>>> sqlContext.range(10).repartition(5).select(monotonicallyIncreasingId()).coalesce(1).show()
+---------------------------+
|monotonicallyincreasingid()|
+---------------------------+
|                          0|
|                          1|
|                          0|
|                          0|
|                          1|
|                          2|
|                          3|
|                          0|
|                          1|
|                          2|
+---------------------------+
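The pattern in the last output is consistent with each upstream partition restarting its counter while the partition offset stays at 0. The following is a minimal plain-Python model of that symptom only. It is not Spark's implementation and does not establish the root cause. The 33-bit shift is the layout given in the Spark API documentation, and the partition sizes [2, 1, 4, 3] are an assumption inferred from the printed IDs (empty partitions would not show up in the output).

```python
RECORD_BITS = 33  # assumed: record number occupies the lower 33 bits


def assign_ids(partition_sizes, use_partition_offset=True):
    """Model ID assignment over partitions of the given sizes (illustrative only)."""
    ids = []
    for index, size in enumerate(partition_sizes):
        # With the offset, each partition gets its own disjoint ID range.
        # Without it, every partition counts up from 0 again.
        offset = (index << RECORD_BITS) if use_partition_offset else 0
        ids.extend(offset + record for record in range(size))
    return ids


def is_strictly_increasing(values):
    return all(a < b for a, b in zip(values, values[1:]))


sizes = [2, 1, 4, 3]  # hypothetical sizes inferred from the output above

expected = assign_ids(sizes, use_partition_offset=True)
observed = assign_ids(sizes, use_partition_offset=False)

print(observed)                          # [0, 1, 0, 0, 1, 2, 3, 0, 1, 2]
print(is_strictly_increasing(expected))  # True
print(is_strictly_increasing(observed))  # False
```

With the offset dropped, the model reproduces the ten IDs shown above exactly, including the duplicates, so the IDs are neither increasing nor unique.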