Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:23:39 UTC
[jira] [Updated] (SPARK-13516) Dataframe inconsistency after aggregation+union+projection.
[ https://issues.apache.org/jira/browse/SPARK-13516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-13516:
---------------------------------
Labels: bulk-closed (was: )
> Dataframe inconsistency after aggregation+union+projection.
> -----------------------------------------------------------
>
> Key: SPARK-13516
> URL: https://issues.apache.org/jira/browse/SPARK-13516
> Project: Spark
> Issue Type: Bug
> Components: Java API, SQL
> Affects Versions: 1.6.0
> Environment: Local mode, java version 1.8.0_45
> Reporter: Jiri Syrovy
> Priority: Major
> Labels: bulk-closed
>
> A sequence of aggregation, adding a static column, union, and projection appears to produce an inconsistent DataFrame.
> Given a DataFrame called df, the problem appears after the following steps:
> # Aggregate multiple columns of df and store the result as result_agg_1.
> # Run another aggregation of multiple columns, but with one fewer grouping column, and store the result as result_agg_2.
> # Align result_agg_2 with result_agg_1 by adding the missing grouping column with the empty-string value lit("").
> # Union result_agg_1 and result_agg_2.
> # Project "sum(count_column)" to "count_column" for all aggregated columns.
> The resulting DataFrame is inconsistent: all data coming from result_agg_1 is shifted.
> Stripped-down code and an example result can be seen here:
> https://gist.github.com/xjrk58/e0c7171287ee9bdc8df8
> https://gist.github.com/xjrk58/7a297a42ebb94f300d96
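
For reference, the five steps in the description can be modelled in plain Python to show what a consistent result would look like. This is only a sketch and not the reporter's code: it does not use Spark, and the input rows, the column names group_a, group_b and other_count, and the aggregate helper are all hypothetical. It keeps the added grouping column in the same position as in result_agg_1, because the union here is a simple positional concatenation of rows.

```python
from collections import defaultdict

# Hypothetical input rows: (group_a, group_b, count_column, other_count).
rows = [
    ("x", "p", 1, 10),
    ("x", "q", 2, 20),
    ("y", "p", 3, 30),
]


def aggregate(rows, key_idx, val_idx):
    """Group by the columns at key_idx and sum the columns at val_idx."""
    sums = defaultdict(lambda: [0] * len(val_idx))
    for row in rows:
        key = tuple(row[i] for i in key_idx)
        for j, i in enumerate(val_idx):
            sums[key][j] += row[i]
    # Sort by key so the output order is deterministic.
    return [key + tuple(vals) for key, vals in sorted(sums.items())]


# Step 1: aggregate on both grouping columns.
result_agg_1 = aggregate(rows, key_idx=(0, 1), val_idx=(2, 3))

# Step 2: aggregate again with one fewer grouping column.
result_agg_2 = aggregate(rows, key_idx=(0,), val_idx=(2, 3))

# Step 3: align the second result by adding the missing grouping column as "".
aligned_agg_2 = [(a, "", c, o) for (a, c, o) in result_agg_2]

# Step 4: union the two results (rows are matched by position).
unioned = result_agg_1 + aligned_agg_2

# Step 5: project the summed columns back to their plain names.
columns = ["group_a", "group_b", "count_column", "other_count"]
expected = [dict(zip(columns, row)) for row in unioned]

for row in expected:
    print(row)
```

In a consistent result, every row that originates from result_agg_1 keeps its values under the same column names after the projection, which is the property the report says is violated.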
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)