Posted to issues@spark.apache.org by "Jo Desmet (JIRA)" <ji...@apache.org> on 2015/10/01 04:52:04 UTC

[jira] [Updated] (SPARK-10893) Lag Analytic function broken

     [ https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jo Desmet updated SPARK-10893:
------------------------------
    Description: 
Trying to aggregate with the LAG analytic function gives the wrong result. In my test case it always returned the fixed value '103079215105' when run on an integer column.
Note that this only happens on Spark 1.5.0, and only when running in cluster mode.
It works fine on Spark 1.4.1, or when running in local mode. I did not test on a YARN cluster.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
    // Requires: import org.apache.spark.sql.expressions.Window;
    //           import static org.apache.spark.sql.functions.lag;
    SparkContext sc = new SparkContext(conf);
    HiveContext sqlContext = new HiveContext(sc);
    DataFrame df = sqlContext.read().json(getInputPath("input.json"));

    df = df.withColumn(
      "previous",
      lag(df.col("VBB"), 1)
        .over(Window.orderBy(df.col("VAA")))
      );
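For reference, the lag(column, 1) semantics the snippet above relies on can be sketched without Spark. This is a hedged plain-Java illustration of the intended behavior (not Spark's implementation); it ignores the null-keyed sample row and Spark's null-ordering rules:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LagSketch {
    // Expected lag(col, 1) semantics: once the rows are sorted by the
    // window's ORDER BY column, each row's "previous" value is the lagged
    // column's value from the preceding row, and null for the first row.
    static List<Integer> lag(List<Integer> sortedValues) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < sortedValues.size(); i++) {
            out.add(i == 0 ? null : sortedValues.get(i - 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // VBB values in VAA order ("A", "B", "C", "d"), null row omitted
        List<Integer> vbb = Arrays.asList(1, -1, 2, 3);
        System.out.println(lag(vbb)); // prints [null, 1, -1, 2]
    }
}
```

On the sample rows, the "previous" column should therefore read null, 1, -1, 2 for the non-null rows, not a constant like 103079215105.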


  was:
Trying to aggregate with the LAG analytic function gives the wrong result. In my test case it always returned the fixed value '103079215105' when run on an integer column.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
    SparkContext sc = new SparkContext(conf);
    HiveContext sqlContext = new HiveContext(sc);
    DataFrame df = sqlContext.read().json(getInputPath("input.json"));
    
    df = df.withColumn(
      "previous",
      lag(df.col("VBB"), 1)
        .over(Window.orderBy(df.col("VAA")))
      );



> Lag Analytic function broken
> ----------------------------
>
>                 Key: SPARK-10893
>                 URL: https://issues.apache.org/jira/browse/SPARK-10893
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.5.0
>         Environment: Spark Standalone Cluster on Linux
>            Reporter: Jo Desmet
>
> Trying to aggregate with the LAG analytic function gives the wrong result. In my test case it always returned the fixed value '103079215105' when run on an integer column.
> Note that this only happens on Spark 1.5.0, and only when running in cluster mode.
> It works fine on Spark 1.4.1, or when running in local mode. I did not test on a YARN cluster.
> Input JSON:
> {"VAA":"A", "VBB":1}
> {"VAA":"B", "VBB":-1}
> {"VAA":"C", "VBB":2}
> {"VAA":"d", "VBB":3}
> {"VAA":null, "VBB":null}
> Java:
>     SparkContext sc = new SparkContext(conf);
>     HiveContext sqlContext = new HiveContext(sc);
>     DataFrame df = sqlContext.read().json(getInputPath("input.json"));
>     
>     df = df.withColumn(
>       "previous",
>       lag(df.col("VBB"), 1)
>         .over(Window.orderBy(df.col("VAA")))
>       );



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org