You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by kunal642 <gi...@git.apache.org> on 2017/11/21 07:31:03 UTC

[GitHub] carbondata pull request #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg ...

GitHub user kunal642 opened a pull request:

    https://github.com/apache/carbondata/pull/1542

    [CARBONDATA-1757] [PreAgg] Fix for wrong avg values after pre-agg table creation

    when a sum/count aggregation function is applied on the same column along with avg. The plan that was getting transformed was adding 2 columns for sum/count which resulted in wrong data being inserted.
    
    Be sure to do all of the following checklist to help us incorporate 
    your contribution quickly and easily:
    
     - [X] Any interfaces changed?
     No
     - [X] Any backward compatibility impacted?
     No
     - [X] Document update required?
    No
     - [X] Testing done
    Test case added
     - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    No


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kunal642/carbondata pre_agg_avg_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1542.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1542
    
----
commit f05ba4faabc70c19574ed200ddb88909d69cf6e3
Author: kunal642 <ku...@gmail.com>
Date:   2017-11-21T07:25:25Z

    fixed wrong avg count bug when a sum/count aggregation function is applied on the same column along with avg.
    The plan that was getting transformed was adding 2 columns for sum/count which resulted in wrong data being inserted.

----


---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    LGTM


---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    LGTM


---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by kunal642 <gi...@git.apache.org>.
Github user kunal642 commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    retest this please


---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1329/



---

[GitHub] carbondata pull request #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1542#discussion_r153068582
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala ---
    @@ -797,33 +798,59 @@ object CarbonPreInsertionCasts extends Rule[LogicalPlan] {
        * @return
        */
       private def transformAggregatePlan(logicalPlan: LogicalPlan): LogicalPlan = {
    +    val validExpressionsMap = scala.collection.mutable.LinkedHashMap.empty[String, NamedExpression]
         logicalPlan transform {
           case aggregate@Aggregate(_, aExp, _) =>
    -        val newExpressions = aExp.flatMap {
    -          case alias@Alias(attrExpression: AggregateExpression, _) =>
    -            attrExpression.aggregateFunction match {
    -              case Average(attr: AttributeReference) =>
    -                Seq(Alias(attrExpression
    -                  .copy(aggregateFunction = Sum(attr),
    -                    resultId = NamedExpression.newExprId), attr.name + "_sum")(),
    -                  Alias(attrExpression
    -                    .copy(aggregateFunction = Count(attr),
    -                      resultId = NamedExpression.newExprId), attr.name + "_count")())
    -              case Average(cast@Cast(attr: AttributeReference, _)) =>
    -                Seq(Alias(attrExpression
    -                  .copy(aggregateFunction = Sum(cast),
    -                    resultId = NamedExpression.newExprId),
    -                  attr.name + "_sum")(),
    -                  Alias(attrExpression
    -                    .copy(aggregateFunction = Count(cast),
    -                      resultId = NamedExpression.newExprId), attr.name + "_count")())
    -              case _ => Seq(alias)
    -            }
    -          case namedExpr: NamedExpression => Seq(namedExpr)
    +        aExp.foreach {
    +          case alias: Alias =>
    +            validExpressionsMap ++= validateAggregateFunctionAndGetAlias(alias)
    --- End diff --
    
    Is the duplicate columns will be removed from aggexpressions? For example agg table is created with sum(col1) and avg(col1) then aggregation table should be created with sum(col1) and count(col1) only. sum(col1) should not be duplicated. Is this handled here?


---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1809/



---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    LGTM


---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1521/



---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    Build Failed with Spark 1.6, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/391/



---

[GitHub] carbondata pull request #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg ...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1542#discussion_r153995074
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala ---
    @@ -822,33 +823,59 @@ case class CarbonPreInsertionCasts(sparkSession: SparkSession) extends Rule[Logi
        * @return
        */
       private def transformAggregatePlan(logicalPlan: LogicalPlan): LogicalPlan = {
    +    val validExpressionsMap = scala.collection.mutable.LinkedHashMap.empty[String, NamedExpression]
         logicalPlan transform {
           case aggregate@Aggregate(_, aExp, _) =>
    -        val newExpressions = aExp.flatMap {
    -          case alias@Alias(attrExpression: AggregateExpression, _) =>
    -            attrExpression.aggregateFunction match {
    -              case Average(attr: AttributeReference) =>
    -                Seq(Alias(attrExpression
    -                  .copy(aggregateFunction = Sum(attr),
    -                    resultId = NamedExpression.newExprId), attr.name + "_sum")(),
    -                  Alias(attrExpression
    -                    .copy(aggregateFunction = Count(attr),
    -                      resultId = NamedExpression.newExprId), attr.name + "_count")())
    -              case Average(cast@MatchCast(attr: AttributeReference, _)) =>
    -                Seq(Alias(attrExpression
    -                  .copy(aggregateFunction = Sum(cast),
    -                    resultId = NamedExpression.newExprId),
    -                  attr.name + "_sum")(),
    -                  Alias(attrExpression
    -                    .copy(aggregateFunction = Count(cast),
    -                      resultId = NamedExpression.newExprId), attr.name + "_count")())
    -              case _ => Seq(alias)
    -            }
    -          case namedExpr: NamedExpression => Seq(namedExpr)
    +        aExp.foreach {
    +          case alias: Alias =>
    +            validExpressionsMap ++= validateAggregateFunctionAndGetAlias(alias)
    +          case namedExpr: NamedExpression => validExpressionsMap.put(namedExpr.name, namedExpr)
             }
    -        aggregate.copy(aggregateExpressions = newExpressions.asInstanceOf[Seq[NamedExpression]])
    +        aggregate
    +          .copy(aggregateExpressions = validExpressionsMap.values.toSeq)
    --- End diff --
    
    move this to previous line


---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by kunal642 <gi...@git.apache.org>.
Github user kunal642 commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    ok


---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by kunal642 <gi...@git.apache.org>.
Github user kunal642 commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    @jackylk Please review and merge


---

[GitHub] carbondata pull request #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg ...

Posted by kunal642 <gi...@git.apache.org>.
Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1542#discussion_r153995458
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala ---
    @@ -822,33 +823,59 @@ case class CarbonPreInsertionCasts(sparkSession: SparkSession) extends Rule[Logi
        * @return
        */
       private def transformAggregatePlan(logicalPlan: LogicalPlan): LogicalPlan = {
    +    val validExpressionsMap = scala.collection.mutable.LinkedHashMap.empty[String, NamedExpression]
         logicalPlan transform {
           case aggregate@Aggregate(_, aExp, _) =>
    -        val newExpressions = aExp.flatMap {
    -          case alias@Alias(attrExpression: AggregateExpression, _) =>
    -            attrExpression.aggregateFunction match {
    -              case Average(attr: AttributeReference) =>
    -                Seq(Alias(attrExpression
    -                  .copy(aggregateFunction = Sum(attr),
    -                    resultId = NamedExpression.newExprId), attr.name + "_sum")(),
    -                  Alias(attrExpression
    -                    .copy(aggregateFunction = Count(attr),
    -                      resultId = NamedExpression.newExprId), attr.name + "_count")())
    -              case Average(cast@MatchCast(attr: AttributeReference, _)) =>
    -                Seq(Alias(attrExpression
    -                  .copy(aggregateFunction = Sum(cast),
    -                    resultId = NamedExpression.newExprId),
    -                  attr.name + "_sum")(),
    -                  Alias(attrExpression
    -                    .copy(aggregateFunction = Count(cast),
    -                      resultId = NamedExpression.newExprId), attr.name + "_count")())
    -              case _ => Seq(alias)
    -            }
    -          case namedExpr: NamedExpression => Seq(namedExpr)
    +        aExp.foreach {
    +          case alias: Alias =>
    +            validExpressionsMap ++= validateAggregateFunctionAndGetAlias(alias)
    +          case namedExpr: NamedExpression => validExpressionsMap.put(namedExpr.name, namedExpr)
             }
    -        aggregate.copy(aggregateExpressions = newExpressions.asInstanceOf[Seq[NamedExpression]])
    +        aggregate
    +          .copy(aggregateExpressions = validExpressionsMap.values.toSeq)
    --- End diff --
    
    ok



---

[GitHub] carbondata pull request #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg ...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1542#discussion_r152357935
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateUtil.scala ---
    @@ -489,4 +489,44 @@ object PreAggregateUtil {
         }
         updatedPlan
       }
    +
    --- End diff --
    
    This code is specific for inserting the data to aggregate table better to move this code in rules class 


---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by kunal642 <gi...@git.apache.org>.
Github user kunal642 commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    @jackylk build passed


---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1399/



---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1598/



---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1808/



---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by kunal642 <gi...@git.apache.org>.
Github user kunal642 commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    retest this please


---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1373/



---

[GitHub] carbondata pull request #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg ...

Posted by kunal642 <gi...@git.apache.org>.
Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1542#discussion_r152474076
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateUtil.scala ---
    @@ -489,4 +489,44 @@ object PreAggregateUtil {
         }
         updatedPlan
       }
    +
    --- End diff --
    
    moved the method to PreAggregationRules


---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by kunal642 <gi...@git.apache.org>.
Github user kunal642 commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    retest this please


---

[GitHub] carbondata pull request #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1542


---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1923/



---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1788/



---

[GitHub] carbondata pull request #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg ...

Posted by kunal642 <gi...@git.apache.org>.
Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1542#discussion_r153995580
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala ---
    @@ -822,33 +823,59 @@ case class CarbonPreInsertionCasts(sparkSession: SparkSession) extends Rule[Logi
        * @return
        */
       private def transformAggregatePlan(logicalPlan: LogicalPlan): LogicalPlan = {
    +    val validExpressionsMap = scala.collection.mutable.LinkedHashMap.empty[String, NamedExpression]
         logicalPlan transform {
           case aggregate@Aggregate(_, aExp, _) =>
    -        val newExpressions = aExp.flatMap {
    -          case alias@Alias(attrExpression: AggregateExpression, _) =>
    -            attrExpression.aggregateFunction match {
    -              case Average(attr: AttributeReference) =>
    -                Seq(Alias(attrExpression
    -                  .copy(aggregateFunction = Sum(attr),
    -                    resultId = NamedExpression.newExprId), attr.name + "_sum")(),
    -                  Alias(attrExpression
    -                    .copy(aggregateFunction = Count(attr),
    -                      resultId = NamedExpression.newExprId), attr.name + "_count")())
    -              case Average(cast@MatchCast(attr: AttributeReference, _)) =>
    -                Seq(Alias(attrExpression
    -                  .copy(aggregateFunction = Sum(cast),
    -                    resultId = NamedExpression.newExprId),
    -                  attr.name + "_sum")(),
    -                  Alias(attrExpression
    -                    .copy(aggregateFunction = Count(cast),
    -                      resultId = NamedExpression.newExprId), attr.name + "_count")())
    -              case _ => Seq(alias)
    -            }
    -          case namedExpr: NamedExpression => Seq(namedExpr)
    +        aExp.foreach {
    +          case alias: Alias =>
    +            validExpressionsMap ++= validateAggregateFunctionAndGetAlias(alias)
    +          case namedExpr: NamedExpression => validExpressionsMap.put(namedExpr.name, namedExpr)
             }
    -        aggregate.copy(aggregateExpressions = newExpressions.asInstanceOf[Seq[NamedExpression]])
    +        aggregate
    +          .copy(aggregateExpressions = validExpressionsMap.values.toSeq)
           case plan: LogicalPlan => plan
         }
       }
    +
    +  /**
    +   * This method will split the avg column into sum and count and will return a sequence of tuple
    +   * of unique name, alias
    +   *
    +   * @param alias
    +   * @return
    --- End diff --
    
    done


---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1844/



---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1385/



---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1334/



---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1330/



---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1836/



---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1834/



---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1789/



---

[GitHub] carbondata pull request #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg ...

Posted by kunal642 <gi...@git.apache.org>.
Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1542#discussion_r153068856
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala ---
    @@ -797,33 +798,59 @@ object CarbonPreInsertionCasts extends Rule[LogicalPlan] {
        * @return
        */
       private def transformAggregatePlan(logicalPlan: LogicalPlan): LogicalPlan = {
    +    val validExpressionsMap = scala.collection.mutable.LinkedHashMap.empty[String, NamedExpression]
         logicalPlan transform {
           case aggregate@Aggregate(_, aExp, _) =>
    -        val newExpressions = aExp.flatMap {
    -          case alias@Alias(attrExpression: AggregateExpression, _) =>
    -            attrExpression.aggregateFunction match {
    -              case Average(attr: AttributeReference) =>
    -                Seq(Alias(attrExpression
    -                  .copy(aggregateFunction = Sum(attr),
    -                    resultId = NamedExpression.newExprId), attr.name + "_sum")(),
    -                  Alias(attrExpression
    -                    .copy(aggregateFunction = Count(attr),
    -                      resultId = NamedExpression.newExprId), attr.name + "_count")())
    -              case Average(cast@Cast(attr: AttributeReference, _)) =>
    -                Seq(Alias(attrExpression
    -                  .copy(aggregateFunction = Sum(cast),
    -                    resultId = NamedExpression.newExprId),
    -                  attr.name + "_sum")(),
    -                  Alias(attrExpression
    -                    .copy(aggregateFunction = Count(cast),
    -                      resultId = NamedExpression.newExprId), attr.name + "_count")())
    -              case _ => Seq(alias)
    -            }
    -          case namedExpr: NamedExpression => Seq(namedExpr)
    +        aExp.foreach {
    +          case alias: Alias =>
    +            validExpressionsMap ++= validateAggregateFunctionAndGetAlias(alias)
    --- End diff --
    
    Yes, validateAggregateFunctionAndGetAlias function returns a Seq((columnName_aggFunction, Alias)) which is added to a map to remove any duplicate Alias entries.


---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1353/



---

[GitHub] carbondata pull request #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg ...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1542#discussion_r153995138
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala ---
    @@ -822,33 +823,59 @@ case class CarbonPreInsertionCasts(sparkSession: SparkSession) extends Rule[Logi
        * @return
        */
       private def transformAggregatePlan(logicalPlan: LogicalPlan): LogicalPlan = {
    +    val validExpressionsMap = scala.collection.mutable.LinkedHashMap.empty[String, NamedExpression]
         logicalPlan transform {
           case aggregate@Aggregate(_, aExp, _) =>
    -        val newExpressions = aExp.flatMap {
    -          case alias@Alias(attrExpression: AggregateExpression, _) =>
    -            attrExpression.aggregateFunction match {
    -              case Average(attr: AttributeReference) =>
    -                Seq(Alias(attrExpression
    -                  .copy(aggregateFunction = Sum(attr),
    -                    resultId = NamedExpression.newExprId), attr.name + "_sum")(),
    -                  Alias(attrExpression
    -                    .copy(aggregateFunction = Count(attr),
    -                      resultId = NamedExpression.newExprId), attr.name + "_count")())
    -              case Average(cast@MatchCast(attr: AttributeReference, _)) =>
    -                Seq(Alias(attrExpression
    -                  .copy(aggregateFunction = Sum(cast),
    -                    resultId = NamedExpression.newExprId),
    -                  attr.name + "_sum")(),
    -                  Alias(attrExpression
    -                    .copy(aggregateFunction = Count(cast),
    -                      resultId = NamedExpression.newExprId), attr.name + "_count")())
    -              case _ => Seq(alias)
    -            }
    -          case namedExpr: NamedExpression => Seq(namedExpr)
    +        aExp.foreach {
    +          case alias: Alias =>
    +            validExpressionsMap ++= validateAggregateFunctionAndGetAlias(alias)
    +          case namedExpr: NamedExpression => validExpressionsMap.put(namedExpr.name, namedExpr)
             }
    -        aggregate.copy(aggregateExpressions = newExpressions.asInstanceOf[Seq[NamedExpression]])
    +        aggregate
    +          .copy(aggregateExpressions = validExpressionsMap.values.toSeq)
           case plan: LogicalPlan => plan
         }
       }
    +
    +  /**
    +   * This method will split the avg column into sum and count and will return a sequence of tuple
    +   * of unique name, alias
    +   *
    +   * @param alias
    +   * @return
    --- End diff --
    
    remove line 844 and line 845


---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1993/



---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1387/



---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1354/



---

[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1542
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1389/



---