You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by chuxi <gi...@git.apache.org> on 2014/08/21 15:02:35 UTC

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

GitHub user chuxi opened a pull request:

    https://github.com/apache/spark/pull/2082

    SPARK-2096 [SQL]: Correctly parse dot notations for accessing an array of structs

    For example, "arrayOfStruct" is an array of structs and every element of this array has a field called "field1". "arrayOfStruct[0].field1" means to access the value of "field1" for the first element of "arrayOfStruct", but the SQL parser (in sql-core) treats "field1" as an alias. Also, "arrayOfStruct.field1" means to access all values of "field1" in this array of structs and the returns those values as an array. But, the SQL parser cannot resolve it.
    
    I have passed the test case in JsonSuite ("Complex field and type inferring (Ignored)") which is ignored, by a little modified.
    modified test part :
    checkAnswer(
    sql("select arrayOfStruct.field1, arrayOfStruct.field2 from jsonTable"),
    (Seq(true, false, null), Seq("str1", null, null)) :: Nil
    )
    However, another question is repeated nested structure is a problem, like arrayOfStruct.field1.arrayOfStruct.field1 or arrayOfStruct[0].field1.arrayOfStruct[0].field1
    I plan to ignore this problem and try to add "select arrayOfStruct.field1, arrayOfStruct.field2 from jsonTable where arrayOfStruct.field1==true "
    Besides, my friend anyweil (Wei Li) solved the problem of arrayOfStruct.field1 and its Filter part( means where parsing).
    I am fresh here but will continue working on spark :)
    
    I checked the problem " where arrayOfStruct.field1==true "
    this problem will lead to modify every kind of comparisonExpression. And I think it makes no sense to add this function. So I discard it.
    Over.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chuxi/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2082.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2082
    
----
commit b1cb4fb4e3da7ed54ac875afc20a81f25310fa87
Author: chuxi <ch...@163.com>
Date:   2014-08-21T12:47:25Z

    Correctly parse dot notations for accessing an array of structs

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by chuxi <gi...@git.apache.org>.

Github user chuxi commented on the pull request:

    https://github.com/apache/spark/pull/2082#issuecomment-53673090
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by chuxi <gi...@git.apache.org>.

Github user chuxi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2082#discussion_r16540937
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
    @@ -108,6 +109,8 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] {
             a.dataType match {
               case StructType(fields) =>
                 Some(Alias(nestedFields.foldLeft(a: Expression)(GetField), nestedFields.last)())
    +          case fields :ArrayType =>
    --- End diff --
    
    case ArrayType(fields) leads to an compile error


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2082#issuecomment-53832574
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19444/consoleFull) for   PR 2082 at commit [`e5a3db1`](https://github.com/apache/spark/commit/e5a3db19bf5fc7365e4280f17d9bba27b08d29dd).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class GetArrayField(child: Expression, fieldName: String) extends UnaryExpression `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by chuxi <gi...@git.apache.org>.

Github user chuxi closed the pull request at:

    https://github.com/apache/spark/pull/2082


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by chuxi <gi...@git.apache.org>.

Github user chuxi commented on the pull request:

    https://github.com/apache/spark/pull/2082#issuecomment-54152206
  
    @marmbrus , do I need to check something else? Or merge the code?
    
    Besides, I see another PR : https://github.com/apache/spark/pull/2230
    
    :) it is my friend, I  suggested him have a look of the nested parquet sqlpaser in the sql parquet test suit which parses dot as a delimiter, not identChar.
    
    it has a problem that we don't know whether the dot should be in identchar or delimiter. It leads to different sql parsing result. If only struct in a json or parquet, apparently put the dot. in identchar is better. Does it have some other reasons about the dot parsing?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2082#discussion_r16691377
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala ---
    @@ -101,3 +101,44 @@ case class GetField(child: Expression, fieldName: String) extends UnaryExpressio
     
       override def toString = s"$child.$fieldName"
     }
    +
    +/**
    + * Returns the value of fields[] in the Struct `child`.
    + * for array of structs
    + */
    +
    +case class GetArrayField(child: Expression, fieldName: String) extends UnaryExpression {
    +  type EvaluatedType = Any
    +
    +  def dataType = field.dataType
    +  override def nullable = child.nullable || field.nullable
    +  override def foldable = child.foldable
    +
    +  protected def arrayType = child.dataType match {
    +    case s: ArrayType => s.elementType match {
    +      case t :StructType => t
    +      case otherType => sys.error(s"GetArrayField is not valid on fields of type $otherType")
    +    }
    +    case otherType => sys.error(s"GetArrayField is not valid on fields of type $otherType")
    +  }
    +
    +  lazy val field =
    +    arrayType.fields
    +      .find(_.name == fieldName)
    +      .getOrElse(sys.error(s"No such field $fieldName in ${child.dataType}"))
    +
    +
    +  lazy val ordinal = arrayType.fields.indexOf(field)
    +
    +  override lazy val resolved = childrenResolved && child.dataType.isInstanceOf[ArrayType]
    +
    +  override def eval(input: Row): Any = {
    +    val value : Seq[Row] = child.eval(input).asInstanceOf[Seq[Row]]
    +    val v = value.map{ t =>
    +      if (t == null) null else t(ordinal)
    +    }
    +    v
    +  }
    +
    +  override def toString = s"$child.$fieldName"
    +}
    --- End diff --
    
    End files with a newline.  Run `sbt/sbt scalastyle` to check style locally.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2082#issuecomment-53363440
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19178/consoleFull) for   PR 2082 at commit [`ebf033b`](https://github.com/apache/spark/commit/ebf033bfb2a658d4ca25cfdbe4d7def105793486).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2082#discussion_r16691505
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala ---
    @@ -101,3 +101,44 @@ case class GetField(child: Expression, fieldName: String) extends UnaryExpressio
     
       override def toString = s"$child.$fieldName"
     }
    +
    +/**
    + * Returns the value of fields[] in the Struct `child`.
    --- End diff --
    
    Perhaps: "Returns an array containing the value of `fieldName` for each element in the input array of type struct".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by adrian-wang <gi...@git.apache.org>.

Github user adrian-wang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2082#discussion_r16539375
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
    @@ -108,6 +109,8 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] {
             a.dataType match {
               case StructType(fields) =>
                 Some(Alias(nestedFields.foldLeft(a: Expression)(GetField), nestedFields.last)())
    +          case fields :ArrayType =>
    --- End diff --
    
    Maybe it is better to use `case ArrayType(fields)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2082#discussion_r16691386
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala ---
    @@ -101,3 +101,44 @@ case class GetField(child: Expression, fieldName: String) extends UnaryExpressio
     
       override def toString = s"$child.$fieldName"
     }
    +
    +/**
    + * Returns the value of fields[] in the Struct `child`.
    + * for array of structs
    + */
    +
    +case class GetArrayField(child: Expression, fieldName: String) extends UnaryExpression {
    +  type EvaluatedType = Any
    +
    +  def dataType = field.dataType
    +  override def nullable = child.nullable || field.nullable
    +  override def foldable = child.foldable
    +
    +  protected def arrayType = child.dataType match {
    +    case s: ArrayType => s.elementType match {
    +      case t :StructType => t
    --- End diff --
    
    Always put the `:` next to the variable, not the type.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2082#issuecomment-54694419
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by chuxi <gi...@git.apache.org>.

Github user chuxi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2082#discussion_r16969910
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala ---
    @@ -101,3 +101,41 @@ case class GetField(child: Expression, fieldName: String) extends UnaryExpressio
     
       override def toString = s"$child.$fieldName"
     }
    +
    +/**
    + * Returns an array containing the value of fieldName
    + * for each element in the input array of type struct
    + */
    +case class GetArrayField(child: Expression, fieldName: String) extends UnaryExpression {
    +  type EvaluatedType = Any
    +
    +  def dataType = field.dataType
    +  override def nullable = child.nullable || field.nullable
    +  override def foldable = child.foldable
    +
    +  protected def arrayType = child.dataType match {
    +    case ArrayType(s: StructType, _) => s
    +    case otherType => sys.error(s"GetArrayField is not valid on fields of type $otherType")
    +  }
    +
    +  lazy val field = if (arrayType.isInstanceOf[StructType]) {
    +    arrayType.fields
    +      .find(_.name == fieldName)
    +      .getOrElse(sys.error(s"No such field $fieldName in ${child.dataType}"))
    +  } else null
    +
    +
    +  lazy val ordinal = arrayType.fields.indexOf(field)
    +
    +  override lazy val resolved = childrenResolved && child.dataType.isInstanceOf[ArrayType]
    +
    +  override def eval(input: Row): Any = {
    +    val value : Seq[Row] = child.eval(input).asInstanceOf[Seq[Row]]
    +    val v = value.map{ t =>
    +      if (t == null) null else t(ordinal)
    +    }
    +    v
    --- End diff --
    
    = =


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2082#issuecomment-53828175
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by adrian-wang <gi...@git.apache.org>.

Github user adrian-wang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2082#discussion_r16881896
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala ---
    @@ -101,3 +101,41 @@ case class GetField(child: Expression, fieldName: String) extends UnaryExpressio
     
       override def toString = s"$child.$fieldName"
     }
    +
    +/**
    + * Returns an array containing the value of fieldName
    + * for each element in the input array of type struct
    + */
    +case class GetArrayField(child: Expression, fieldName: String) extends UnaryExpression {
    +  type EvaluatedType = Any
    +
    +  def dataType = field.dataType
    +  override def nullable = child.nullable || field.nullable
    +  override def foldable = child.foldable
    +
    +  protected def arrayType = child.dataType match {
    +    case ArrayType(s: StructType, _) => s
    +    case otherType => sys.error(s"GetArrayField is not valid on fields of type $otherType")
    +  }
    +
    +  lazy val field = if (arrayType.isInstanceOf[StructType]) {
    +    arrayType.fields
    +      .find(_.name == fieldName)
    +      .getOrElse(sys.error(s"No such field $fieldName in ${child.dataType}"))
    +  } else null
    +
    +
    +  lazy val ordinal = arrayType.fields.indexOf(field)
    +
    +  override lazy val resolved = childrenResolved && child.dataType.isInstanceOf[ArrayType]
    +
    +  override def eval(input: Row): Any = {
    +    val value : Seq[Row] = child.eval(input).asInstanceOf[Seq[Row]]
    +    val v = value.map{ t =>
    +      if (t == null) null else t(ordinal)
    +    }
    +    v
    --- End diff --
    
    you can just use 
    
        value.map{ t =>
          if (t == null) null else t(ordinal)
        }
    
    as the last line of this `eval` function.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2082#discussion_r16691550
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala ---
    @@ -101,3 +101,44 @@ case class GetField(child: Expression, fieldName: String) extends UnaryExpressio
     
       override def toString = s"$child.$fieldName"
     }
    +
    +/**
    + * Returns the value of fields[] in the Struct `child`.
    + * for array of structs
    + */
    +
    +case class GetArrayField(child: Expression, fieldName: String) extends UnaryExpression {
    +  type EvaluatedType = Any
    +
    +  def dataType = field.dataType
    +  override def nullable = child.nullable || field.nullable
    +  override def foldable = child.foldable
    +
    +  protected def arrayType = child.dataType match {
    +    case s: ArrayType => s.elementType match {
    +      case t :StructType => t
    +      case otherType => sys.error(s"GetArrayField is not valid on fields of type $otherType")
    +    }
    +    case otherType => sys.error(s"GetArrayField is not valid on fields of type $otherType")
    +  }
    +
    +  lazy val field =
    +    arrayType.fields
    +      .find(_.name == fieldName)
    +      .getOrElse(sys.error(s"No such field $fieldName in ${child.dataType}"))
    +
    +
    +  lazy val ordinal = arrayType.fields.indexOf(field)
    +
    +  override lazy val resolved = childrenResolved && child.dataType.isInstanceOf[ArrayType]
    --- End diff --
    
    This should also check that the element type of the ArrayType is StructType and that the requested field name can be found in that struct.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by chuxi <gi...@git.apache.org>.

Github user chuxi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2082#discussion_r16540895
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
    @@ -108,6 +109,8 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] {
             a.dataType match {
               case StructType(fields) =>
                 Some(Alias(nestedFields.foldLeft(a: Expression)(GetField), nestedFields.last)())
    +          case fields :ArrayType =>
    --- End diff --
    
    In fact, ArrayType(fields, _) would be better 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2082#issuecomment-53363532
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19178/consoleFull) for   PR 2082 at commit [`ebf033b`](https://github.com/apache/spark/commit/ebf033bfb2a658d4ca25cfdbe4d7def105793486).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class GetArrayField(child: Expression, fieldName: String) extends UnaryExpression `
      * `case class ExplainCommand(plan: LogicalPlan, extended: Boolean = false) extends Command `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2082#issuecomment-52916269
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by adrian-wang <gi...@git.apache.org>.

Github user adrian-wang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2082#discussion_r16692668
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
    @@ -373,8 +376,13 @@ class SqlLexical(val keywords: Seq[String]) extends StdLexical {
       )
     
       override lazy val token: Parser[Token] = (
    -    identChar ~ rep( identChar | digit ) ^^
    -      { case first ~ rest => processIdent(first :: rest mkString "") }
    +       identChar ~ rep( identChar | digit ) ^^
    --- End diff --
    
    indent too much


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2082#issuecomment-53363068
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2082#issuecomment-53363057
  
    Thanks for working on this.  I made a few small suggestions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by chuxi <gi...@git.apache.org>.

Github user chuxi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2082#discussion_r16541208
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/json/JsonSuite.scala ---
    @@ -292,24 +292,29 @@ class JsonSuite extends QueryTest {
           sql("select structWithArrayFields.field1[1], structWithArrayFields.field2[3] from jsonTable"),
           (5, null) :: Nil
         )
    -  }
     
    -  ignore("Complex field and type inferring (Ignored)") {
    -    val jsonSchemaRDD = jsonRDD(complexFieldAndType)
    -    jsonSchemaRDD.registerTempTable("jsonTable")
    +    checkAnswer(
    +      sql("select arrayOfStruct.field1, arrayOfStruct.field2 from jsonTable"),
    +      (Seq(true, false, null), Seq("str1", null, null)) :: Nil
    +    )
     
    -    // Right now, "field1" and "field2" are treated as aliases. We should fix it.
         checkAnswer(
           sql("select arrayOfStruct[0].field1, arrayOfStruct[0].field2 from jsonTable"),
           (true, "str1") :: Nil
         )
     
    -    // Right now, the analyzer cannot resolve arrayOfStruct.field1 and arrayOfStruct.field2.
    -    // Getting all values of a specific field from an array of structs.
    +  }
    +
    +  ignore("Complex field and type inferring (Ignored)") {
    +    val jsonSchemaRDD = jsonRDD(complexFieldAndType)
    +    jsonSchemaRDD.registerTempTable("jsonTable")
    +
    +    // still need add filter??? I am not sure whether this function is necessary. quite complex
         checkAnswer(
    -      sql("select arrayOfStruct.field1, arrayOfStruct.field2 from jsonTable"),
    -      (Seq(true, false), Seq("str1", null)) :: Nil
    +      sql("select arrayOfStruct.field1 from jsonTable where arrayOfStruct.field1 = true"),
    --- End diff --
    
    I add sql("select arrayOfStruct.field1 from jsonTable where arrayOfStruct.field1 = true") this test case in ignored part. It does not work because I came up with it but did not solve it. Or it makes no sense to solve it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2082#issuecomment-54252376
  
    Hi @chuxi, it would be great if you could discuss with @cloud-fan and perhaps adapt your `GetArrayField` stuff to work with the changes in #2230.  Also I think there are a few unaddressed comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by adrian-wang <gi...@git.apache.org>.

Github user adrian-wang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2082#discussion_r16539291
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/json/JsonSuite.scala ---
    @@ -292,24 +292,29 @@ class JsonSuite extends QueryTest {
           sql("select structWithArrayFields.field1[1], structWithArrayFields.field2[3] from jsonTable"),
           (5, null) :: Nil
         )
    -  }
     
    -  ignore("Complex field and type inferring (Ignored)") {
    -    val jsonSchemaRDD = jsonRDD(complexFieldAndType)
    -    jsonSchemaRDD.registerTempTable("jsonTable")
    +    checkAnswer(
    +      sql("select arrayOfStruct.field1, arrayOfStruct.field2 from jsonTable"),
    +      (Seq(true, false, null), Seq("str1", null, null)) :: Nil
    +    )
     
    -    // Right now, "field1" and "field2" are treated as aliases. We should fix it.
         checkAnswer(
           sql("select arrayOfStruct[0].field1, arrayOfStruct[0].field2 from jsonTable"),
           (true, "str1") :: Nil
         )
     
    -    // Right now, the analyzer cannot resolve arrayOfStruct.field1 and arrayOfStruct.field2.
    -    // Getting all values of a specific field from an array of structs.
    +  }
    +
    +  ignore("Complex field and type inferring (Ignored)") {
    +    val jsonSchemaRDD = jsonRDD(complexFieldAndType)
    +    jsonSchemaRDD.registerTempTable("jsonTable")
    +
    +    // still need add filter??? I am not sure whether this function is necessary. quite complex
         checkAnswer(
    -      sql("select arrayOfStruct.field1, arrayOfStruct.field2 from jsonTable"),
    -      (Seq(true, false), Seq("str1", null)) :: Nil
    +      sql("select arrayOfStruct.field1 from jsonTable where arrayOfStruct.field1 = true"),
    --- End diff --
    
    why are you changing the test case since it still cannot work?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2082#issuecomment-53828495
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19444/consoleFull) for   PR 2082 at commit [`e5a3db1`](https://github.com/apache/spark/commit/e5a3db19bf5fc7365e4280f17d9bba27b08d29dd).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2082#discussion_r16691566
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
    @@ -94,6 +94,7 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] {
         // matches the name or where the first part matches the scope and the second part matches the
         // name.  Return these matches along with any remaining parts, which represent dotted access to
         // struct fields.
    +
    --- End diff --
    
    Remove this newline.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2082#discussion_r16691419
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala ---
    @@ -101,3 +101,44 @@ case class GetField(child: Expression, fieldName: String) extends UnaryExpressio
     
       override def toString = s"$child.$fieldName"
     }
    +
    +/**
    + * Returns the value of fields[] in the Struct `child`.
    + * for array of structs
    + */
    +
    +case class GetArrayField(child: Expression, fieldName: String) extends UnaryExpression {
    +  type EvaluatedType = Any
    +
    +  def dataType = field.dataType
    +  override def nullable = child.nullable || field.nullable
    +  override def foldable = child.foldable
    +
    +  protected def arrayType = child.dataType match {
    +    case s: ArrayType => s.elementType match {
    --- End diff --
    
    this could be written as:
    
    ```scala
    child.dataType match {
      case ArrayType(s: StructType, _) => s
      case otherType => sys.error(s"GetArrayField is not valid on fields of type $otherType")
    }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by chuxi <gi...@git.apache.org>.

Github user chuxi commented on the pull request:

    https://github.com/apache/spark/pull/2082#issuecomment-53386890
  
    Thank you, marmbrus, you are so nice. I am fresh here and never post any PR to a open project. I will take your suggestions and modify my code as the scala style. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by adrian-wang <gi...@git.apache.org>.

Github user adrian-wang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2082#discussion_r16881901
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala ---
    @@ -101,3 +101,41 @@ case class GetField(child: Expression, fieldName: String) extends UnaryExpressio
     
       override def toString = s"$child.$fieldName"
     }
    +
    +/**
    + * Returns an array containing the value of fieldName
    + * for each element in the input array of type struct
    + */
    +case class GetArrayField(child: Expression, fieldName: String) extends UnaryExpression {
    +  type EvaluatedType = Any
    +
    +  def dataType = field.dataType
    +  override def nullable = child.nullable || field.nullable
    +  override def foldable = child.foldable
    +
    +  protected def arrayType = child.dataType match {
    +    case ArrayType(s: StructType, _) => s
    +    case otherType => sys.error(s"GetArrayField is not valid on fields of type $otherType")
    +  }
    +
    +  lazy val field = if (arrayType.isInstanceOf[StructType]) {
    +    arrayType.fields
    +      .find(_.name == fieldName)
    +      .getOrElse(sys.error(s"No such field $fieldName in ${child.dataType}"))
    +  } else null
    +
    +
    +  lazy val ordinal = arrayType.fields.indexOf(field)
    +
    +  override lazy val resolved = childrenResolved && child.dataType.isInstanceOf[ArrayType]
    +
    +  override def eval(input: Row): Any = {
    +    val value : Seq[Row] = child.eval(input).asInstanceOf[Seq[Row]]
    --- End diff --
    
    remove the space after value.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

Posted by chuxi <gi...@git.apache.org>.

Github user chuxi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2082#discussion_r16540807
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/json/JsonSuite.scala ---
    @@ -292,24 +292,29 @@ class JsonSuite extends QueryTest {
           sql("select structWithArrayFields.field1[1], structWithArrayFields.field2[3] from jsonTable"),
           (5, null) :: Nil
         )
    -  }
     
    -  ignore("Complex field and type inferring (Ignored)") {
    -    val jsonSchemaRDD = jsonRDD(complexFieldAndType)
    -    jsonSchemaRDD.registerTempTable("jsonTable")
    +    checkAnswer(
    +      sql("select arrayOfStruct.field1, arrayOfStruct.field2 from jsonTable"),
    +      (Seq(true, false, null), Seq("str1", null, null)) :: Nil
    +    )
     
    -    // Right now, "field1" and "field2" are treated as aliases. We should fix it.
         checkAnswer(
           sql("select arrayOfStruct[0].field1, arrayOfStruct[0].field2 from jsonTable"),
           (true, "str1") :: Nil
         )
     
    -    // Right now, the analyzer cannot resolve arrayOfStruct.field1 and arrayOfStruct.field2.
    -    // Getting all values of a specific field from an array of structs.
    +  }
    +
    +  ignore("Complex field and type inferring (Ignored)") {
    +    val jsonSchemaRDD = jsonRDD(complexFieldAndType)
    +    jsonSchemaRDD.registerTempTable("jsonTable")
    +
    +    // still need add filter??? I am not sure whether this function is necessary. quite complex
         checkAnswer(
    -      sql("select arrayOfStruct.field1, arrayOfStruct.field2 from jsonTable"),
    -      (Seq(true, false), Seq("str1", null)) :: Nil
    +      sql("select arrayOfStruct.field1 from jsonTable where arrayOfStruct.field1 = true"),
    --- End diff --
    
    wang pangzi, someone add field3 in testData arrayOfStruct. So it requires another null. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org