You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by chenghao-intel <gi...@git.apache.org> on 2014/03/25 13:00:15 UTC

[GitHub] spark pull request: Implement the RLike & Like in catalyst

GitHub user chenghao-intel opened a pull request:

    https://github.com/apache/spark/pull/224

    Implement the RLike & Like in catalyst

    This PR includes:
    1) Unify the unit test for expression evaluation
    2) Add implementation of RLike & Like

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenghao-intel/spark string_expression

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/224.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #224
    
----
commit 2c8929e6b9fe7f9d8ed38af12d0474bd9b0bbd13
Author: Cheng Hao <ha...@intel.com>
Date:   2014-03-25T02:35:22Z

    Update the unit test for expression evaluation

commit 91cfd3375f4603e94944952f2b6bc2b8c5d4468e
Author: Cheng Hao <ha...@intel.com>
Date:   2014-03-25T07:20:43Z

    add implementation for rlike/like
    
    Conflicts:
    
    	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
    	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/224#issuecomment-38770652
  
    @marmbrus , I've updated the code as the comments. However, I am not sure if that necessary to update the data type casting in HiveTypeCoercion if the parameter is not string.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/224#issuecomment-38747116
  
    add to whitelist


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/224#issuecomment-39010804
  
    Thanks. I merged this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/224#issuecomment-38881849
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/224#discussion_r11007251
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -19,11 +19,113 @@ package org.apache.spark.sql
     package catalyst
     package expressions
     
    +import java.util.regex.Pattern
    +
    +import org.apache.spark.sql.catalyst.types.DataType
    +import org.apache.spark.sql.catalyst.types.StringType
     import org.apache.spark.sql.catalyst.types.BooleanType
    +import org.apache.spark.sql.catalyst.trees.TreeNode
    +import org.apache.spark.sql.catalyst.errors.`package`.TreeNodeException
    +
    +
    +/**
    + * Thrown when an invalid RegEx string is found.
    + */
    +class InvalidRegExException[TreeType <: TreeNode[_]](tree: TreeType, reason: String) extends
    +  errors.TreeNodeException(tree, s"$reason", null)
    +
    +trait StringRegexExpression {
    +  self: BinaryExpression =>
    +
    +  type EvaluatedType = Any
    +  
    +  def escape(v: String): String
    +  def nullable: Boolean = true
    +  def dataType: DataType = BooleanType
    +  
    +  // try cache the pattern for Literal 
    +  private lazy val cache: Pattern = right match {
    +    case x @ Literal(value: String, StringType) => compile(value)
    +    case _ => null
    +  }
    +  
    +  protected def compile(str: Any): Pattern = str match {
    +    // TODO or let it be null if couldn't compile the regex?
    +    case x: String if(x != null) => Pattern.compile(escape(x))
    +    case x: String => null
    +    case _ => throw new InvalidRegExException(this, "$str can not be compiled to regex pattern")
    --- End diff --
    
    Ah, I was also thinking of the case where the regex itself is invalid, but it looks like Hive is going to fail here too.
    
    ```scala
    [marmbrus@michaels-mbp spark (javaSchemaRDD)]$ sbt hive/console
    [info] Starting scala interpreter...
    [info] 
    import org.apache.spark.sql.catalyst.analysis._
    import org.apache.spark.sql.catalyst.dsl._
    import org.apache.spark.sql.catalyst.errors._
    import org.apache.spark.sql.catalyst.expressions._
    import org.apache.spark.sql.catalyst.plans.logical._
    import org.apache.spark.sql.catalyst.rules._
    import org.apache.spark.sql.catalyst.types._
    import org.apache.spark.sql.catalyst.util._
    import org.apache.spark.sql.execution
    import org.apache.spark.sql.hive._
    import org.apache.spark.sql.hive.TestHive._
    import org.apache.spark.sql.parquet.ParquetTestData
    Welcome to Scala version 2.10.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45).
    Type in expressions to have them evaluated.
    Type :help for more information.
    
    scala> TestHive.runSqlHive("SELECT 'a' RLIKE '**' FROM src LIMIT 1")
    
    ======================
    HIVE FAILURE OUTPUT
    ======================
    set javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/private/var/folders/36/cjkbrr953xg2p_krwrmn8h_r0000gn/T/sparkHiveMetastore5920310799452446901;create=true
    set hive.metastore.warehouse.dir=/private/var/folders/36/cjkbrr953xg2p_krwrmn8h_r0000gn/T/sparkHiveWarehouse505596429372573669
    OK
    Copying data from file:/Users/marmbrus/workspace/hive/data/files/kv1.txt
    Copying file: file:/Users/marmbrus/workspace/hive/data/files/kv1.txt
    Loading data to table default.src
    Table default.src stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 5812, raw_data_size: 0]
    OK
    FAILED: ParseException line 1:0 cannot recognize input near 'test' '<EOF>' '<EOF>'
    
    FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments ''**'': org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.BooleanWritable org.apache.hadoop.hive.ql.udf.UDFRegExp.evaluate(org.apache.hadoop.io.Text,org.apache.hadoop.io.Text)  on object org.apache.hadoop.hive.ql.udf.UDFRegExp@37348663 of class org.apache.hadoop.hive.ql.udf.UDFRegExp with arguments {a:org.apache.hadoop.io.Text, **:org.apache.hadoop.io.Text} of size 2
    
    ======================
    END HIVE FAILURE OUTPUT
    ======================
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by pwendell <gi...@git.apache.org>.

Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/224#issuecomment-38642776
  
    Jenkins, this this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/224#discussion_r10928332
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -20,10 +20,114 @@ package catalyst
     package expressions
     
     import org.apache.spark.sql.catalyst.types.BooleanType
    +import java.util.regex.Pattern
     
    -case class Like(left: Expression, right: Expression) extends BinaryExpression {
    -  def dataType = BooleanType
    -  def nullable = left.nullable // Right cannot be null.
    +import catalyst.types.StringType
    +import catalyst.types.BooleanType
    +import catalyst.trees.TreeNode
    +
    +import catalyst.errors.`package`.TreeNodeException
    +import org.apache.spark.sql.catalyst.types.DataType
    +
    +
    +/**
    + * Thrown when an invalid RegEx string is found.
    + */
    +class InvalidRegExException[TreeType <: TreeNode[_]](tree: TreeType, reason: String) extends
    +  errors.TreeNodeException(tree, s"$reason", null)
    +
    +trait StringRegexExpression {
    +  self: BinaryExpression =>
    +
    +  type EvaluatedType = Any
    --- End diff --
    
    Ah, forgot that evaluation result can be `null`...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/224#discussion_r11002658
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala ---
    @@ -26,7 +26,186 @@ import org.apache.spark.sql.catalyst.types._
     /* Implicit conversions */
     import org.apache.spark.sql.catalyst.dsl.expressions._
     
    -class ExpressionEvaluationSuite extends FunSuite {
    +import org.apache.spark.sql.catalyst.types._
    +import org.apache.spark.sql.catalyst.dsl._
    +
    +
    +/**
    + * Root class of expression evaluation test
    + */
    +trait ExprEvalTest {
    +  type Execution = (Row => Row)
    +
    +  def engine: Execution
    +}
    +
    +case class InterpretExprEvalTest(exprs: Seq[Expression]) extends ExprEvalTest {
    +  override def engine: Execution = new Projection(exprs)
    --- End diff --
    
    I think we probably do not need to double check all expressions in the context of an interpreted projection and by calling the evaluation function.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/224#issuecomment-38556716
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/224#issuecomment-38760284
  
    Thank you @marmbrus  for so detailed comments, I will separate them as 2 PRs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/224#issuecomment-38583453
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/224#discussion_r11002521
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -19,11 +19,113 @@ package org.apache.spark.sql
     package catalyst
     package expressions
     
    +import java.util.regex.Pattern
    +
    +import org.apache.spark.sql.catalyst.types.DataType
    +import org.apache.spark.sql.catalyst.types.StringType
     import org.apache.spark.sql.catalyst.types.BooleanType
    +import org.apache.spark.sql.catalyst.trees.TreeNode
    +import org.apache.spark.sql.catalyst.errors.`package`.TreeNodeException
    +
    +
    +/**
    + * Thrown when an invalid RegEx string is found.
    + */
    +class InvalidRegExException[TreeType <: TreeNode[_]](tree: TreeType, reason: String) extends
    +  errors.TreeNodeException(tree, s"$reason", null)
    +
    +trait StringRegexExpression {
    --- End diff --
    
    Please add scala doc to this trait and the classes below.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/224#discussion_r11002570
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala ---
    @@ -26,7 +26,186 @@ import org.apache.spark.sql.catalyst.types._
     /* Implicit conversions */
     import org.apache.spark.sql.catalyst.dsl.expressions._
     
    -class ExpressionEvaluationSuite extends FunSuite {
    +import org.apache.spark.sql.catalyst.types._
    +import org.apache.spark.sql.catalyst.dsl._
    +
    +
    +/**
    + * Root class of expression evaluation test
    + */
    +trait ExprEvalTest {
    --- End diff --
    
    `ExpressionEvaluationTest`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/224#discussion_r11002505
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -19,11 +19,113 @@ package org.apache.spark.sql
     package catalyst
     package expressions
     
    +import java.util.regex.Pattern
    +
    +import org.apache.spark.sql.catalyst.types.DataType
    +import org.apache.spark.sql.catalyst.types.StringType
     import org.apache.spark.sql.catalyst.types.BooleanType
    +import org.apache.spark.sql.catalyst.trees.TreeNode
    +import org.apache.spark.sql.catalyst.errors.`package`.TreeNodeException
    +
    +
    +/**
    + * Thrown when an invalid RegEx string is found.
    + */
    +class InvalidRegExException[TreeType <: TreeNode[_]](tree: TreeType, reason: String) extends
    +  errors.TreeNodeException(tree, s"$reason", null)
    +
    +trait StringRegexExpression {
    +  self: BinaryExpression =>
    +
    +  type EvaluatedType = Any
    +  
    +  def escape(v: String): String
    +  def nullable: Boolean = true
    +  def dataType: DataType = BooleanType
    +  
    +  // try cache the pattern for Literal 
    +  private lazy val cache: Pattern = right match {
    +    case x @ Literal(value: String, StringType) => compile(value)
    +    case _ => null
    +  }
    +  
    +  protected def compile(str: Any): Pattern = str match {
    +    // TODO or let it be null if couldn't compile the regex?
    +    case x: String if(x != null) => Pattern.compile(escape(x))
    +    case x: String => null
    +    case _ => throw new InvalidRegExException(this, "$str can not be compiled to regex pattern")
    +  }
    +  
    +  protected def pattern(str: String) = if(cache == null) compile(str) else cache
    +  
    +  protected def filter: PartialFunction[(Row, (String, String)), Any] = {
    +    case (row, (null, r)) => { false }
    --- End diff --
    
    A few comments on this function:
     - I think you are already checking for null values below, so these cases will never match.
     - In match statements you can use _ to denote values where you do not need to check the value or use it.
     - I'm not sure why this is a partial function.  Since you are lifting the partial function and using `get` to retrieve the result, it ends up being equivalent to just using a `match` statement, which would be syntactically much simpler.
     - The use of tuples as arguments to functions should be discouraged.  Where there is a very clear `(key,value)` relationship or when they are only used locally it is maybe okay, but here it is very difficult to trace what the arguments to this function are supposed to be.
     - Given the above, I'd remove this PartialFunction and just embed the last case in the `apply` function.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/224#discussion_r11002548
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -19,11 +19,113 @@ package org.apache.spark.sql
     package catalyst
     package expressions
     
    +import java.util.regex.Pattern
    +
    +import org.apache.spark.sql.catalyst.types.DataType
    +import org.apache.spark.sql.catalyst.types.StringType
     import org.apache.spark.sql.catalyst.types.BooleanType
    +import org.apache.spark.sql.catalyst.trees.TreeNode
    +import org.apache.spark.sql.catalyst.errors.`package`.TreeNodeException
    +
    +
    +/**
    + * Thrown when an invalid RegEx string is found.
    + */
    +class InvalidRegExException[TreeType <: TreeNode[_]](tree: TreeType, reason: String) extends
    +  errors.TreeNodeException(tree, s"$reason", null)
    +
    +trait StringRegexExpression {
    +  self: BinaryExpression =>
    +
    +  type EvaluatedType = Any
    +  
    +  def escape(v: String): String
    +  def nullable: Boolean = true
    +  def dataType: DataType = BooleanType
    +  
    +  // try cache the pattern for Literal 
    +  private lazy val cache: Pattern = right match {
    +    case x @ Literal(value: String, StringType) => compile(value)
    +    case _ => null
    +  }
    +  
    +  protected def compile(str: Any): Pattern = str match {
    +    // TODO or let it be null if couldn't compile the regex?
    +    case x: String if(x != null) => Pattern.compile(escape(x))
    +    case x: String => null
    +    case _ => throw new InvalidRegExException(this, "$str can not be compiled to regex pattern")
    --- End diff --
    
    Is the right thing to do throwing an exception, or should we just return null?  I'm not sure what Hive's semantics are.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/224#discussion_r11007623
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala ---
    @@ -26,7 +26,186 @@ import org.apache.spark.sql.catalyst.types._
     /* Implicit conversions */
     import org.apache.spark.sql.catalyst.dsl.expressions._
     
    -class ExpressionEvaluationSuite extends FunSuite {
    +import org.apache.spark.sql.catalyst.types._
    +import org.apache.spark.sql.catalyst.dsl._
    +
    +
    +/**
    + * Root class of expression evaluation test
    + */
    +trait ExprEvalTest {
    +  type Execution = (Row => Row)
    +
    +  def engine: Execution
    +}
    +
    +case class InterpretExprEvalTest(exprs: Seq[Expression]) extends ExprEvalTest {
    +  override def engine: Execution = new Projection(exprs)
    +}
    +
    +class InterpretExpressionEvaluationSuite extends ExpressionEvaluationSuite {
    +  override def executor(exprs: Seq[Expression]) = InterpretExprEvalTest(exprs)
    +}
    +
    +trait ExpressionEvaluationSuite extends FunSuite {
    +  /**
    +   * The sub classes need to create the ExprEvalTest object 
    +   */
    +  def executor(exprs: Seq[Expression]): ExprEvalTest
    +  
    +  val data: Row = new GenericRow(Array(1, null, 1.0, true, 4, 5, null, "abcccd", "a%"))
    +
    +  // TODO add to DSL
    +  val c1 = BoundReference(0, AttributeReference("a", IntegerType)())
    +  val c2 = BoundReference(1, AttributeReference("b", IntegerType)())
    +  val c3 = BoundReference(2, AttributeReference("c", DoubleType)())
    +  val c4 = BoundReference(3, AttributeReference("d", BooleanType)())
    +  val c5 = BoundReference(4, AttributeReference("e", IntegerType)())
    +  val c6 = BoundReference(5, AttributeReference("f", IntegerType)())
    +  val c7 = BoundReference(6, AttributeReference("g", StringType)())
    +  val c8 = BoundReference(7, AttributeReference("h", StringType)())
    +  val c9 = BoundReference(8, AttributeReference("i", StringType)())
    +  
    +  /**
    +   * Compare each of the field if it equals the expected value.
    +   * 
    +   * expected is a sequence of (Any, Any), 
    +   * and the first element indicates:
    +   *   true:  the expected value is field is null
    +   *   false: the expected value is not null
    +   *   Exception Class: the expected exception class while computing the value 
    --- End diff --
    
    In the expression evaluation unit tests, some of the exceptions raised as expected, during the implementation, I was thinking to keep the Expressions creation / Expected Result / Computing independently in the simple framework design, because the expected exceptions are assumed to be defined dynamically by later, hence catching the exception first and then compare it with the expected one maybe more easier. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/224#discussion_r10928264
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -20,10 +20,114 @@ package catalyst
     package expressions
     
     import org.apache.spark.sql.catalyst.types.BooleanType
    +import java.util.regex.Pattern
     
    -case class Like(left: Expression, right: Expression) extends BinaryExpression {
    -  def dataType = BooleanType
    -  def nullable = left.nullable // Right cannot be null.
    +import catalyst.types.StringType
    +import catalyst.types.BooleanType
    +import catalyst.trees.TreeNode
    +
    +import catalyst.errors.`package`.TreeNodeException
    +import org.apache.spark.sql.catalyst.types.DataType
    +
    +
    +/**
    + * Thrown when an invalid RegEx string is found.
    + */
    +class InvalidRegExException[TreeType <: TreeNode[_]](tree: TreeType, reason: String) extends
    +  errors.TreeNodeException(tree, s"$reason", null)
    +
    +trait StringRegexExpression {
    +  self: BinaryExpression =>
    +
    +  type EvaluatedType = Any
    +  
    +  def escape(v: String): String
    +  def nullable: Boolean = true
    +  def dataType: DataType = BooleanType
    +  
    +  // try cache the pattern for Literal 
    +  private lazy val cache: Pattern = right match {
    +    case x @ Literal(value: String, StringType) => compile(value)
    +    case _ => null
    +  }
    +  
    +  protected def compile(str: Any): Pattern = str match {
    +    // TODO or let it be null if couldn't compile the regex?
    +    case x: String if(x != null) => Pattern.compile(escape(x))
    +    case x: String => null
    +    case _ => throw new InvalidRegExException(this, "$str can not be compiled to regex pattern")
    +  }
    +  
    +  protected def pattern(str: String) = if(cache == null) compile(str) else cache
    +  
    +  protected def filter: PartialFunction[(Row, (String, String)), Any] = {
    --- End diff --
    
    Return type of the partial function can be `Boolean`, same as above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/224#discussion_r11002785
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala ---
    @@ -26,7 +26,186 @@ import org.apache.spark.sql.catalyst.types._
     /* Implicit conversions */
     import org.apache.spark.sql.catalyst.dsl.expressions._
     
    -class ExpressionEvaluationSuite extends FunSuite {
    +import org.apache.spark.sql.catalyst.types._
    +import org.apache.spark.sql.catalyst.dsl._
    +
    +
    +/**
    + * Root class of expression evaluation test
    + */
    +trait ExprEvalTest {
    +  type Execution = (Row => Row)
    +
    +  def engine: Execution
    +}
    +
    +case class InterpretExprEvalTest(exprs: Seq[Expression]) extends ExprEvalTest {
    +  override def engine: Execution = new Projection(exprs)
    +}
    +
    +class InterpretExpressionEvaluationSuite extends ExpressionEvaluationSuite {
    +  override def executor(exprs: Seq[Expression]) = InterpretExprEvalTest(exprs)
    +}
    +
    +trait ExpressionEvaluationSuite extends FunSuite {
    +  /**
    +   * The sub classes need to create the ExprEvalTest object 
    +   */
    +  def executor(exprs: Seq[Expression]): ExprEvalTest
    +  
    +  val data: Row = new GenericRow(Array(1, null, 1.0, true, 4, 5, null, "abcccd", "a%"))
    +
    +  // TODO add to DSL
    +  val c1 = BoundReference(0, AttributeReference("a", IntegerType)())
    +  val c2 = BoundReference(1, AttributeReference("b", IntegerType)())
    +  val c3 = BoundReference(2, AttributeReference("c", DoubleType)())
    +  val c4 = BoundReference(3, AttributeReference("d", BooleanType)())
    +  val c5 = BoundReference(4, AttributeReference("e", IntegerType)())
    +  val c6 = BoundReference(5, AttributeReference("f", IntegerType)())
    +  val c7 = BoundReference(6, AttributeReference("g", StringType)())
    +  val c8 = BoundReference(7, AttributeReference("h", StringType)())
    +  val c9 = BoundReference(8, AttributeReference("i", StringType)())
    +  
    +  /**
    +   * Compare each of the field if it equals the expected value.
    +   * 
    +   * expected is a sequence of (Any, Any), 
    +   * and the first element indicates:
    +   *   true:  the expected value is field is null
    --- End diff --
    
    Instead of having a separate value for null, why not just pass null around?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/224#discussion_r11002824
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala ---
    @@ -26,7 +26,186 @@ import org.apache.spark.sql.catalyst.types._
     /* Implicit conversions */
     import org.apache.spark.sql.catalyst.dsl.expressions._
     
    -class ExpressionEvaluationSuite extends FunSuite {
    +import org.apache.spark.sql.catalyst.types._
    +import org.apache.spark.sql.catalyst.dsl._
    +
    +
    +/**
    + * Root class of expression evaluation test
    + */
    +trait ExprEvalTest {
    +  type Execution = (Row => Row)
    +
    +  def engine: Execution
    +}
    +
    +case class InterpretExprEvalTest(exprs: Seq[Expression]) extends ExprEvalTest {
    +  override def engine: Execution = new Projection(exprs)
    +}
    +
    +class InterpretExpressionEvaluationSuite extends ExpressionEvaluationSuite {
    +  override def executor(exprs: Seq[Expression]) = InterpretExprEvalTest(exprs)
    +}
    +
    +trait ExpressionEvaluationSuite extends FunSuite {
    +  /**
    +   * The sub classes need to create the ExprEvalTest object 
    +   */
    +  def executor(exprs: Seq[Expression]): ExprEvalTest
    +  
    +  val data: Row = new GenericRow(Array(1, null, 1.0, true, 4, 5, null, "abcccd", "a%"))
    +
    +  // TODO add to DSL
    +  val c1 = BoundReference(0, AttributeReference("a", IntegerType)())
    +  val c2 = BoundReference(1, AttributeReference("b", IntegerType)())
    +  val c3 = BoundReference(2, AttributeReference("c", DoubleType)())
    +  val c4 = BoundReference(3, AttributeReference("d", BooleanType)())
    +  val c5 = BoundReference(4, AttributeReference("e", IntegerType)())
    +  val c6 = BoundReference(5, AttributeReference("f", IntegerType)())
    +  val c7 = BoundReference(6, AttributeReference("g", StringType)())
    +  val c8 = BoundReference(7, AttributeReference("h", StringType)())
    +  val c9 = BoundReference(8, AttributeReference("i", StringType)())
    +  
    +  /**
    +   * Compare each of the field if it equals the expected value.
    +   * 
    +   * expected is a sequence of (Any, Any), 
    +   * and the first element indicates:
    +   *   true:  the expected value is field is null
    +   *   false: the expected value is not null
    +   *   Exception Class: the expected exception class while computing the value 
    --- End diff --
    
    Scala test already has functionality for ensuring the correct exceptions are thrown, I think we should just reuse that instead of inventing our own.
    
    ```scala
    intercept[ExpectedException] { operation }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/224#discussion_r11002751
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala ---
    @@ -26,7 +26,186 @@ import org.apache.spark.sql.catalyst.types._
     /* Implicit conversions */
     import org.apache.spark.sql.catalyst.dsl.expressions._
     
    -class ExpressionEvaluationSuite extends FunSuite {
    +import org.apache.spark.sql.catalyst.types._
    +import org.apache.spark.sql.catalyst.dsl._
    +
    +
    +/**
    + * Root class of expression evaluation test
    + */
    +trait ExprEvalTest {
    +  type Execution = (Row => Row)
    +
    +  def engine: Execution
    +}
    +
    +case class InterpretExprEvalTest(exprs: Seq[Expression]) extends ExprEvalTest {
    +  override def engine: Execution = new Projection(exprs)
    +}
    +
    +class InterpretExpressionEvaluationSuite extends ExpressionEvaluationSuite {
    +  override def executor(exprs: Seq[Expression]) = InterpretExprEvalTest(exprs)
    +}
    +
    +trait ExpressionEvaluationSuite extends FunSuite {
    +  /**
    +   * The sub classes need to create the ExprEvalTest object 
    +   */
    +  def executor(exprs: Seq[Expression]): ExprEvalTest
    +  
    +  val data: Row = new GenericRow(Array(1, null, 1.0, true, 4, 5, null, "abcccd", "a%"))
    +
    +  // TODO add to DSL
    +  val c1 = BoundReference(0, AttributeReference("a", IntegerType)())
    +  val c2 = BoundReference(1, AttributeReference("b", IntegerType)())
    +  val c3 = BoundReference(2, AttributeReference("c", DoubleType)())
    +  val c4 = BoundReference(3, AttributeReference("d", BooleanType)())
    +  val c5 = BoundReference(4, AttributeReference("e", IntegerType)())
    +  val c6 = BoundReference(5, AttributeReference("f", IntegerType)())
    +  val c7 = BoundReference(6, AttributeReference("g", StringType)())
    +  val c8 = BoundReference(7, AttributeReference("h", StringType)())
    +  val c9 = BoundReference(8, AttributeReference("i", StringType)())
    +  
    +  /**
    +   * Compare each of the field if it equals the expected value.
    +   * 
    +   * expected is a sequence of (Any, Any), 
    +   * and the first element indicates:
    +   *   true:  the expected value is field is null
    +   *   false: the expected value is not null
    +   *   Exception Class: the expected exception class while computing the value 
    +   * the second element is the real value when first element equals false(not null)
    +   */
    +  def verify(expected: Seq[(Any, Any)], result: Row, input: Row) {
    +    Seq.tabulate(expected.size) { i =>
    +      expected(i) match {
    +        case (false, expected) => {
    +          assert(result.isNullAt(i) == false, 
    +            s"Input:($input), Output field:$i shouldn't be null")
    +
    +          val real = result.apply(i)
    +          assert(real == expected, 
    +            s"Input:($input), Output field:$i is expected as $expected, but got $real")
    +        }
    +        case (true, _) => {
    +          assert(result.isNullAt(i) == true, s"Input:($input), Output field:$i is expected as null")
    +        }
    +        case (exception: Class[_], _) => {
    +          assert(result.isNullAt(i) == false, 
    +            s"Input:($input), Output field:$i should be exception")
    +
    +          val real = result.apply(i).getClass.getName
    +          val expect = exception.getName
    +          assert(real == expect, 
    +            s"Input:($input), Output field:$i expect exception $expect, but got $real")
    +        }
    +      }
    +    }
    +  }
    +
    +  def verify(expecteds: Seq[Seq[(Any, Any)]], results: Seq[Row], inputs: Seq[Row]) {
    +    Range(0, expecteds.length).foreach { i =>
    +      verify(expecteds(i), results(i), inputs(i))
    +    }
    +  }
    +  
    +  def proc(tester: ExprEvalTest, input: Row): Row = {
    +    try {
    +      tester.engine.apply(input)
    +    } catch {
    +      case x: Any => {
    +        new GenericRow(Array(x.asInstanceOf[Any]))
    +      }
    +    }
    +  }
    +  
    +  def run(exprs: Seq[Expression], expected: Seq[(Any, Any)], input: Row) {
    +    val tester = executor(exprs)
    +    
    +    verify(expected, proc(tester,input), input)
    +  }
    +  
    +  def run(exprs: Seq[Expression], expecteds: Seq[Seq[(Any, Any)]], inputs: Seq[Row]) {
    +    val tester = executor(exprs)
    +    
    +    verify(expecteds, inputs.map(proc(tester,_)), inputs)
    +  }
    +  
    +  test("logical") {
    +    val expected = Seq[(Boolean, Any)](
    +        (false, false), 
    +        (true, -1), 
    +        (false, true), 
    +        (false, true), 
    +        (false, false))
    +
    +    val exprs = Seq[Expression](And(LessThan(Cast(c1, DoubleType), c3), LessThan(c1, c2)), 
    +      Or(LessThan(Cast(c1, DoubleType), c3), LessThan(c1, c2)),
    +      IsNull(c2),
    +      IsNotNull(c3),
    +      Not(c4))
    +    
    +    run(exprs, expected, data)
    +  }
    +  
    +  test("arithmetic") {
    +    val exprs = Array[Expression](
    +      Add(c1, c2),
    +      Add(c1, c5),
    +      Divide(c1, c5),
    +      Subtract(c1, c5),
    +      Multiply(c1, c5),
    +      Remainder(c1, c5),
    +      UnaryMinus(c1)
    +    )
    +    val expecteds = Seq[(Boolean, Any)](
    +        (true, 0), 
    +        (false, 5), 
    +        (false, 0), 
    +        (false, -3), 
    +        (false, 4),
    +        (false, 1),
    +        (false, -1))
    +
    +    run(exprs, expecteds, data)
    +  }
    +
    +  test("string like / rlike") {
    +    val exprs = Seq(
    +      Like(c7, Literal("a", StringType)),
    --- End diff --
    
    The way these are structured the answer for a given test case ends up being 16 lines away from the test case itself, making it harder to read than it needs to be.  I think sequences of tuples makes sense for something like a truth table, but in this case I think something much simpler would be much easier for people to understand.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/224#discussion_r11007057
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -19,11 +19,113 @@ package org.apache.spark.sql
     package catalyst
     package expressions
     
    +import java.util.regex.Pattern
    +
    +import org.apache.spark.sql.catalyst.types.DataType
    +import org.apache.spark.sql.catalyst.types.StringType
     import org.apache.spark.sql.catalyst.types.BooleanType
    +import org.apache.spark.sql.catalyst.trees.TreeNode
    +import org.apache.spark.sql.catalyst.errors.`package`.TreeNodeException
    +
    +
    +/**
    + * Thrown when an invalid RegEx string is found.
    + */
    +class InvalidRegExException[TreeType <: TreeNode[_]](tree: TreeType, reason: String) extends
    +  errors.TreeNodeException(tree, s"$reason", null)
    +
    +trait StringRegexExpression {
    +  self: BinaryExpression =>
    +
    +  type EvaluatedType = Any
    +  
    +  def escape(v: String): String
    +  def nullable: Boolean = true
    +  def dataType: DataType = BooleanType
    +  
    +  // try cache the pattern for Literal 
    +  private lazy val cache: Pattern = right match {
    +    case x @ Literal(value: String, StringType) => compile(value)
    +    case _ => null
    +  }
    +  
    +  protected def compile(str: Any): Pattern = str match {
    +    // TODO or let it be null if couldn't compile the regex?
    +    case x: String if(x != null) => Pattern.compile(escape(x))
    +    case x: String => null
    +    case _ => throw new InvalidRegExException(this, "$str can not be compiled to regex pattern")
    --- End diff --
    
    The parameter type is String(Text) of the HIVE UDFLike.evaluate, which means the UDF initialization will fail if the wrong type passed in. I put an exception throwing here just to guarantee we passed the right type. You are right, it also will throw exception if mis-match case happens, or it maybe never happens if the HiveTypeCoercion always does the right type casting.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/224#discussion_r11008485
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -19,11 +19,113 @@ package org.apache.spark.sql
     package catalyst
     package expressions
     
    +import java.util.regex.Pattern
    +
    +import org.apache.spark.sql.catalyst.types.DataType
    +import org.apache.spark.sql.catalyst.types.StringType
     import org.apache.spark.sql.catalyst.types.BooleanType
    +import org.apache.spark.sql.catalyst.trees.TreeNode
    +import org.apache.spark.sql.catalyst.errors.`package`.TreeNodeException
    +
    +
    +/**
    + * Thrown when an invalid RegEx string is found.
    + */
    +class InvalidRegExException[TreeType <: TreeNode[_]](tree: TreeType, reason: String) extends
    +  errors.TreeNodeException(tree, s"$reason", null)
    +
    +trait StringRegexExpression {
    +  self: BinaryExpression =>
    +
    +  type EvaluatedType = Any
    +  
    +  def escape(v: String): String
    +  def nullable: Boolean = true
    +  def dataType: DataType = BooleanType
    +  
    +  // try cache the pattern for Literal 
    +  private lazy val cache: Pattern = right match {
    +    case x @ Literal(value: String, StringType) => compile(value)
    +    case _ => null
    +  }
    +  
    +  protected def compile(str: Any): Pattern = str match {
    +    // TODO or let it be null if couldn't compile the regex?
    +    case x: String if(x != null) => Pattern.compile(escape(x))
    +    case x: String => null
    +    case _ => throw new InvalidRegExException(this, "$str can not be compiled to regex pattern")
    --- End diff --
    
    Yes, but in some of the database system, it results "null".
    Anyway, thank you for the checking, I think we should follow Hive, I will remove the TODO.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/224#discussion_r11007398
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala ---
    @@ -26,7 +26,186 @@ import org.apache.spark.sql.catalyst.types._
     /* Implicit conversions */
     import org.apache.spark.sql.catalyst.dsl.expressions._
     
    -class ExpressionEvaluationSuite extends FunSuite {
    +import org.apache.spark.sql.catalyst.types._
    +import org.apache.spark.sql.catalyst.dsl._
    +
    +
    +/**
    + * Root class of expression evaluation test
    + */
    +trait ExprEvalTest {
    +  type Execution = (Row => Row)
    +
    +  def engine: Execution
    +}
    +
    +case class InterpretExprEvalTest(exprs: Seq[Expression]) extends ExprEvalTest {
    +  override def engine: Execution = new Projection(exprs)
    --- End diff --
    
    Yes, you're right, actually what we need now is just the single Expression. 
    But I put the projection who contains multiple expressions/fields, because:
    1) We may have the case which need to computing multiple fields for a single input row. (Of cause that can be replaced by multiple expressions with different row feeding.)
    2) In the future expression evaluation optimization, I am thinking to CSE(Common Sub expression Elimination), the Projection may hide the details in each expression computing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/224#issuecomment-39003245
  
    @chenghao-intel,  Thanks for doing this!  One small note, for future contributions please make a JIRA to make it easier to track all of the things that are in flight.
    
    @pwendell I'm not sure why jenkins doesn't seem to want to test this PR.  It is passing travis though and doesn't touch spark core, so I'm okay to merge.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/224#discussion_r10928230
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -20,10 +20,114 @@ package catalyst
     package expressions
     
     import org.apache.spark.sql.catalyst.types.BooleanType
    +import java.util.regex.Pattern
     
    -case class Like(left: Expression, right: Expression) extends BinaryExpression {
    -  def dataType = BooleanType
    -  def nullable = left.nullable // Right cannot be null.
    +import catalyst.types.StringType
    +import catalyst.types.BooleanType
    +import catalyst.trees.TreeNode
    +
    +import catalyst.errors.`package`.TreeNodeException
    +import org.apache.spark.sql.catalyst.types.DataType
    +
    +
    +/**
    + * Thrown when an invalid RegEx string is found.
    + */
    +class InvalidRegExException[TreeType <: TreeNode[_]](tree: TreeType, reason: String) extends
    +  errors.TreeNodeException(tree, s"$reason", null)
    +
    +trait StringRegexExpression {
    +  self: BinaryExpression =>
    +
    +  type EvaluatedType = Any
    --- End diff --
    
    `EvaluatedType` can be set to `Boolean`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/224#issuecomment-38748839
  
    Hi Cheng Hao,  Thanks for implementing these functions.  I bet they will be much faster than using HiveUDFs as we do now!
    
    Based on the number of comments on the test case refactoring, I think it would be easiest to try and commit just the LIKE and RLIKE with a few simple test cases, and then take on the rest in a separate PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/224#discussion_r11002606
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala ---
    @@ -26,7 +26,186 @@ import org.apache.spark.sql.catalyst.types._
     /* Implicit conversions */
     import org.apache.spark.sql.catalyst.dsl.expressions._
     
    -class ExpressionEvaluationSuite extends FunSuite {
    +import org.apache.spark.sql.catalyst.types._
    +import org.apache.spark.sql.catalyst.dsl._
    +
    +
    +/**
    + * Root class of expression evaluation test
    + */
    +trait ExprEvalTest {
    +  type Execution = (Row => Row)
    +
    +  def engine: Execution
    +}
    +
    +case class InterpretExprEvalTest(exprs: Seq[Expression]) extends ExprEvalTest {
    +  override def engine: Execution = new Projection(exprs)
    +}
    +
    +class InterpretExpressionEvaluationSuite extends ExpressionEvaluationSuite {
    +  override def executor(exprs: Seq[Expression]) = InterpretExprEvalTest(exprs)
    +}
    +
    +trait ExpressionEvaluationSuite extends FunSuite {
    +  /**
    +   * The sub classes need to create the ExprEvalTest object 
    +   */
    +  def executor(exprs: Seq[Expression]): ExprEvalTest
    +  
    +  val data: Row = new GenericRow(Array(1, null, 1.0, true, 4, 5, null, "abcccd", "a%"))
    +
    +  // TODO add to DSL
    +  val c1 = BoundReference(0, AttributeReference("a", IntegerType)())
    +  val c2 = BoundReference(1, AttributeReference("b", IntegerType)())
    +  val c3 = BoundReference(2, AttributeReference("c", DoubleType)())
    +  val c4 = BoundReference(3, AttributeReference("d", BooleanType)())
    +  val c5 = BoundReference(4, AttributeReference("e", IntegerType)())
    +  val c6 = BoundReference(5, AttributeReference("f", IntegerType)())
    +  val c7 = BoundReference(6, AttributeReference("g", StringType)())
    +  val c8 = BoundReference(7, AttributeReference("h", StringType)())
    +  val c9 = BoundReference(8, AttributeReference("i", StringType)())
    +  
    +  /**
    +   * Compare each of the field if it equals the expected value.
    +   * 
    +   * expected is a sequence of (Any, Any), 
    --- End diff --
    
    Use `@param` to document what parameters are used for.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/224#discussion_r10928654
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -20,10 +20,114 @@ package catalyst
     package expressions
     
     import org.apache.spark.sql.catalyst.types.BooleanType
    +import java.util.regex.Pattern
     
    -case class Like(left: Expression, right: Expression) extends BinaryExpression {
    -  def dataType = BooleanType
    -  def nullable = left.nullable // Right cannot be null.
    +import catalyst.types.StringType
    --- End diff --
    
    As specified in [the Spark Code Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Imports), please use full qualified package imports here, and reorder all the imports accordingly. Spark SQL used to use relative package imports, but has already been fixed in an earlier PR :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/224


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/224#issuecomment-38890552
  
    Thanks you @marmbrus for providing the sample code of unit tests in email, that make the unit test code quite clean. :)
    
    And I also fixed bug which led the failures in the regression test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Implement the RLike & Like in catalyst

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/224#issuecomment-38642112
  
    thanks @liancheng , @marmbrus 
    
    I've update some of the code style.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---