Posted to reviews@spark.apache.org by chenghao-intel <gi...@git.apache.org> on 2014/03/31 10:38:36 UTC

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

GitHub user chenghao-intel opened a pull request:

    https://github.com/apache/spark/pull/275

    [SPARK-1360] Add Timestamp Support for SQL

    This PR includes:
    1) Add the new data type Timestamp.
    2) Add more data type casting, based on Hive's rules.
    3) Fix a bug of missing data types in both parsers (HiveQl & SQLParser).
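
    As an illustration only (none of this is code from the patch), the conversions follow Hive's convention: strings of the form "yyyy-mm-dd hh:mm:ss[.fffffffff]" parse to timestamps, and numeric values are treated as seconds since the epoch. A minimal sketch using plain java.sql.Timestamp:

    ```scala
    import java.sql.Timestamp

    // Illustrative only: the kind of Hive-style conversions described above.
    val ts      = Timestamp.valueOf("2014-03-31 10:38:36")  // string -> timestamp
    val seconds = ts.getTime / 1000                         // timestamp -> long (seconds since epoch)
    val back    = new Timestamp(seconds * 1000)             // long (seconds) -> timestamp
    ```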

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenghao-intel/spark timestamp

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/275.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #275
    
----
commit 9ac2d52deca4cd71b5da3bdabd948e55fcb9f4d3
Author: Cheng Hao <ha...@intel.com>
Date:   2014-03-31T07:05:10Z

    Add TimestampType Support

commit 98d344af13a795e4ec6c7f3277357c68a92f3b5c
Author: Cheng Hao <ha...@intel.com>
Date:   2014-03-31T08:05:17Z

    Add DataType for SqlParser

commit c01d4855eb5d0f374c4ec12b48b0734fdde2c392
Author: Cheng Hao <ha...@intel.com>
Date:   2014-03-31T08:23:09Z

    Update Unit Test & CodeStyle

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39218075
  
    BTW, the whitelist has been reordered (via the Linux shell's sort command) after adding more passing cases. More cases, such as decimal_2 / decimal_3, could also be added, but the precision of decimal values still can't be matched exactly; I'm leaving that for a further improvement.


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39273152
  
    Merged build finished. 


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39301253
  
    Merged build started. 


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39300410
  
    Can you start a new test please?


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/275#discussion_r11130544
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
    @@ -122,6 +122,16 @@ class SqlParser extends StandardTokenParsers {
       protected val RIGHT = Keyword("RIGHT")
       protected val SELECT = Keyword("SELECT")
       protected val STRING = Keyword("STRING")
    +  protected val FLOAT = Keyword("FLOAT")
    +  protected val SMALLINT = Keyword("SMALLINT")
    +  protected val TINYINT = Keyword("TINYINT")
    --- End diff --
    
    I think maybe for TINYINT and SMALLINT we should just call them by their more obvious Java names, BYTE and SHORT.


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39290051
  
    Jenkins, test this please.


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39261163
  
    add to whitelist


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39403313
  
    @chenghao-intel, three final things and then we can merge this!  It needs to be rebased, as I don't think it merges cleanly anymore (I also fixed the missing data types in ScalaReflection).  I also added a test case for this code; please add Timestamp to that.  Finally, can you roll back all the spurious changes to the machine-dependent golden files?  You only need to add the ones that are new.
    
    This should do it:
    ```
    git checkout HEAD sql/hive/src/test/resources/golden
    sbt hive/test
    git status
    <add new files>
    ```
    
    Thanks!


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39508675
  
    Jenkins, retest this please


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39308189
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13677/


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39292938
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13665/


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39268899
  
    test this please


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/275#discussion_r11127869
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -17,6 +17,9 @@
     
     package org.apache.spark.sql.catalyst.expressions
     
    +import java.sql.Timestamp
    +import java.lang.{NumberFormatException => NFE} 
    --- End diff --
    
    Can we not shorten this? I am not sure NFE is very obvious to readers.


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/275


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39515688
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13737/


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39301237
  
     Merged build triggered. 


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39290278
  
    Merged build started. 


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39272800
  
    Jenkins, test this please.


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39292937
  
    Merged build finished. 


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39273704
  
    Jenkins, test this please.


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39515686
  
    Merged build finished. All automated tests passed.


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39290270
  
     Merged build triggered. 


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/275#discussion_r11185998
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -26,52 +28,169 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression {
       override def toString = s"CAST($child, $dataType)"
     
       type EvaluatedType = Any
    +  
    +  def nullOrCast[T](a: Any, func: T => Any): Any = if(a == null) {
    +    null
    +  } else {
    +    func(a.asInstanceOf[T])
    +  }
     
    -  lazy val castingFunction: Any => Any = (child.dataType, dataType) match {
    -    case (BinaryType, StringType) => a: Any => new String(a.asInstanceOf[Array[Byte]])
    -    case (StringType, BinaryType) => a: Any => a.asInstanceOf[String].getBytes
    -    case (_, StringType) => a: Any => a.toString
    -    case (StringType, IntegerType) => a: Any => castOrNull(a, _.toInt)
    -    case (StringType, DoubleType) => a: Any => castOrNull(a, _.toDouble)
    -    case (StringType, FloatType) => a: Any => castOrNull(a, _.toFloat)
    -    case (StringType, LongType) => a: Any => castOrNull(a, _.toLong)
    -    case (StringType, ShortType) => a: Any => castOrNull(a, _.toShort)
    -    case (StringType, ByteType) => a: Any => castOrNull(a, _.toByte)
    -    case (StringType, DecimalType) => a: Any => castOrNull(a, BigDecimal(_))
    -    case (BooleanType, ByteType) => {
    -      case null => null
    -      case true => 1.toByte
    -      case false => 0.toByte
    -    }
    -    case (dt, IntegerType) =>
    -      a: Any => dt.asInstanceOf[NumericType].numeric.asInstanceOf[Numeric[Any]].toInt(a)
    -    case (dt, DoubleType) =>
    -      a: Any => dt.asInstanceOf[NumericType].numeric.asInstanceOf[Numeric[Any]].toDouble(a)
    -    case (dt, FloatType) =>
    -      a: Any => dt.asInstanceOf[NumericType].numeric.asInstanceOf[Numeric[Any]].toFloat(a)
    -    case (dt, LongType) =>
    -      a: Any => dt.asInstanceOf[NumericType].numeric.asInstanceOf[Numeric[Any]].toLong(a)
    -    case (dt, ShortType) =>
    -      a: Any => dt.asInstanceOf[NumericType].numeric.asInstanceOf[Numeric[Any]].toInt(a).toShort
    -    case (dt, ByteType) =>
    -      a: Any => dt.asInstanceOf[NumericType].numeric.asInstanceOf[Numeric[Any]].toInt(a).toByte
    -    case (dt, DecimalType) =>
    -      a: Any =>
    -        BigDecimal(dt.asInstanceOf[NumericType].numeric.asInstanceOf[Numeric[Any]].toDouble(a))
    +  // UDFToString
    +  def castToString: Any => Any = child.dataType match {
    +    case BinaryType => nullOrCast[Array[Byte]](_, new String(_, "UTF-8"))
    +    case _ => nullOrCast[Any](_, _.toString)
    +  }
    +  
    +  // BinaryConverter
    +  def castToBinary: Any => Any = child.dataType match {
    +    case StringType => nullOrCast[String](_, _.getBytes("UTF-8"))
       }
     
    -  @inline
    -  protected def castOrNull[A](a: Any, f: String => A) =
    -    try f(a.asInstanceOf[String]) catch {
    -      case _: java.lang.NumberFormatException => null
    -    }
    +  // UDFToBoolean
    +  def castToBoolean: Any => Any = child.dataType match {
    +    case StringType => nullOrCast[String](_, _.length() != 0)
    +    case TimestampType => nullOrCast[Timestamp](_, b => {(b.getTime() != 0 || b.getNanos() != 0)})
    +    case LongType => nullOrCast[Long](_, _ != 0)
    +    case IntegerType => nullOrCast[Int](_, _ != 0)
    +    case ShortType => nullOrCast[Short](_, _ != 0)
    +    case ByteType => nullOrCast[Byte](_, _ != 0)
    +    case DecimalType => nullOrCast[BigDecimal](_, _ != 0)
    +    case DoubleType => nullOrCast[Double](_, _ != 0)
    +    case FloatType => nullOrCast[Float](_, _ != 0)
    +  }
    +  
    +  // TimestampConverter
    +  def castToTimestamp: Any => Any = child.dataType match {
    +    case StringType => nullOrCast[String](_, s => {
    +      // Throw away extra if more than 9 decimal places
    +      val periodIdx = s.indexOf(".");
    +      var n = s
    +      if (periodIdx != -1) {
    +        if (n.length() - periodIdx > 9) {
    +          n = n.substring(0, periodIdx + 10)
    +        }
    +      }
    +      try Timestamp.valueOf(n) catch { case _: java.lang.IllegalArgumentException => null}
    +    })
    +    case BooleanType => nullOrCast[Boolean](_, b => new Timestamp((if(b) 1 else 0) * 1000))
    +    case LongType => nullOrCast[Long](_, l => new Timestamp(l * 1000))
    +    case IntegerType => nullOrCast[Int](_, i => new Timestamp(i * 1000))
    +    case ShortType => nullOrCast[Short](_, s => new Timestamp(s * 1000))
    +    case ByteType => nullOrCast[Byte](_, b => new Timestamp(b * 1000))
    +    // TimestampWritable.decimalToTimestamp
    +    case DecimalType => nullOrCast[BigDecimal](_, d => decimalToTimestamp(d))
    +    // TimestampWritable.doubleToTimestamp
    +    case DoubleType => nullOrCast[Double](_, d => decimalToTimestamp(d))
    +    // TimestampWritable.floatToTimestamp
    +    case FloatType => nullOrCast[Float](_, f => decimalToTimestamp(f))
    +  }
    +
    +  private def decimalToTimestamp(d: BigDecimal) = {
    +    val seconds = d.longValue()
    +    val bd = (d - seconds) * (1000000000)
    +    val nanos = bd.intValue()
    +
    +    // Convert to millis
    +    val millis = seconds * 1000
    +    val t = new Timestamp(millis)
    +
    +    // remaining fractional portion as nanos
    +    t.setNanos(nanos)
    +    
    +    t
    +  }
    +
    +  private def timestampToDouble(t: Timestamp) = (t.getSeconds() + t.getNanos().toDouble / 1000)
    +
    +  def castToLong: Any => Any = child.dataType match {
    +    case StringType => nullOrCast[String](_, s => try s.toLong catch {
    +      case _: NumberFormatException => null
    +    })
    +    case BooleanType => nullOrCast[Boolean](_, b => if(b) 1 else 0)
    +    case TimestampType => nullOrCast[Timestamp](_, t => timestampToDouble(t).toLong)
    +    case DecimalType => nullOrCast[BigDecimal](_, _.toLong)
    +    case x: NumericType => b => x.numeric.asInstanceOf[Numeric[Any]].toLong(b)
    +  }
    +
    +  def castToInt: Any => Any = child.dataType match {
    +    case StringType => nullOrCast[String](_, s => try s.toInt catch {
    +      case _: NumberFormatException => null
    +    })
    +    case BooleanType => nullOrCast[Boolean](_, b => if(b) 1 else 0)
    +    case TimestampType => nullOrCast[Timestamp](_, t => timestampToDouble(t).toInt)
    +    case DecimalType => nullOrCast[BigDecimal](_, _.toInt)
    +    case x: NumericType => b => x.numeric.asInstanceOf[Numeric[Any]].toInt(b)
    +  }
    +
    +  def castToShort: Any => Any = child.dataType match {
    +    case StringType => nullOrCast[String](_, s => try s.toShort catch {
    +      case _: NumberFormatException => null
    +    })
    +    case BooleanType => nullOrCast[Boolean](_, b => if(b) 1 else 0)
    +    case TimestampType => nullOrCast[Timestamp](_, t => timestampToDouble(t).toShort)
    +    case DecimalType => nullOrCast[BigDecimal](_, _.toShort)
    +    case x: NumericType => b => x.numeric.asInstanceOf[Numeric[Any]].toInt(b).toShort
    +  }
    +
    +  def castToByte: Any => Any = child.dataType match {
    +    case StringType => nullOrCast[String](_, s => try s.toByte catch {
    +      case _: NumberFormatException => null
    +    })
    +    case BooleanType => nullOrCast[Boolean](_, b => if(b) 1 else 0)
    +    case TimestampType => nullOrCast[Timestamp](_, t => timestampToDouble(t).toByte)
    +    case DecimalType => nullOrCast[BigDecimal](_, _.toByte)
    +    case x: NumericType => b => x.numeric.asInstanceOf[Numeric[Any]].toInt(b).toByte
    +  }
    +
    +  def castToDecimal: Any => Any = child.dataType match {
    +    case StringType => nullOrCast[String](_, s => try BigDecimal(s.toDouble) catch {
    +      case _: NumberFormatException => null
    +    })
    +    case BooleanType => nullOrCast[Boolean](_, b => if(b) BigDecimal(1) else BigDecimal(0))
    +    case TimestampType => nullOrCast[Timestamp](_, t => BigDecimal(timestampToDouble(t)))
    +    case x: NumericType => b => BigDecimal(x.numeric.asInstanceOf[Numeric[Any]].toDouble(b))
    +  }
    +
    +  def castToDouble: Any => Any = child.dataType match {
    +    case StringType => nullOrCast[String](_, s => try s.toDouble catch {
    +      case _: NumberFormatException => null
    +    })
    +    case BooleanType => nullOrCast[Boolean](_, b => if(b) 1 else 0)
    +    case TimestampType => nullOrCast[Timestamp](_, t => timestampToDouble(t))
    +    case DecimalType => nullOrCast[BigDecimal](_, _.toDouble)
    +    case x: NumericType => b => x.numeric.asInstanceOf[Numeric[Any]].toDouble(b)
    +  }
    +
    +  def castToFloat: Any => Any = child.dataType match {
    +    case StringType => nullOrCast[String](_, s => try s.toFloat catch {
    +      case _: NumberFormatException => null
    +    })
    +    case BooleanType => nullOrCast[Boolean](_, b => if(b) 1 else 0)
    +    case TimestampType => nullOrCast[Timestamp](_, t => timestampToDouble(t).toFloat)
    +    case DecimalType => nullOrCast[BigDecimal](_, _.toFloat)
    +    case x: NumericType => b => x.numeric.asInstanceOf[Numeric[Any]].toFloat(b)
    +  }
    +
    +  def cast: Any => Any = dataType match {
    +    case StringType => castToString
    +    case BinaryType => castToBinary
    +    case DecimalType => castToDecimal
    +    case TimestampType => castToTimestamp
    +    case BooleanType => castToBoolean
    +    case ByteType => castToByte
    +    case ShortType => castToShort
    +    case IntegerType => castToInt
    +    case FloatType => castToFloat
    +    case LongType => castToLong
    +    case DoubleType => castToDouble
    +  }
     
       override def apply(input: Row): Any = {
         val evaluated = child.apply(input)
         if (evaluated == null) {
           null
         } else {
    -      castingFunction(evaluated)
    +      if(child.dataType == dataType) evaluated else cast(evaluated)
    --- End diff --
    
    There should already be a rule that eliminates casts that don't do anything, so I think this check is unnecessary.
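
    For reference, that is presumably a rule along these lines (a minimal sketch; the rule name, imports, and `transformAllExpressions` usage are assumed here rather than quoted from this thread):

    ```scala
    import org.apache.spark.sql.catalyst.expressions.Cast
    import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    import org.apache.spark.sql.catalyst.rules.Rule

    // Hypothetical sketch: drop casts whose target type equals the child's data type.
    object SimplifyCasts extends Rule[LogicalPlan] {
      def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
        case Cast(e, dataType) if e.dataType == dataType => e
      }
    }
    ```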


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39273036
  
     Merged build triggered. 


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39483020
  
    Jenkins, test this please.


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39412166
  
    Thank you @marmbrus, I've done the final things; I think it's ready to be merged.


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39509146
  
    Merged build started. 


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39268972
  
    ok to test


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39501240
  
    This LGTM as soon as we can get Jenkins to agree.


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/275#discussion_r11134927
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ---
    @@ -103,11 +113,38 @@ package object dsl {
           def expr = attr
           def attr = analysis.UnresolvedAttribute(s)
     
    +      /** Creates a new typed attributes of type boolean */
    --- End diff --
    
    This was wrong before, but can we make it say "Creates a new AttributeReference of type X"?


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39290231
  
    Sorry, wait a minute, I am updating the golden files.


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39513293
  
    merged. thanks!


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39138169
  
    Overall, I think this looks pretty good.  After you figure out the failing tests, you should also whitelist a bunch of tests that were only failing because we didn't have timestamp support.
    
    I got the list below by running `sbt/sbt -Dspark.hive.alltests hive/test` (this runs all tests even if they aren't on the whitelist) and looking in `sql/hive/target/HiveCompatibilitySuite.passed`.  You should also check out the logs in `sql/hive/target/HiveCompatibilitySuite.failed` and `sql/hive/target/HiveCompatibilitySuite.wrong` to make sure we aren't missing any edge cases regarding timestamps.
    
    ```
    +    "input14",
    +    "input21",
    +    "input_testsequencefile",
    +    "insert1",
    +    "insert2_overwrite_partitions",
    +    "join32_lessSize",
    +    "join_map_ppr",
    +    "join_rc",
    +    "lateral_view_outer",
    +    "loadpart1",
    +    "mapreduce1",
    +    "mapreduce2",
    +    "mapreduce4",
    +    "mapreduce5",
    +    "mapreduce6",
    +    "mapreduce8",
    +    "multi_insert_gby",
    +    "multi_insert_gby3",
    +    "multi_insert_lateral_view",
    +    "orc_dictionary_threshold",
    +    "orc_empty_files",
    +    "orc_ends_with_nulls",
    +    "parallel",
    +    "parenthesis_star_by",
    +    "partcols1",
    +    "partition_serde_format",
    +    "partition_wise_fileformat4",
    +    "partition_wise_fileformat5",
    +    "partition_wise_fileformat6",
    +    "partition_wise_fileformat7",
    +    "partition_wise_fileformat9",
    +    "ppd2",
    +    "ppd_clusterby",
    +    "ppd_constant_expr",
    +    "ppd_transform",
    +    "rcfile_columnar",
    +    "rcfile_lazydecompress",
    +    "rcfile_null_value",
    +    "rcfile_toleratecorruptions",
    +    "rcfile_union",
    +    "reduce_deduplicate",
    +    "reduce_deduplicate_exclude_gby",
    +    "reducesink_dedup",
    +    "smb_mapjoin_6",
    +    "smb_mapjoin_7",
    +    "stats_aggregator_error_1",
    +    "stats_publisher_error_1",
    +    "transform_ppr1",
    +    "transform_ppr2",
    +    "udaf_histogram_numeric",
    +    "udf8",
    +    "union3",
    +    "union33",
    +    "union_remove_11",
    ```
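
    These `+` entries are additions to the compatibility suite's whitelist, which is roughly a sorted Seq of test names; a minimal, self-contained sketch (class and member names are assumed, not quoted from this thread):

    ```scala
    // Hypothetical sketch of the structure the whitelist entries live in.
    abstract class HiveQueryFileTest { def whiteList: Seq[String] }

    class HiveCompatibilitySuite extends HiveQueryFileTest {
      override def whiteList = Seq(
        "input14",
        "input21",
        // ...newly passing timestamp-related tests are inserted here, kept sorted...
        "union_remove_11"
      )
    }
    ```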


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39273047
  
    Merged build started. 


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39270714
  
    This is looking pretty good!  A few small comments:
     - You should also add Timestamp to `ScalaReflection`.
     - You will need to check in any new golden files (if there are any) that were created in `sql/hive/src/test/resources/`.
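
    On the first point, the mapping needed is roughly the following (a minimal, self-contained sketch; the real method lives in Catalyst's ScalaReflection and covers many more types, so the names here are only illustrative):

    ```scala
    import java.sql.Timestamp
    import scala.reflect.runtime.universe._

    // Hypothetical sketch of a Scala-type-to-SQL-type mapping with a Timestamp case;
    // strings stand in for Catalyst's DataType objects to keep the example standalone.
    def schemaFor(tpe: Type): String = tpe match {
      case t if t <:< typeOf[Timestamp] => "TimestampType"
      case t if t <:< typeOf[String]    => "StringType"
      case _                            => "unsupported"
    }

    // e.g. schemaFor(typeOf[Timestamp]) returns "TimestampType"
    ```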


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39300850
  
    Jenkins, retest this please.


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39216306
  
    Thank you @marmbrus, @rxin. Both the code and the unit test whitelist have been updated, and the unit tests pass in my local environment.


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39412197
  
    Thank you @marmbrus, I've done the final things; I think it's ready to be merged.


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/275#discussion_r11134685
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
    @@ -122,6 +122,16 @@ class SqlParser extends StandardTokenParsers {
       protected val RIGHT = Keyword("RIGHT")
       protected val SELECT = Keyword("SELECT")
       protected val STRING = Keyword("STRING")
    +  protected val FLOAT = Keyword("FLOAT")
    +  protected val SMALLINT = Keyword("SMALLINT")
    +  protected val TINYINT = Keyword("TINYINT")
    --- End diff --
    
    Okay, I talked with Reynold a little bit more about this.  We aren't really sure where we want to go with our own SQL dialect, so for now can we just leave this change out?  We want to avoid having to support things long-term before we decide exactly how we are going to do things.


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39100400
  
    Sorry, there are some bugs in the unit tests; I will look into those.


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39308188
  
    Merged build finished. All automated tests passed.


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39065478
  
    Can one of the admins verify this patch?


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39509138
  
     Merged build triggered. 


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/275#discussion_r11130666
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -170,6 +170,34 @@ abstract class Expression extends TreeNode[Expression] {
           }
         }
       }
    +  
    +  @inline
    +  protected final def c2(
    --- End diff --
    
    I set a bad precedent here with short names and no scaladoc.  Can you at least add scaladoc to this?


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39501049
  
    Jenkins, retest this please


---

[GitHub] spark pull request: [SPARK-1360] Add Timestamp Support for SQL

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/275#issuecomment-39273153
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13653/


---