Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2015/04/01 03:03:48 UTC

[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/5303

    [SPARK-6638] [SQL] Improve performance of StringType in SQL

    This PR changes the internal representation of StringType from java.lang.String to UTF8String, which is implemented using Array[Byte] (encoded in UTF-8).
    
    This PR should not break any public API; Row.getString() will still return java.lang.String.
    
    This is the first step toward improving the performance of String in SQL.
    
    cc @rxin
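
As an illustration of the representation change described above, here is a minimal sketch of a string type backed by UTF-8 encoded bytes. It is not the code from this pull request: the names `Utf8Sketch` and `Utf8Str` and every method shown are hypothetical, chosen only to show how a byte-array-backed string can support length, ordering, and equality without decoding, while still converting to java.lang.String at the public boundary.

```scala
import java.nio.charset.StandardCharsets
import java.util.Arrays

object Utf8Sketch {
  // Hypothetical stand-in for the PR's UTF8String; names and methods are illustrative only.
  final class Utf8Str(private val bytes: Array[Byte])
      extends Ordered[Utf8Str] with Serializable {

    // Number of code points: count every byte that is not a
    // UTF-8 continuation byte (bit pattern 10xxxxxx).
    def length: Int = bytes.count(b => (b & 0xC0) != 0x80)

    // Unsigned byte-wise comparison, then shorter-is-smaller on a shared prefix.
    override def compare(that: Utf8Str): Int = {
      val n = math.min(bytes.length, that.bytes.length)
      var i = 0
      while (i < n) {
        val c = (bytes(i) & 0xFF) - (that.bytes(i) & 0xFF)
        if (c != 0) return c
        i += 1
      }
      bytes.length - that.bytes.length
    }

    // Arrays compare by reference on the JVM, so compare contents explicitly.
    override def equals(other: Any): Boolean = other match {
      case s: Utf8Str => Arrays.equals(bytes, s.bytes)
      case _          => false
    }

    override def hashCode: Int = Arrays.hashCode(bytes)

    // Decode only where a java.lang.String is actually required.
    override def toString: String = new String(bytes, StandardCharsets.UTF_8)
  }

  object Utf8Str {
    // Encode once when the value enters the internal representation.
    def apply(s: String): Utf8Str = new Utf8Str(s.getBytes(StandardCharsets.UTF_8))
  }
}
```

In this sketch, comparisons and equality checks work directly on the bytes, and decoding is deferred to `toString`, which is roughly the role Row.getString() would play for callers that still expect java.lang.String.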

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark string

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5303.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5303
    
----
commit 685fd071ce453cc6b956f98c897c869ad31702a9
Author: Davies Liu <da...@databricks.com>
Date:   2015-03-31T05:42:07Z

    use UTF8String instead of String for StringType

commit 21f67c6fda3504caa0b13524d4e498c6e4c9c701
Author: Davies Liu <da...@databricks.com>
Date:   2015-03-31T07:50:11Z

    cleanup

commit 4699c3ae1dab6482b26dd3d3739193e68cd77ca3
Author: Davies Liu <da...@databricks.com>
Date:   2015-03-31T20:46:42Z

    use Array[Byte] in UTF8String

commit d32abd1e8e6b7b5ef92a34a5d3a42919db58a43c
Author: Davies Liu <da...@databricks.com>
Date:   2015-03-31T20:57:17Z

    fix utf8 for python api

commit a85fb275d742dd9384e15f22878b545e9a77a106
Author: Davies Liu <da...@databricks.com>
Date:   2015-03-31T23:42:18Z

    refactor

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89099952
  
      [Test build #29637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29637/consoleFull) for   PR 5303 at commit [`e5fa5b8`](https://github.com/apache/spark/commit/e5fa5b824b62670b8ba76399831ddd0e9e25efa5).


[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88770278
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29586/


[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89050512
  
      [Test build #29624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29624/consoleFull) for   PR 5303 at commit [`9f4c194`](https://github.com/apache/spark/commit/9f4c194cbc44a3b163a6f2d3e1f7f831518d7776).


[GitHub] spark pull request: [WIP] [SPARK-6638] [SQL] Improve performance o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88594996
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29552/


[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5303#discussion_r27538841
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/dataTypes.scala ---
    @@ -386,6 +386,13 @@ class BinaryType private() extends NativeType with PrimitiveType {
           }
           x.length - y.length
         }
    +    override def equiv(x: Array[Byte], y: Array[Byte]): Boolean = {
    --- End diff --
    
    What is this used for?
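
The thread does not record an answer to this question, and the body of the added `equiv` method is not shown in the diff. As background only: `==` on two JVM arrays compares references, not contents, so code that needs content equality on Array[Byte] has to ask for it explicitly. The sketch below assumes an override that delegates to java.util.Arrays.equals; the object name `BinaryOrderingSketch` and the whole implementation are hypothetical.

```scala
import java.util.Arrays

object BinaryOrderingSketch {
  // Hypothetical ordering shaped like the diff context:
  // compare byte by byte, then fall back to the length difference.
  val ordering: Ordering[Array[Byte]] = new Ordering[Array[Byte]] {
    def compare(x: Array[Byte], y: Array[Byte]): Int = {
      var i = 0
      while (i < x.length && i < y.length) {
        val c = x(i) - y(i) // bytes widen to Int, so this cannot overflow
        if (c != 0) return c
        i += 1
      }
      x.length - y.length
    }

    // Assumed body: content equality via Arrays.equals.
    // Plain == on the two arrays would compare references instead.
    override def equiv(x: Array[Byte], y: Array[Byte]): Boolean =
      Arrays.equals(x, y)
  }
}
```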


[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88369619
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29522/


[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5303#discussion_r27538725
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala ---
    @@ -62,7 +62,11 @@ object IntegerLiteral {
       }
     }
     
    -case class Literal(value: Any, dataType: DataType) extends LeafExpression {
    +case class Literal(var value: Any, dataType: DataType) extends LeafExpression {
    --- End diff --
    
    We should create an object Literal with an apply function that converts String into UTF8String, rather than making this a var.
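
A minimal sketch of the companion-object approach suggested in this comment. All types here (`DataType`, `StringType`, `IntegerType`, `Utf8Value`) are simplified, hypothetical stand-ins rather than the real Catalyst classes, and the single-argument `apply` is just one possible shape for such a factory. The point it illustrates is that the conversion happens once at construction, so `value` can remain an immutable field.

```scala
import java.nio.charset.StandardCharsets

object LiteralSketch {
  // Simplified, hypothetical stand-ins for the Catalyst types named in the diff.
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType

  // Placeholder for UTF8String; a Vector gives structural equality in this sketch.
  final case class Utf8Value(bytes: Vector[Byte])

  // `value` stays an immutable val; no `var` is needed.
  final case class Literal(value: Any, dataType: DataType)

  object Literal {
    // Factory that converts external values to their internal form.
    def apply(v: Any): Literal = v match {
      case s: String =>
        Literal(Utf8Value(s.getBytes(StandardCharsets.UTF_8).toVector), StringType)
      case i: Int =>
        Literal(i, IntegerType)
      case other =>
        throw new IllegalArgumentException(s"Unsupported literal: $other")
    }
  }
}
```

With this shape, `Literal("abc")` yields a literal whose value is already in the byte-based form, and callers never see or mutate an intermediate java.lang.String.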


[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89235413
  
      [Test build #635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/635/consoleFull) for   PR 5303 at commit [`e5fa5b8`](https://github.com/apache/spark/commit/e5fa5b824b62670b8ba76399831ddd0e9e25efa5).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


[GitHub] spark pull request: [WIP] [SPARK-6638] [SQL] Improve performance o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88615126
  
      [Test build #29557 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29557/consoleFull) for   PR 5303 at commit [`dbfa1ed`](https://github.com/apache/spark/commit/dbfa1eddd980dedd59f29455c6dc56170b946950).


[GitHub] spark pull request: [WIP] [SPARK-6638] [SQL] Improve performance o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88639702
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29560/


[GitHub] spark pull request: [WIP] [SPARK-6638] [SQL] Improve performance o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88594994
  
      [Test build #29552 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29552/consoleFull) for   PR 5303 at commit [`c7dd4d2`](https://github.com/apache/spark/commit/c7dd4d285bc08313f98334225435e77ca0f79bb9).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `final class MutableString extends MutableValue `
      * `case class Literal(var value: Any, dataType: DataType) extends LeafExpression `
      * `trait CaseConversionExpression `
      * `final class UTF8String extends Ordered[UTF8String] with Serializable `
    
     * This patch **removes the following dependencies:**
       * `RoaringBitmap-0.4.5.jar`
       * `activation-1.1.jar`
       * `akka-actor_2.10-2.3.4-spark.jar`
       * `akka-remote_2.10-2.3.4-spark.jar`
       * `akka-slf4j_2.10-2.3.4-spark.jar`
       * `aopalliance-1.0.jar`
       * `arpack_combined_all-0.1.jar`
       * `avro-1.7.7.jar`
       * `breeze-macros_2.10-0.11.2.jar`
       * `breeze_2.10-0.11.2.jar`
       * `chill-java-0.5.0.jar`
       * `chill_2.10-0.5.0.jar`
       * `commons-beanutils-1.7.0.jar`
       * `commons-beanutils-core-1.8.0.jar`
       * `commons-cli-1.2.jar`
       * `commons-codec-1.10.jar`
       * `commons-collections-3.2.1.jar`
       * `commons-compress-1.4.1.jar`
       * `commons-configuration-1.6.jar`
       * `commons-digester-1.8.jar`
       * `commons-httpclient-3.1.jar`
       * `commons-io-2.1.jar`
       * `commons-lang-2.5.jar`
       * `commons-lang3-3.3.2.jar`
       * `commons-math-2.1.jar`
       * `commons-math3-3.1.1.jar`
       * `commons-net-2.2.jar`
       * `compress-lzf-1.0.0.jar`
       * `config-1.2.1.jar`
       * `core-1.1.2.jar`
       * `curator-client-2.4.0.jar`
       * `curator-framework-2.4.0.jar`
       * `curator-recipes-2.4.0.jar`
       * `gmbal-api-only-3.0.0-b023.jar`
       * `grizzly-framework-2.1.2.jar`
       * `grizzly-http-2.1.2.jar`
       * `grizzly-http-server-2.1.2.jar`
       * `grizzly-http-servlet-2.1.2.jar`
       * `grizzly-rcm-2.1.2.jar`
       * `groovy-all-2.3.7.jar`
       * `guava-14.0.1.jar`
       * `guice-3.0.jar`
       * `hadoop-annotations-2.2.0.jar`
       * `hadoop-auth-2.2.0.jar`
       * `hadoop-client-2.2.0.jar`
       * `hadoop-common-2.2.0.jar`
       * `hadoop-hdfs-2.2.0.jar`
       * `hadoop-mapreduce-client-app-2.2.0.jar`
       * `hadoop-mapreduce-client-common-2.2.0.jar`
       * `hadoop-mapreduce-client-core-2.2.0.jar`
       * `hadoop-mapreduce-client-jobclient-2.2.0.jar`
       * `hadoop-mapreduce-client-shuffle-2.2.0.jar`
       * `hadoop-yarn-api-2.2.0.jar`
       * `hadoop-yarn-client-2.2.0.jar`
       * `hadoop-yarn-common-2.2.0.jar`
       * `hadoop-yarn-server-common-2.2.0.jar`
       * `ivy-2.4.0.jar`
       * `jackson-annotations-2.4.0.jar`
       * `jackson-core-2.4.4.jar`
       * `jackson-core-asl-1.8.8.jar`
       * `jackson-databind-2.4.4.jar`
       * `jackson-jaxrs-1.8.8.jar`
       * `jackson-mapper-asl-1.8.8.jar`
       * `jackson-module-scala_2.10-2.4.4.jar`
       * `jackson-xc-1.8.8.jar`
       * `jansi-1.4.jar`
       * `javax.inject-1.jar`
       * `javax.servlet-3.0.0.v201112011016.jar`
       * `javax.servlet-3.1.jar`
       * `javax.servlet-api-3.0.1.jar`
       * `jaxb-api-2.2.2.jar`
       * `jaxb-impl-2.2.3-1.jar`
       * `jcl-over-slf4j-1.7.10.jar`
       * `jersey-client-1.9.jar`
       * `jersey-core-1.9.jar`
       * `jersey-grizzly2-1.9.jar`
       * `jersey-guice-1.9.jar`
       * `jersey-json-1.9.jar`
       * `jersey-server-1.9.jar`
       * `jersey-test-framework-core-1.9.jar`
       * `jersey-test-framework-grizzly2-1.9.jar`
       * `jets3t-0.7.1.jar`
       * `jettison-1.1.jar`
       * `jetty-util-6.1.26.jar`
       * `jline-0.9.94.jar`
       * `jline-2.10.4.jar`
       * `jodd-core-3.6.3.jar`
       * `json4s-ast_2.10-3.2.10.jar`
       * `json4s-core_2.10-3.2.10.jar`
       * `json4s-jackson_2.10-3.2.10.jar`
       * `jsr305-1.3.9.jar`
       * `jtransforms-2.4.0.jar`
       * `jul-to-slf4j-1.7.10.jar`
       * `kryo-2.21.jar`
       * `log4j-1.2.17.jar`
       * `lz4-1.2.0.jar`
       * `management-api-3.0.0-b012.jar`
       * `mesos-0.21.0-shaded-protobuf.jar`
       * `metrics-core-3.1.0.jar`
       * `metrics-graphite-3.1.0.jar`
       * `metrics-json-3.1.0.jar`
       * `metrics-jvm-3.1.0.jar`
       * `minlog-1.2.jar`
       * `netty-3.8.0.Final.jar`
       * `netty-all-4.0.23.Final.jar`
       * `objenesis-1.2.jar`
       * `opencsv-2.3.jar`
       * `oro-2.0.8.jar`
       * `paranamer-2.6.jar`
       * `parquet-column-1.6.0rc3.jar`
       * `parquet-common-1.6.0rc3.jar`
       * `parquet-encoding-1.6.0rc3.jar`
       * `parquet-format-2.2.0-rc1.jar`
       * `parquet-generator-1.6.0rc3.jar`
       * `parquet-hadoop-1.6.0rc3.jar`
       * `parquet-jackson-1.6.0rc3.jar`
       * `protobuf-java-2.4.1.jar`
       * `protobuf-java-2.5.0-spark.jar`
       * `py4j-0.8.2.1.jar`
       * `pyrolite-2.0.1.jar`
       * `quasiquotes_2.10-2.0.1.jar`
       * `reflectasm-1.07-shaded.jar`
       * `scala-compiler-2.10.4.jar`
       * `scala-library-2.10.4.jar`
       * `scala-reflect-2.10.4.jar`
       * `scalap-2.10.4.jar`
       * `scalatest_2.10-2.2.1.jar`
       * `slf4j-api-1.7.10.jar`
       * `slf4j-log4j12-1.7.10.jar`
       * `snappy-java-1.1.1.6.jar`
       * `spark-bagel_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-catalyst_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-core_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-graphx_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-launcher_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-mllib_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-common_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-repl_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-sql_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-streaming_2.10-1.4.0-SNAPSHOT.jar`
       * `spire-macros_2.10-0.7.4.jar`
       * `spire_2.10-0.7.4.jar`
       * `stax-api-1.0.1.jar`
       * `stream-2.7.0.jar`
       * `tachyon-0.5.0.jar`
       * `tachyon-client-0.5.0.jar`
       * `uncommons-maths-1.2.2a.jar`
       * `unused-1.0.0.jar`
       * `xmlenc-0.52.jar`
       * `xz-1.0.jar`
       * `zookeeper-3.4.5.jar`



[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88375070
  
      [Test build #29527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29527/consoleFull) for   PR 5303 at commit [`38c303e`](https://github.com/apache/spark/commit/38c303ede6a96157158a1624472a9f92035289a0).


[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89071799
  
      [Test build #29624 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29624/consoleFull) for   PR 5303 at commit [`9f4c194`](https://github.com/apache/spark/commit/9f4c194cbc44a3b163a6f2d3e1f7f831518d7776).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
      * `final class MutableString extends MutableValue `
      * `trait CaseConversionExpression `
      * `final class UTF8String extends Ordered[UTF8String] with Serializable `
    
     * This patch does not change any dependencies.


[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89083922
  
      [Test build #29633 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29633/consoleFull) for   PR 5303 at commit [`28d6f32`](https://github.com/apache/spark/commit/28d6f32eda151ed51f35117eb5beb1ec6b6882d1).


[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89120147
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29634/


[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by davies <gi...@git.apache.org>.
Github user davies closed the pull request at:

    https://github.com/apache/spark/pull/5303


[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89206709
  
      [Test build #635 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/635/consoleFull) for   PR 5303 at commit [`e5fa5b8`](https://github.com/apache/spark/commit/e5fa5b824b62670b8ba76399831ddd0e9e25efa5).


[GitHub] spark pull request: [WIP] [SPARK-6638] [SQL] Improve performance o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88616216
  
      [Test build #29557 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29557/consoleFull) for   PR 5303 at commit [`dbfa1ed`](https://github.com/apache/spark/commit/dbfa1eddd980dedd59f29455c6dc56170b946950).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `final class MutableString extends MutableValue `
      * `case class Literal(var value: Any, dataType: DataType) extends LeafExpression `
      * `trait CaseConversionExpression `
      * `final class UTF8String extends Ordered[UTF8String] with Serializable `
    
     * This patch does not change any dependencies.


[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89079309
  
      [Test build #29625 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29625/consoleFull) for   PR 5303 at commit [`537631c`](https://github.com/apache/spark/commit/537631c09c5167dc8b13785b5aa678d5c984d73a).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.
     * This patch **adds the following new dependencies:**
       * `RoaringBitmap-0.4.5.jar`
       * `activation-1.1.jar`
       * `akka-actor_2.10-2.3.4-spark.jar`
       * `akka-remote_2.10-2.3.4-spark.jar`
       * `akka-slf4j_2.10-2.3.4-spark.jar`
       * `aopalliance-1.0.jar`
       * `arpack_combined_all-0.1.jar`
       * `avro-1.7.7.jar`
       * `breeze-macros_2.10-0.11.2.jar`
       * `breeze_2.10-0.11.2.jar`
       * `chill-java-0.5.0.jar`
       * `chill_2.10-0.5.0.jar`
       * `commons-beanutils-1.7.0.jar`
       * `commons-beanutils-core-1.8.0.jar`
       * `commons-cli-1.2.jar`
       * `commons-codec-1.10.jar`
       * `commons-collections-3.2.1.jar`
       * `commons-compress-1.4.1.jar`
       * `commons-configuration-1.6.jar`
       * `commons-digester-1.8.jar`
       * `commons-httpclient-3.1.jar`
       * `commons-io-2.1.jar`
       * `commons-lang-2.5.jar`
       * `commons-lang3-3.3.2.jar`
       * `commons-math-2.1.jar`
       * `commons-math3-3.1.1.jar`
       * `commons-net-2.2.jar`
       * `compress-lzf-1.0.0.jar`
       * `config-1.2.1.jar`
       * `core-1.1.2.jar`
       * `curator-client-2.4.0.jar`
       * `curator-framework-2.4.0.jar`
       * `curator-recipes-2.4.0.jar`
       * `gmbal-api-only-3.0.0-b023.jar`
       * `grizzly-framework-2.1.2.jar`
       * `grizzly-http-2.1.2.jar`
       * `grizzly-http-server-2.1.2.jar`
       * `grizzly-http-servlet-2.1.2.jar`
       * `grizzly-rcm-2.1.2.jar`
       * `groovy-all-2.3.7.jar`
       * `guava-14.0.1.jar`
       * `guice-3.0.jar`
       * `hadoop-annotations-2.2.0.jar`
       * `hadoop-auth-2.2.0.jar`
       * `hadoop-client-2.2.0.jar`
       * `hadoop-common-2.2.0.jar`
       * `hadoop-hdfs-2.2.0.jar`
       * `hadoop-mapreduce-client-app-2.2.0.jar`
       * `hadoop-mapreduce-client-common-2.2.0.jar`
       * `hadoop-mapreduce-client-core-2.2.0.jar`
       * `hadoop-mapreduce-client-jobclient-2.2.0.jar`
       * `hadoop-mapreduce-client-shuffle-2.2.0.jar`
       * `hadoop-yarn-api-2.2.0.jar`
       * `hadoop-yarn-client-2.2.0.jar`
       * `hadoop-yarn-common-2.2.0.jar`
       * `hadoop-yarn-server-common-2.2.0.jar`
       * `ivy-2.4.0.jar`
       * `jackson-annotations-2.4.0.jar`
       * `jackson-core-2.4.4.jar`
       * `jackson-core-asl-1.8.8.jar`
       * `jackson-databind-2.4.4.jar`
       * `jackson-jaxrs-1.8.8.jar`
       * `jackson-mapper-asl-1.8.8.jar`
       * `jackson-module-scala_2.10-2.4.4.jar`
       * `jackson-xc-1.8.8.jar`
       * `jansi-1.4.jar`
       * `javax.inject-1.jar`
       * `javax.servlet-3.0.0.v201112011016.jar`
       * `javax.servlet-3.1.jar`
       * `javax.servlet-api-3.0.1.jar`
       * `jaxb-api-2.2.2.jar`
       * `jaxb-impl-2.2.3-1.jar`
       * `jcl-over-slf4j-1.7.10.jar`
       * `jersey-client-1.9.jar`
       * `jersey-core-1.9.jar`
       * `jersey-grizzly2-1.9.jar`
       * `jersey-guice-1.9.jar`
       * `jersey-json-1.9.jar`
       * `jersey-server-1.9.jar`
       * `jersey-test-framework-core-1.9.jar`
       * `jersey-test-framework-grizzly2-1.9.jar`
       * `jets3t-0.7.1.jar`
       * `jettison-1.1.jar`
       * `jetty-util-6.1.26.jar`
       * `jline-0.9.94.jar`
       * `jline-2.10.4.jar`
       * `jodd-core-3.6.3.jar`
       * `json4s-ast_2.10-3.2.10.jar`
       * `json4s-core_2.10-3.2.10.jar`
       * `json4s-jackson_2.10-3.2.10.jar`
       * `jsr305-1.3.9.jar`
       * `jtransforms-2.4.0.jar`
       * `jul-to-slf4j-1.7.10.jar`
       * `kryo-2.21.jar`
       * `log4j-1.2.17.jar`
       * `lz4-1.2.0.jar`
       * `management-api-3.0.0-b012.jar`
       * `mesos-0.21.0-shaded-protobuf.jar`
       * `metrics-core-3.1.0.jar`
       * `metrics-graphite-3.1.0.jar`
       * `metrics-json-3.1.0.jar`
       * `metrics-jvm-3.1.0.jar`
       * `minlog-1.2.jar`
       * `netty-3.8.0.Final.jar`
       * `netty-all-4.0.23.Final.jar`
       * `objenesis-1.2.jar`
       * `opencsv-2.3.jar`
       * `oro-2.0.8.jar`
       * `paranamer-2.6.jar`
       * `parquet-column-1.6.0rc3.jar`
       * `parquet-common-1.6.0rc3.jar`
       * `parquet-encoding-1.6.0rc3.jar`
       * `parquet-format-2.2.0-rc1.jar`
       * `parquet-generator-1.6.0rc3.jar`
       * `parquet-hadoop-1.6.0rc3.jar`
       * `parquet-jackson-1.6.0rc3.jar`
       * `protobuf-java-2.4.1.jar`
       * `protobuf-java-2.5.0-spark.jar`
       * `py4j-0.8.2.1.jar`
       * `pyrolite-2.0.1.jar`
       * `quasiquotes_2.10-2.0.1.jar`
       * `reflectasm-1.07-shaded.jar`
       * `scala-compiler-2.10.4.jar`
       * `scala-library-2.10.4.jar`
       * `scala-reflect-2.10.4.jar`
       * `scalap-2.10.4.jar`
       * `scalatest_2.10-2.2.1.jar`
       * `slf4j-api-1.7.10.jar`
       * `slf4j-log4j12-1.7.10.jar`
       * `snappy-java-1.1.1.6.jar`
       * `spark-bagel_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-catalyst_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-core_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-graphx_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-launcher_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-mllib_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-common_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-repl_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-sql_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-streaming_2.10-1.4.0-SNAPSHOT.jar`
       * `spire-macros_2.10-0.7.4.jar`
       * `spire_2.10-0.7.4.jar`
       * `stax-api-1.0.1.jar`
       * `stream-2.7.0.jar`
       * `tachyon-0.5.0.jar`
       * `tachyon-client-0.5.0.jar`
       * `uncommons-maths-1.2.2a.jar`
       * `unused-1.0.0.jar`
       * `xmlenc-0.52.jar`
       * `xz-1.0.jar`
       * `zookeeper-3.4.5.jar`





[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88395447
  
      [Test build #29527 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29527/consoleFull) for PR 5303 at commit [`38c303e`](https://github.com/apache/spark/commit/38c303ede6a96157158a1624472a9f92035289a0).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `final class MutableString extends MutableValue `
      * `case class Literal(var value: Any, dataType: DataType) extends LeafExpression `
      * `trait CaseConversionExpression `
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88373259
  
      [Test build #29526 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29526/consoleFull) for PR 5303 at commit [`5f9e120`](https://github.com/apache/spark/commit/5f9e1207a3896cb779d17b4aff6c58961e54b827).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `final class MutableString extends MutableValue `
      * `case class Literal(var value: Any, dataType: DataType) extends LeafExpression `
      * `trait CaseConversionExpression `
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89231316
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29658/




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88671717
  
      [Test build #29570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29570/consoleFull) for PR 5303 at commit [`23a766c`](https://github.com/apache/spark/commit/23a766cf8eb9142ef641a7eb69f136a817d8fb32).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `final class MutableString extends MutableValue `
      * `case class Literal (var value: Any, dataType: DataType) extends LeafExpression `
      * `trait CaseConversionExpression `
      * `final class UTF8String extends Ordered[UTF8String] with Serializable `
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5303#discussion_r27538867
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/UTF8StringSuite.scala ---
    @@ -0,0 +1,61 @@
    +/*
    +* Licensed to the Apache Software Foundation (ASF) under one or more
    +* contributor license agreements.  See the NOTICE file distributed with
    +* this work for additional information regarding copyright ownership.
    +* The ASF licenses this file to You under the Apache License, Version 2.0
    +* (the "License"); you may not use this file except in compliance with
    +* the License.  You may obtain a copy of the License at
    +*
    +*    http://www.apache.org/licenses/LICENSE-2.0
    +*
    +* Unless required by applicable law or agreed to in writing, software
    +* distributed under the License is distributed on an "AS IS" BASIS,
    +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +* See the License for the specific language governing permissions and
    +* limitations under the License.
    +*/
    +
    +package org.apache.spark.sql.types
    +
    +import org.scalatest.FunSuite
    +
    +class UTF8StringSuite extends FunSuite {
    +  test("basic") {
    +    def check(str: String, len: Int) {
    +
    +      assert(UTF8String(str).length == len)
    +      assert(UTF8String(str.getBytes("utf8")).length() == len)
    +
    +      assert(UTF8String(str) == str)
    +      assert(UTF8String(str.getBytes("utf8")) == str)
    +      assert(UTF8String(str.getBytes("utf8")) == UTF8String(str))
    +
    +      assert(UTF8String(str).hashCode() == UTF8String(str.getBytes("utf8")).hashCode())
    +    }
    +
    +    check("hello", 5)
    +    check("世 界", 3)
    +  }
    +
    +  test("contains, prefix and suffix") {
    --- End diff --
    
    Let's break these into three separate test cases.
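
    A note on the `check("世 界", 3)` assertion in the quoted test: it suggests that the length of a UTF-8 backed string is counted in characters rather than bytes. The following is a minimal sketch of one way to get that count from raw bytes. It makes no claim about how the PR implements it; `Utf8Length` and `numChars` are hypothetical names used only for illustration.

```scala
import java.nio.charset.StandardCharsets

object Utf8Length {
  // Count characters by skipping UTF-8 continuation bytes,
  // which always have the bit pattern 10xxxxxx.
  // (Hypothetical helper, not taken from the PR.)
  def numChars(bytes: Array[Byte]): Int =
    bytes.count(b => (b & 0xC0) != 0x80)

  def main(args: Array[String]): Unit = {
    val ascii = "hello".getBytes(StandardCharsets.UTF_8)
    val cjk = "世 界".getBytes(StandardCharsets.UTF_8)
    println(s"hello: ${ascii.length} bytes, ${numChars(ascii)} chars")
    println(s"世 界: ${cjk.length} bytes, ${numChars(cjk)} chars")
  }
}
```

    Here "世 界" occupies 7 bytes (3 + 1 + 3) but counts as 3 characters. For characters outside the Basic Multilingual Plane this count would differ from `String.length`, which counts UTF-16 code units.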




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89104370
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29632/




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89079350
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29625/




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5303#discussion_r27710824
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
    @@ -80,6 +82,24 @@ trait ScalaReflection {
         case (other, _) => other
       }
     
    +  /**
    +   *  Converts Scala objects to catalyst rows / types.
    +   *  Note: This should be called before do evaluation on Row
    +   *        (It does not support UDT)
    +   */
    +  def convertToCatalyst(a: Any): Any = a match {
    --- End diff --
    
    Did this function exist somewhere before?
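
    For context, the quoted doc comment describes converting Scala objects into the internal (Catalyst) representation before evaluation. Below is a rough sketch of that kind of recursive conversion, not the PR's code. `Utf8` is a hypothetical stand-in for `UTF8String`, `toInternal` is an illustrative name, and the real `convertToCatalyst` may handle more cases.

```scala
import java.nio.charset.StandardCharsets

object ConvertSketch {
  // Hypothetical stand-in for the PR's UTF8String: a thin wrapper over UTF-8 bytes.
  // Vector[Byte] is used so the case class gets structural equality.
  final case class Utf8(bytes: Vector[Byte]) {
    override def toString: String = new String(bytes.toArray, StandardCharsets.UTF_8)
  }

  // Recursively replace external values with their internal counterparts.
  def toInternal(a: Any): Any = a match {
    case s: String    => Utf8(s.getBytes(StandardCharsets.UTF_8).toVector)
    case seq: Seq[_]  => seq.map(toInternal(_))
    case m: Map[_, _] => m.map { case (k, v) => toInternal(k) -> toInternal(v) }
    case other        => other // numbers, booleans, null, etc. pass through unchanged
  }
}
```

    Doing the conversion once at the boundary means expression evaluation only ever sees the internal string type, which is presumably why the doc comment says it should be called before evaluating on a Row.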




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89125753
  
      [Test build #29637 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29637/consoleFull) for PR 5303 at commit [`e5fa5b8`](https://github.com/apache/spark/commit/e5fa5b824b62670b8ba76399831ddd0e9e25efa5).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `final class MutableString extends MutableValue `
      * `trait CaseConversionExpression `
      * `final class UTF8String extends Ordered[UTF8String] with Serializable `
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89107865
  
      [Test build #29633 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29633/consoleFull) for PR 5303 at commit [`28d6f32`](https://github.com/apache/spark/commit/28d6f32eda151ed51f35117eb5beb1ec6b6882d1).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
      * `final class MutableString extends MutableValue `
      * `trait CaseConversionExpression `
      * `final class UTF8String extends Ordered[UTF8String] with Serializable `
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [WIP] [SPARK-6638] [SQL] Improve performance o...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5303#discussion_r27618887
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratedMutableEvaluationSuite.scala ---
    @@ -44,14 +43,6 @@ class GeneratedMutableEvaluationSuite extends ExpressionEvaluationSuite {
     
         val actual = plan(inputRow)
         val expectedRow = new GenericRow(Array[Any](expected))
    -    if (actual.hashCode() != expectedRow.hashCode()) {
    --- End diff --
    
    The hashCode of a UTF8String is different from that of the equivalent String.
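
    To make the point concrete, here is a small sketch comparing `String.hashCode` with a hash computed over the UTF-8 bytes of the same text. It assumes a byte-backed string hashes its bytes with `java.util.Arrays.hashCode`; that is an assumption for illustration, not a statement about how `UTF8String` computes its hash. `HashDemo` and its methods are hypothetical names.

```scala
import java.nio.charset.StandardCharsets
import java.util.Arrays

object HashDemo {
  // String.hashCode folds UTF-16 code units with h = 31 * h + c, starting from 0.
  def stringHash(s: String): Int = s.hashCode

  // Assumed hash for a byte-backed string: Arrays.hashCode folds the
  // UTF-8 bytes with the same multiplier but starts from 1.
  def bytesHash(s: String): Int =
    Arrays.hashCode(s.getBytes(StandardCharsets.UTF_8))

  def main(args: Array[String]): Unit = {
    println(s"String hash: ${stringHash("hello")}")
    println(s"bytes hash:  ${bytesHash("hello")}")
  }
}
```

    Even for pure ASCII input the two values differ, so a row holding the byte-backed value would generally not hash the same as a row holding the `String`. That is presumably why the hashCode comparison was dropped from the test.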




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89091627
  
    @rxin @marmbrus This PR is ready for review.




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88369609
  
      [Test build #29522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29522/consoleFull) for PR 5303 at commit [`6b499ac`](https://github.com/apache/spark/commit/6b499ac13528f8062e1e81ddb0cd462975960067).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `final class MutableString extends MutableValue `
      * `case class Literal(var value: Any, dataType: DataType) extends LeafExpression `
      * `trait CaseConversionExpression `
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89104314
  
      [Test build #29632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29632/consoleFull) for PR 5303 at commit [`ccaf78e`](https://github.com/apache/spark/commit/ccaf78e2e3b9014dbcf5395fb986656734bbf2dc).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
      * `final class MutableString extends MutableValue `
      * `trait CaseConversionExpression `
      * `final class UTF8String extends Ordered[UTF8String] with Serializable `
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88366980
  
      [Test build #29522 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29522/consoleFull) for PR 5303 at commit [`6b499ac`](https://github.com/apache/spark/commit/6b499ac13528f8062e1e81ddb0cd462975960067).




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89091745
  
      [Test build #29634 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29634/consoleFull) for PR 5303 at commit [`28f3d81`](https://github.com/apache/spark/commit/28f3d817456f7d25a50b9e47b376bd4b252e1d19).




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88671734
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29570/




[GitHub] spark pull request: [WIP] [SPARK-6638] [SQL] Improve performance o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88590706
  
      [Test build #29552 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29552/consoleFull) for PR 5303 at commit [`c7dd4d2`](https://github.com/apache/spark/commit/c7dd4d285bc08313f98334225435e77ca0f79bb9).




[GitHub] spark pull request: [WIP] [SPARK-6638] [SQL] Improve performance o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88616223
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29557/




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88662129
  
      [Test build #29566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29566/consoleFull) for PR 5303 at commit [`8b45864`](https://github.com/apache/spark/commit/8b458644a269d9e1542a2a4ab24d735aa95ed49d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `final class MutableString extends MutableValue `
      * `case class Literal(var value: Any, dataType: DataType) extends LeafExpression `
      * `trait CaseConversionExpression `
      * `final class UTF8String extends Ordered[UTF8String] with Serializable `
    
     * This patch **removes the following dependencies:**
       * `RoaringBitmap-0.4.5.jar`
       * `activation-1.1.jar`
       * `akka-actor_2.10-2.3.4-spark.jar`
       * `akka-remote_2.10-2.3.4-spark.jar`
       * `akka-slf4j_2.10-2.3.4-spark.jar`
       * `aopalliance-1.0.jar`
       * `arpack_combined_all-0.1.jar`
       * `avro-1.7.7.jar`
       * `breeze-macros_2.10-0.11.2.jar`
       * `breeze_2.10-0.11.2.jar`
       * `chill-java-0.5.0.jar`
       * `chill_2.10-0.5.0.jar`
       * `commons-beanutils-1.7.0.jar`
       * `commons-beanutils-core-1.8.0.jar`
       * `commons-cli-1.2.jar`
       * `commons-codec-1.10.jar`
       * `commons-collections-3.2.1.jar`
       * `commons-compress-1.4.1.jar`
       * `commons-configuration-1.6.jar`
       * `commons-digester-1.8.jar`
       * `commons-httpclient-3.1.jar`
       * `commons-io-2.1.jar`
       * `commons-lang-2.5.jar`
       * `commons-lang3-3.3.2.jar`
       * `commons-math-2.1.jar`
       * `commons-math3-3.1.1.jar`
       * `commons-net-2.2.jar`
       * `compress-lzf-1.0.0.jar`
       * `config-1.2.1.jar`
       * `core-1.1.2.jar`
       * `curator-client-2.4.0.jar`
       * `curator-framework-2.4.0.jar`
       * `curator-recipes-2.4.0.jar`
       * `gmbal-api-only-3.0.0-b023.jar`
       * `grizzly-framework-2.1.2.jar`
       * `grizzly-http-2.1.2.jar`
       * `grizzly-http-server-2.1.2.jar`
       * `grizzly-http-servlet-2.1.2.jar`
       * `grizzly-rcm-2.1.2.jar`
       * `groovy-all-2.3.7.jar`
       * `guava-14.0.1.jar`
       * `guice-3.0.jar`
       * `hadoop-annotations-2.2.0.jar`
       * `hadoop-auth-2.2.0.jar`
       * `hadoop-client-2.2.0.jar`
       * `hadoop-common-2.2.0.jar`
       * `hadoop-hdfs-2.2.0.jar`
       * `hadoop-mapreduce-client-app-2.2.0.jar`
       * `hadoop-mapreduce-client-common-2.2.0.jar`
       * `hadoop-mapreduce-client-core-2.2.0.jar`
       * `hadoop-mapreduce-client-jobclient-2.2.0.jar`
       * `hadoop-mapreduce-client-shuffle-2.2.0.jar`
       * `hadoop-yarn-api-2.2.0.jar`
       * `hadoop-yarn-client-2.2.0.jar`
       * `hadoop-yarn-common-2.2.0.jar`
       * `hadoop-yarn-server-common-2.2.0.jar`
       * `ivy-2.4.0.jar`
       * `jackson-annotations-2.4.0.jar`
       * `jackson-core-2.4.4.jar`
       * `jackson-core-asl-1.8.8.jar`
       * `jackson-databind-2.4.4.jar`
       * `jackson-jaxrs-1.8.8.jar`
       * `jackson-mapper-asl-1.8.8.jar`
       * `jackson-module-scala_2.10-2.4.4.jar`
       * `jackson-xc-1.8.8.jar`
       * `jansi-1.4.jar`
       * `javax.inject-1.jar`
       * `javax.servlet-3.0.0.v201112011016.jar`
       * `javax.servlet-3.1.jar`
       * `javax.servlet-api-3.0.1.jar`
       * `jaxb-api-2.2.2.jar`
       * `jaxb-impl-2.2.3-1.jar`
       * `jcl-over-slf4j-1.7.10.jar`
       * `jersey-client-1.9.jar`
       * `jersey-core-1.9.jar`
       * `jersey-grizzly2-1.9.jar`
       * `jersey-guice-1.9.jar`
       * `jersey-json-1.9.jar`
       * `jersey-server-1.9.jar`
       * `jersey-test-framework-core-1.9.jar`
       * `jersey-test-framework-grizzly2-1.9.jar`
       * `jets3t-0.7.1.jar`
       * `jettison-1.1.jar`
       * `jetty-util-6.1.26.jar`
       * `jline-0.9.94.jar`
       * `jline-2.10.4.jar`
       * `jodd-core-3.6.3.jar`
       * `json4s-ast_2.10-3.2.10.jar`
       * `json4s-core_2.10-3.2.10.jar`
       * `json4s-jackson_2.10-3.2.10.jar`
       * `jsr305-1.3.9.jar`
       * `jtransforms-2.4.0.jar`
       * `jul-to-slf4j-1.7.10.jar`
       * `kryo-2.21.jar`
       * `log4j-1.2.17.jar`
       * `lz4-1.2.0.jar`
       * `management-api-3.0.0-b012.jar`
       * `mesos-0.21.0-shaded-protobuf.jar`
       * `metrics-core-3.1.0.jar`
       * `metrics-graphite-3.1.0.jar`
       * `metrics-json-3.1.0.jar`
       * `metrics-jvm-3.1.0.jar`
       * `minlog-1.2.jar`
       * `netty-3.8.0.Final.jar`
       * `netty-all-4.0.23.Final.jar`
       * `objenesis-1.2.jar`
       * `opencsv-2.3.jar`
       * `oro-2.0.8.jar`
       * `paranamer-2.6.jar`
       * `parquet-column-1.6.0rc3.jar`
       * `parquet-common-1.6.0rc3.jar`
       * `parquet-encoding-1.6.0rc3.jar`
       * `parquet-format-2.2.0-rc1.jar`
       * `parquet-generator-1.6.0rc3.jar`
       * `parquet-hadoop-1.6.0rc3.jar`
       * `parquet-jackson-1.6.0rc3.jar`
       * `protobuf-java-2.4.1.jar`
       * `protobuf-java-2.5.0-spark.jar`
       * `py4j-0.8.2.1.jar`
       * `pyrolite-2.0.1.jar`
       * `quasiquotes_2.10-2.0.1.jar`
       * `reflectasm-1.07-shaded.jar`
       * `scala-compiler-2.10.4.jar`
       * `scala-library-2.10.4.jar`
       * `scala-reflect-2.10.4.jar`
       * `scalap-2.10.4.jar`
       * `scalatest_2.10-2.2.1.jar`
       * `slf4j-api-1.7.10.jar`
       * `slf4j-log4j12-1.7.10.jar`
       * `snappy-java-1.1.1.6.jar`
       * `spark-bagel_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-catalyst_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-core_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-graphx_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-launcher_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-mllib_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-common_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-repl_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-sql_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-streaming_2.10-1.4.0-SNAPSHOT.jar`
       * `spire-macros_2.10-0.7.4.jar`
       * `spire_2.10-0.7.4.jar`
       * `stax-api-1.0.1.jar`
       * `stream-2.7.0.jar`
       * `tachyon-0.5.0.jar`
       * `tachyon-client-0.5.0.jar`
       * `uncommons-maths-1.2.2a.jar`
       * `unused-1.0.0.jar`
       * `xmlenc-0.52.jar`
       * `xz-1.0.jar`
       * `zookeeper-3.4.5.jar`





[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5303#discussion_r27619122
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala ---
    @@ -62,7 +75,10 @@ object IntegerLiteral {
       }
     }
     
    -case class Literal(value: Any, dataType: DataType) extends LeafExpression {
    +case class Literal (var value: Any, dataType: DataType) extends LeafExpression {
    --- End diff --
    
    There are 400-500 places using this constructor, so I'd like to leave the refactor for another PR.




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89071829
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29624/




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89125772
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29637/




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5303#discussion_r27711090
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SpecificMutableRow.scala ---
    @@ -230,13 +247,15 @@ final class SpecificMutableRow(val values: Array[MutableValue]) extends MutableR
         new GenericRow(newValues)
       }
     
    -  override def update(ordinal: Int, value: Any): Unit = {
    -    if (value == null) setNullAt(ordinal) else values(ordinal).update(value)
    +  override def update(ordinal: Int, value: Any): Unit = value match {
    +    case null => setNullAt(ordinal)
    +    case s: String => update(ordinal, UTF8String(s))
    --- End diff --
    
    Do we ever have a String here? Since this is used internally by Spark SQL, it seems wrong to expect a String.
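
A minimal sketch of the alternative this comment points at: convert `java.lang.String` values once, at the public boundary, so that internal row updates only ever see the byte-based type. `Utf8Bytes` is a hypothetical stand-in for the PR's `UTF8String`, and `RowBoundary.toInternal` is an assumed helper name; neither is taken from the patch.

```scala
import java.nio.charset.StandardCharsets
import java.util.Arrays

// Hypothetical stand-in for the PR's UTF8String: a thin wrapper over UTF-8 bytes.
final class Utf8Bytes(val bytes: Array[Byte]) extends Ordered[Utf8Bytes] with Serializable {
  // Unsigned byte-wise comparison; a strict prefix sorts before the longer value.
  override def compare(that: Utf8Bytes): Int = {
    val n = math.min(bytes.length, that.bytes.length)
    var i = 0
    while (i < n) {
      val diff = (bytes(i) & 0xff) - (that.bytes(i) & 0xff)
      if (diff != 0) return diff
      i += 1
    }
    bytes.length - that.bytes.length
  }

  // Arrays use reference equality by default, so compare contents explicitly.
  override def equals(other: Any): Boolean = other match {
    case u: Utf8Bytes => Arrays.equals(bytes, u.bytes)
    case _ => false
  }

  override def hashCode: Int = Arrays.hashCode(bytes)

  // Decoding back to java.lang.String happens only when a caller asks for it.
  override def toString: String = new String(bytes, StandardCharsets.UTF_8)
}

object Utf8Bytes {
  def apply(s: String): Utf8Bytes = new Utf8Bytes(s.getBytes(StandardCharsets.UTF_8))
}

// Assumed boundary helper: external values are converted here, not inside update().
object RowBoundary {
  def toInternal(v: Any): Any = v match {
    case s: String => Utf8Bytes(s)
    case other => other
  }
}
```

With a helper like this, an internal `update` would not need a `case s: String` branch, because callers would pass already-converted values.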




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88373268
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29526/




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89107875
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29633/




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89120121
  
      [Test build #29634 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29634/consoleFull) for   PR 5303 at commit [`28f3d81`](https://github.com/apache/spark/commit/28f3d817456f7d25a50b9e47b376bd4b252e1d19).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `final class MutableString extends MutableValue `
      * `trait CaseConversionExpression `
      * `final class UTF8String extends Ordered[UTF8String] with Serializable `
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88662140
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29566/




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89337191
  
    Closing this one to get rid of all the Jenkins comments.




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88654420
  
      [Test build #29570 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29570/consoleFull) for   PR 5303 at commit [`23a766c`](https://github.com/apache/spark/commit/23a766cf8eb9142ef641a7eb69f136a817d8fb32).




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88371439
  
      [Test build #29526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29526/consoleFull) for   PR 5303 at commit [`5f9e120`](https://github.com/apache/spark/commit/5f9e1207a3896cb779d17b4aff6c58961e54b827).




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89081301
  
      [Test build #29632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29632/consoleFull) for   PR 5303 at commit [`ccaf78e`](https://github.com/apache/spark/commit/ccaf78e2e3b9014dbcf5395fb986656734bbf2dc).




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88734217
  
      [Test build #29586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29586/consoleFull) for   PR 5303 at commit [`9dc32d1`](https://github.com/apache/spark/commit/9dc32d1b08cd6c76a7a096beba67bcbca89b0634).




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88395481
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29527/




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88770218
  
      [Test build #29586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29586/consoleFull) for   PR 5303 at commit [`9dc32d1`](https://github.com/apache/spark/commit/9dc32d1b08cd6c76a7a096beba67bcbca89b0634).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `final class MutableString extends MutableValue `
      * `case class Literal (var value: Any, dataType: DataType) extends LeafExpression `
      * `trait CaseConversionExpression `
      * `final class UTF8String extends Ordered[UTF8String] with Serializable `
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89169864
  
    @rxin I will create a new PR after fixing the rest of the tests.




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88300700
  
      [Test build #29512 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29512/consoleFull) for   PR 5303 at commit [`a85fb27`](https://github.com/apache/spark/commit/a85fb275d742dd9384e15f22878b545e9a77a106).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `//final class MutableString extends MutableValue `
      * `case class Literal(var value: Any, dataType: DataType) extends LeafExpression `
      * `trait CaseConversionExpression `
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88300703
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29512/




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89103221
  
    Can we open a new PR and close this one to get rid of all the Jenkins messages?




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5303#discussion_r27620676
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala ---
    @@ -62,7 +75,10 @@ object IntegerLiteral {
       }
     }
     
    -case class Literal(value: Any, dataType: DataType) extends LeafExpression {
    +case class Literal (var value: Any, dataType: DataType) extends LeafExpression {
    --- End diff --
    
    I think you are right. Can we submit a separate PR to avoid using the constructor directly? Then we can merge this one.
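
One way the follow-up could look, sketched under assumptions: call sites go through a factory method on the companion object, so value normalization lives in one place and the case class can keep an immutable `value`. The names `SketchLiteral`, `SketchType`, `Utf8Value`, and `create` are hypothetical and do not come from the patch.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical internal string: UTF-8 bytes held in a Seq for structural equality.
case class Utf8Value(bytes: Seq[Byte])

// Hypothetical, much-reduced stand-in for Catalyst's DataType hierarchy.
sealed trait SketchType
case object SketchString extends SketchType
case object SketchInt extends SketchType

// Immutable literal: no `var`, the stored value is fixed at construction.
case class SketchLiteral(value: Any, dataType: SketchType)

object SketchLiteral {
  // Assumed factory: the single place that maps external values to internal ones.
  def create(v: Any, dataType: SketchType): SketchLiteral = (v, dataType) match {
    case (s: String, SketchString) =>
      new SketchLiteral(Utf8Value(s.getBytes(StandardCharsets.UTF_8).toSeq), dataType)
    case _ =>
      new SketchLiteral(v, dataType)
  }
}
```

Migrating the 400-500 call sites mentioned above from the constructor to such a factory is mechanical, which is presumably why it fits better in a separate PR.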





[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88300622
  
      [Test build #29512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29512/consoleFull) for   PR 5303 at commit [`a85fb27`](https://github.com/apache/spark/commit/a85fb275d742dd9384e15f22878b545e9a77a106).




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89231294
  
      [Test build #29658 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29658/consoleFull) for   PR 5303 at commit [`8d17f21`](https://github.com/apache/spark/commit/8d17f21b1f4bb7daa1dcf950330e53d8aebd074b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `final class MutableString extends MutableValue `
      * `trait CaseConversionExpression `
      * `final class UTF8String extends Ordered[UTF8String] with Serializable `
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5303#discussion_r27715679
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
    @@ -80,6 +82,24 @@ trait ScalaReflection {
         case (other, _) => other
       }
     
    +  /**
    +   *  Converts Scala objects to catalyst rows / types.
    +   *  Note: This should be called before do evaluation on Row
    +   *        (It does not support UDT)
    +   */
    +  def convertToCatalyst(a: Any): Any = a match {
    --- End diff --
    
    No, we have a similar one: `convertToCatalyst(a: Any, dt: DataType)`
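
To illustrate the difference between the two overloads being discussed, here is a rough sketch, not the code from the patch: a type-directed converter that consults a declared type, next to a schema-free one that only inspects the runtime class of the value. `FieldType`, `Utf8Text`, and `CatalystConvertSketch` are hypothetical names.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical internal string: UTF-8 bytes with structural equality.
case class Utf8Text(bytes: Seq[Byte])

// Hypothetical, much-reduced stand-in for Catalyst's DataType hierarchy.
sealed trait FieldType
case object StringField extends FieldType
case object IntField extends FieldType
case class ArrayField(element: FieldType) extends FieldType

object CatalystConvertSketch {
  private def encode(s: String): Utf8Text =
    Utf8Text(s.getBytes(StandardCharsets.UTF_8).toSeq)

  // Type-directed variant: the declared type decides what gets converted.
  def convert(a: Any, dt: FieldType): Any = (a, dt) match {
    case (s: String, StringField) => encode(s)
    case (xs: Seq[_], ArrayField(elem)) => xs.map(x => convert(x, elem))
    case (other, _) => other
  }

  // Schema-free variant: only the runtime class of the value is inspected.
  def convert(a: Any): Any = a match {
    case s: String => encode(s)
    case xs: Seq[_] => xs.map(x => convert(x))
    case other => other
  }
}
```

The schema-free form is convenient where no type is at hand, but it cannot handle anything that needs the declared type to interpret, which matches the "does not support UDT" note in the diff.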




[GitHub] spark pull request: [WIP] [SPARK-6638] [SQL] Improve performance o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88619797
  
      [Test build #29560 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29560/consoleFull) for   PR 5303 at commit [`bb52e44`](https://github.com/apache/spark/commit/bb52e442779c3ab46fc0dc24ac5332ae17e22ad7).




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88851497
  
      [Test build #29598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29598/consoleFull) for   PR 5303 at commit [`956b0a4`](https://github.com/apache/spark/commit/956b0a48bfccc6550f66319c4475220eb6101b23).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.




[GitHub] spark pull request: [WIP] [SPARK-6638] [SQL] Improve performance o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88640074
  
      [Test build #29566 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29566/consoleFull) for   PR 5303 at commit [`8b45864`](https://github.com/apache/spark/commit/8b458644a269d9e1542a2a4ab24d735aa95ed49d).




[GitHub] spark pull request: [WIP] [SPARK-6638] [SQL] Improve performance o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88639683
  
      [Test build #29560 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29560/consoleFull) for   PR 5303 at commit [`bb52e44`](https://github.com/apache/spark/commit/bb52e442779c3ab46fc0dc24ac5332ae17e22ad7).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `final class MutableString extends MutableValue `
      * `case class Literal(var value: Any, dataType: DataType) extends LeafExpression `
      * `trait CaseConversionExpression `
      * `final class UTF8String extends Ordered[UTF8String] with Serializable `
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88851513
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29598/




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89054517
  
      [Test build #29625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29625/consoleFull) for PR 5303 at commit [`537631c`](https://github.com/apache/spark/commit/537631c09c5167dc8b13785b5aa678d5c984d73a).




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-89206829
  
      [Test build #29658 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29658/consoleFull) for PR 5303 at commit [`8d17f21`](https://github.com/apache/spark/commit/8d17f21b1f4bb7daa1dcf950330e53d8aebd074b).




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5303#discussion_r27539235
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/UTF8String.scala ---
    @@ -0,0 +1,164 @@
    +/*
    +* Licensed to the Apache Software Foundation (ASF) under one or more
    +* contributor license agreements.  See the NOTICE file distributed with
    +* this work for additional information regarding copyright ownership.
    +* The ASF licenses this file to You under the Apache License, Version 2.0
    +* (the "License"); you may not use this file except in compliance with
    +* the License.  You may obtain a copy of the License at
    +*
    +*    http://www.apache.org/licenses/LICENSE-2.0
    +*
    +* Unless required by applicable law or agreed to in writing, software
    +* distributed under the License is distributed on an "AS IS" BASIS,
    +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +* See the License for the specific language governing permissions and
    +* limitations under the License.
    +*/
    +
    +package org.apache.spark.sql.types
    +
    +/**
    + *  A UTF-8 String used only in SparkSQL
    + */
    +
    +private[sql] final class UTF8String extends Ordered[UTF8String] with Serializable {
    +  private var bytes: Array[Byte] = _
    +
    +  def set(str: String): UTF8String = {
    +    bytes = str.getBytes("utf-8")
    +    this
    +  }
    +
    +  def set(bytes: Array[Byte]): UTF8String = {
    +    this.bytes = bytes.clone()
    +    this
    +  }
    +
    +  def length(): Int = {
    +    var len = 0
    +    var i: Int = 0
    +    while (i < bytes.length) {
    +      val b = bytes(i)
    +      i += 1
    +      if (b >= 196) {
    +        i += UTF8String.bytesFromUTF8(b - 196)
    +      }
    +      len += 1
    +    }
    +    len
    +  }
    +
    +  def getBytes: Array[Byte] = {
    +    bytes
    +  }
    +
    +  def slice(start: Int, end: Int): UTF8String = {
    +    if (end <= start || start >= bytes.length || bytes == null) {
    +      new UTF8String
    +    }
    +
    +    var c = 0
    +    var i: Int = 0
    +    while (c < start && i < bytes.length) {
    +      val b = bytes(i)
    +      i += 1
    +      if (b >= 196) {
    +        i += UTF8String.bytesFromUTF8(b - 196)
    +      }
    +      c += 1
    +    }
    +    val bstart = i
    +    while (c < end && i < bytes.length) {
    +      val b = bytes(i)
    +      i += 1
    +      if (b >= 196) {
    +        i += UTF8String.bytesFromUTF8(b - 196)
    +      }
    +      c += 1
    +    }
    +    UTF8String(java.util.Arrays.copyOfRange(bytes, bstart, i))
    +  }
    +
    +  def contains(sub: UTF8String): Boolean = {
    +    bytes.containsSlice(sub.bytes)
    +  }
    +
    +  def startsWith(prefix: UTF8String): Boolean = {
    +    bytes.startsWith(prefix.bytes)
    +  }
    +
    +  def endsWith(suffix: UTF8String): Boolean = {
    +    bytes.endsWith(suffix.bytes)
    +  }
    +
    +  def toUpperCase(): UTF8String = {
    +    UTF8String(toString().toUpperCase)
    +  }
    +
    +  def toLowerCase(): UTF8String = {
    +    UTF8String(toString().toLowerCase)
    +  }
    +
    +  override def toString(): String = {
    +    new String(bytes, "utf-8")
    +  }
    +
    +  override def clone(): UTF8String = new UTF8String().set(this.bytes)
    +
    +  override def compare(other: UTF8String): Int = {
    +    var i: Int = 0
    +    while (i < bytes.length && i < other.bytes.length) {
    +      val res = bytes(i).compareTo(other.bytes(i))
    +      if (res != 0) return res
    +      i += 1
    +    }
    +    bytes.length - other.bytes.length
    +  }
    +
    +  override def compareTo(other: UTF8String): Int = {
    +    compare(other)
    +  }
    +
    +  override def equals(other: Any): Boolean = other match {
    +    case s: UTF8String =>
    +      java.util.Arrays.equals(bytes, s.bytes)
    +    case s: String =>
    +      bytes.length >= s.length && length() == s.length && toString() == s
    +    case _ =>
    +      false
    +  }
    +
    +  override def hashCode(): Int = {
    +    java.util.Arrays.hashCode(bytes)
    +  }
    +}
    +
    +private[sql] object UTF8String {
    +  // number of tailing bytes in a UTF8 sequence for a code point
    +  private[types] val bytesFromUTF8: Array[Int] = Array(1, 1, 1, 1, 1,
    --- End diff --
    
    Are there any links you can include for reference?
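
For readers following the question above: the table in the diff maps a UTF-8 leading byte to the number of trailing bytes in its sequence, and the underlying rule is the leading-byte bit patterns defined in RFC 3629. The sketch below computes that count directly from the bit patterns. It is not the PR's table (which is truncated in the quoted diff), and the names `Utf8Sketch`, `trailingBytes`, and `codePointCount` are hypothetical. One assumption worth stating: Scala's `Byte` is signed (-128 to 127), so a raw `Byte` never compares greater than or equal to 196; the sketch therefore masks each byte with `0xFF` before comparing.

```scala
object Utf8Sketch {
  // Number of continuation bytes that follow a given leading byte,
  // based on the UTF-8 bit patterns in RFC 3629. Hypothetical helper,
  // not the lookup table from the PR.
  def trailingBytes(lead: Byte): Int = {
    val b = lead & 0xFF        // widen to an unsigned value in 0..255
    if (b < 0x80) 0            // 0xxxxxxx: single-byte (ASCII)
    else if (b >= 0xF0) 3      // 11110xxx: four-byte sequence (0xF8 and above are invalid, treated the same here)
    else if (b >= 0xE0) 2      // 1110xxxx: three-byte sequence
    else if (b >= 0xC0) 1      // 110xxxxx: two-byte sequence
    else 0                     // 10xxxxxx: stray continuation byte, counted on its own
  }

  // Count code points by hopping from leading byte to leading byte.
  // Assumes the input is well-formed UTF-8.
  def codePointCount(bytes: Array[Byte]): Int = {
    var i = 0
    var n = 0
    while (i < bytes.length) {
      i += 1 + trailingBytes(bytes(i))
      n += 1
    }
    n
  }
}
```

For example, `Utf8Sketch.codePointCount("h\u00e9llo".getBytes("UTF-8"))` walks six bytes and counts five code points, because U+00E9 is encoded as the two bytes 0xC3 0xA9.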




[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5303#issuecomment-88820325
  
      [Test build #29598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29598/consoleFull) for PR 5303 at commit [`956b0a4`](https://github.com/apache/spark/commit/956b0a48bfccc6550f66319c4475220eb6101b23).

