Posted to reviews@spark.apache.org by maropu <gi...@git.apache.org> on 2015/04/07 19:17:55 UTC
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
GitHub user maropu opened a pull request:
https://github.com/apache/spark/pull/5395
[SPARK-6747][SQL] Support List<> as a return type in Hive UDF
This patch supports List<> as a return type in Hive UDF.
Assume a UDF like the one below:
public class UDFToListString extends UDF {
  public List<String> evaluate(Object o) {
    return Arrays.asList("xxx", "yyy", "zzz");
  }
}
With the current implementation, using this UDF throws a scala.MatchError as follows:
scala.MatchError: interface java.util.List (of class java.lang.Class)
at org.apache.spark.sql.hive.HiveInspectors$class.javaClassToDataType(HiveInspectors.scala:174)
at org.apache.spark.sql.hive.HiveSimpleUdf.javaClassToDataType(hiveUdfs.scala:76)
at org.apache.spark.sql.hive.HiveSimpleUdf.dataType$lzycompute(hiveUdfs.scala:106)
at org.apache.spark.sql.hive.HiveSimpleUdf.dataType(hiveUdfs.scala:106)
at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:131)
at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:95)
at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:94)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
...
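The `interface java.util.List` in the error message is what class-based reflection reports for a method declared to return `List<String>`: the type argument is not part of the raw return class. A minimal sketch of that behaviour follows. It assumes the return type is looked up with `Method.getReturnType()` (the trace does not show this call), and it uses a hypothetical `ToListString` class in place of the Hive UDF so that it compiles without Hive on the classpath.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the UDF in the description; it deliberately does
// not extend Hive's UDF class, so the sketch needs only the JDK.
final class ErasureDemo {
    static final class ToListString {
        public List<String> evaluate(Object o) {
            return Arrays.asList("xxx", "yyy", "zzz");
        }
    }

    // Returns the raw (erased) return class of evaluate(Object).
    static Class<?> rawReturnType() {
        try {
            Method m = ToListString.class.getMethod("evaluate", Object.class);
            return m.getReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The raw class is java.util.List; the String argument is not visible here.
        System.out.println(rawReturnType());
    }
}
```

A pattern match that only knows about primitive wrappers and arrays has no case for this class, which is consistent with the MatchError above.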
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/maropu/spark FixBugInHiveInspectors
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5395.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5395
----
commit bd165b9f72ef8c30508423688c38b8bacc734884
Author: Takeshi YAMAMURO <li...@gmail.com>
Date: 2015-04-07T16:41:17Z
Support List as a return type in Hive UDF
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/5395#discussion_r27938425
--- Diff: sql/hive/src/test/java/org/apache/spark/sql/hive/execution/UDFToListString.java ---
@@ -0,0 +1,29 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution;
+
+import org.apache.hadoop.hive.ql.exec.UDF;
+
+import java.util.Arrays;
+import java.util.List;
+
+public class UDFToListString extends UDF {
+ public List<String> evaluate(Object o) {
+ return Arrays.asList("data1", "data2", "data3");
+ }
+}
--- End diff --
Fixed
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-96462276
cc @marmbrus Could you merge this into master? I'll open a PR for SPARK-6912, but it depends on this one.
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-92983297
[Test build #30253 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30253/consoleFull) for PR 5395 at commit [`3a8d952`](https://github.com/apache/spark/commit/3a8d952aacf7a9d31ff6db6d4c9a609ddc66654f).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch **adds the following new dependencies:**
* `RoaringBitmap-0.4.5.jar`
* `activation-1.1.jar`
* `akka-actor_2.10-2.3.4-spark.jar`
* `akka-remote_2.10-2.3.4-spark.jar`
* `akka-slf4j_2.10-2.3.4-spark.jar`
* `aopalliance-1.0.jar`
* `arpack_combined_all-0.1.jar`
* `avro-1.7.7.jar`
* `breeze-macros_2.10-0.11.2.jar`
* `breeze_2.10-0.11.2.jar`
* `chill-java-0.5.0.jar`
* `chill_2.10-0.5.0.jar`
* `commons-beanutils-1.7.0.jar`
* `commons-beanutils-core-1.8.0.jar`
* `commons-cli-1.2.jar`
* `commons-codec-1.10.jar`
* `commons-collections-3.2.1.jar`
* `commons-compress-1.4.1.jar`
* `commons-configuration-1.6.jar`
* `commons-digester-1.8.jar`
* `commons-httpclient-3.1.jar`
* `commons-io-2.1.jar`
* `commons-lang-2.5.jar`
* `commons-lang3-3.3.2.jar`
* `commons-math-2.1.jar`
* `commons-math3-3.4.1.jar`
* `commons-net-2.2.jar`
* `compress-lzf-1.0.0.jar`
* `config-1.2.1.jar`
* `core-1.1.2.jar`
* `curator-client-2.4.0.jar`
* `curator-framework-2.4.0.jar`
* `curator-recipes-2.4.0.jar`
* `gmbal-api-only-3.0.0-b023.jar`
* `grizzly-framework-2.1.2.jar`
* `grizzly-http-2.1.2.jar`
* `grizzly-http-server-2.1.2.jar`
* `grizzly-http-servlet-2.1.2.jar`
* `grizzly-rcm-2.1.2.jar`
* `groovy-all-2.3.7.jar`
* `guava-14.0.1.jar`
* `guice-3.0.jar`
* `hadoop-annotations-2.2.0.jar`
* `hadoop-auth-2.2.0.jar`
* `hadoop-client-2.2.0.jar`
* `hadoop-common-2.2.0.jar`
* `hadoop-hdfs-2.2.0.jar`
* `hadoop-mapreduce-client-app-2.2.0.jar`
* `hadoop-mapreduce-client-common-2.2.0.jar`
* `hadoop-mapreduce-client-core-2.2.0.jar`
* `hadoop-mapreduce-client-jobclient-2.2.0.jar`
* `hadoop-mapreduce-client-shuffle-2.2.0.jar`
* `hadoop-yarn-api-2.2.0.jar`
* `hadoop-yarn-client-2.2.0.jar`
* `hadoop-yarn-common-2.2.0.jar`
* `hadoop-yarn-server-common-2.2.0.jar`
* `ivy-2.4.0.jar`
* `jackson-annotations-2.4.0.jar`
* `jackson-core-2.4.4.jar`
* `jackson-core-asl-1.8.8.jar`
* `jackson-databind-2.4.4.jar`
* `jackson-jaxrs-1.8.8.jar`
* `jackson-mapper-asl-1.8.8.jar`
* `jackson-module-scala_2.10-2.4.4.jar`
* `jackson-xc-1.8.8.jar`
* `jansi-1.4.jar`
* `javax.inject-1.jar`
* `javax.servlet-3.0.0.v201112011016.jar`
* `javax.servlet-3.1.jar`
* `javax.servlet-api-3.0.1.jar`
* `jaxb-api-2.2.2.jar`
* `jaxb-impl-2.2.3-1.jar`
* `jcl-over-slf4j-1.7.10.jar`
* `jersey-client-1.9.jar`
* `jersey-core-1.9.jar`
* `jersey-grizzly2-1.9.jar`
* `jersey-guice-1.9.jar`
* `jersey-json-1.9.jar`
* `jersey-server-1.9.jar`
* `jersey-test-framework-core-1.9.jar`
* `jersey-test-framework-grizzly2-1.9.jar`
* `jets3t-0.7.1.jar`
* `jettison-1.1.jar`
* `jetty-util-6.1.26.jar`
* `jline-0.9.94.jar`
* `jline-2.10.4.jar`
* `jodd-core-3.6.3.jar`
* `json4s-ast_2.10-3.2.10.jar`
* `json4s-core_2.10-3.2.10.jar`
* `json4s-jackson_2.10-3.2.10.jar`
* `jsr305-1.3.9.jar`
* `jtransforms-2.4.0.jar`
* `jul-to-slf4j-1.7.10.jar`
* `kryo-2.21.jar`
* `log4j-1.2.17.jar`
* `lz4-1.2.0.jar`
* `management-api-3.0.0-b012.jar`
* `mesos-0.21.0-shaded-protobuf.jar`
* `metrics-core-3.1.0.jar`
* `metrics-graphite-3.1.0.jar`
* `metrics-json-3.1.0.jar`
* `metrics-jvm-3.1.0.jar`
* `minlog-1.2.jar`
* `netty-3.8.0.Final.jar`
* `netty-all-4.0.23.Final.jar`
* `objenesis-1.2.jar`
* `opencsv-2.3.jar`
* `oro-2.0.8.jar`
* `paranamer-2.6.jar`
* `parquet-column-1.6.0rc3.jar`
* `parquet-common-1.6.0rc3.jar`
* `parquet-encoding-1.6.0rc3.jar`
* `parquet-format-2.2.0-rc1.jar`
* `parquet-generator-1.6.0rc3.jar`
* `parquet-hadoop-1.6.0rc3.jar`
* `parquet-jackson-1.6.0rc3.jar`
* `protobuf-java-2.4.1.jar`
* `protobuf-java-2.5.0-spark.jar`
* `py4j-0.8.2.1.jar`
* `pyrolite-2.0.1.jar`
* `quasiquotes_2.10-2.0.1.jar`
* `reflectasm-1.07-shaded.jar`
* `scala-compiler-2.10.4.jar`
* `scala-library-2.10.4.jar`
* `scala-reflect-2.10.4.jar`
* `scalap-2.10.4.jar`
* `scalatest_2.10-2.2.1.jar`
* `slf4j-api-1.7.10.jar`
* `slf4j-log4j12-1.7.10.jar`
* `snappy-java-1.1.1.6.jar`
* `spark-bagel_2.10-1.4.0-SNAPSHOT.jar`
* `spark-catalyst_2.10-1.4.0-SNAPSHOT.jar`
* `spark-core_2.10-1.4.0-SNAPSHOT.jar`
* `spark-graphx_2.10-1.4.0-SNAPSHOT.jar`
* `spark-launcher_2.10-1.4.0-SNAPSHOT.jar`
* `spark-mllib_2.10-1.4.0-SNAPSHOT.jar`
* `spark-network-common_2.10-1.4.0-SNAPSHOT.jar`
* `spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar`
* `spark-repl_2.10-1.4.0-SNAPSHOT.jar`
* `spark-sql_2.10-1.4.0-SNAPSHOT.jar`
* `spark-streaming_2.10-1.4.0-SNAPSHOT.jar`
* `spire-macros_2.10-0.7.4.jar`
* `spire_2.10-0.7.4.jar`
* `stax-api-1.0.1.jar`
* `stream-2.7.0.jar`
* `tachyon-0.5.0.jar`
* `tachyon-client-0.5.0.jar`
* `uncommons-maths-1.2.2a.jar`
* `unused-1.0.0.jar`
* `xmlenc-0.52.jar`
* `xz-1.0.jar`
* `zookeeper-3.4.5.jar`
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-90664889
@maropu my concern is whether Hive supports UDFs whose return type is `List`. Can you confirm that? Or can you provide a Hive comparison unit test?
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/5395#discussion_r27905941
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ---
@@ -666,3 +676,18 @@ private[hive] trait HiveInspectors {
}
}
}
+
+/**
+ * :: DeveloperApi ::
+ * This represents an erased type because of type erasure in JVM.
+ */
+@DeveloperApi
+class ErasedType private() extends DataType {
--- End diff --
I agree with @chenghao-intel that we should confirm, using a compatibility test, that this is supported by Hive. If so, we should probably just use `NullType` here.
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-90713388
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29807/
Test PASSed.
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-93877056
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30445/
Test FAILed.
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-93863177
This is still creating a new type. Can we use `NullType` instead?
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-93032427
[Test build #30265 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30265/consoleFull) for PR 5395 at commit [`8e333c7`](https://github.com/apache/spark/commit/8e333c7a05964e34cad8e9eb274aa746ab23e13a).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-96767679
Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-93877052
[Test build #30445 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30445/consoleFull) for PR 5395 at commit [`ee56a0a`](https://github.com/apache/spark/commit/ee56a0a10ebb784debea3962419a61f85d9bd870).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-93032455
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30265/
Test PASSed.
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/5395#discussion_r28346669
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ---
@@ -213,8 +215,16 @@ private[hive] trait HiveInspectors {
case c: Class[_] if c.isArray => ArrayType(javaClassToDataType(c.getComponentType))
+ // list type
+ case c: Class[_] if c == classOf[java.util.List[java.lang.Object]] =>
+ logWarning("Failed to catch a correct component type in List<> because of type erasure," +
+ " so you need to handle it correctly by yourself")
--- End diff --
I'm not sure what the best wording for this warning message is.
Is " so you need to cast it into the correct type by yourself" ok?
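One possible reading of "cast it into the correct type by yourself", on the JVM side, is that the caller receives a list with an unknown element type and checks each element explicitly. The sketch below illustrates only that reading; `CastErasedListDemo` and `asStrings` are hypothetical names, and nothing here is taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: the patch itself does not contain this code.
final class CastErasedListDemo {
    // Copies a list of unknown element type into a List<String>. Each element
    // is checked individually, so a non-String element raises
    // ClassCastException here rather than at some later use site.
    static List<String> asStrings(List<?> erased) {
        List<String> out = new ArrayList<>(erased.size());
        for (Object element : erased) {
            out.add(String.class.cast(element));
        }
        return out;
    }

    public static void main(String[] args) {
        List<?> erased = Arrays.asList("data1", "data2", "data3");
        System.out.println(asStrings(erased));
    }
}
```

Checking elements eagerly is a design choice in this sketch: an unchecked cast of the whole list would compile, but a wrong element type would then surface only when an element is read.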
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/5395#discussion_r27901030
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ---
@@ -666,3 +676,18 @@ private[hive] trait HiveInspectors {
}
}
}
+
+/**
+ * :: DeveloperApi ::
+ * This represents an erased type because of type erasure in JVM.
+ */
+@DeveloperApi
+class ErasedType private() extends DataType {
--- End diff --
Is the type `List of Object` supported by Hive?
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-90795154
[Test build #29825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29825/consoleFull) for PR 5395 at commit [`02b3a91`](https://github.com/apache/spark/commit/02b3a9105b9403517a2a842ce6591e845f135bce).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-91777942
ISTM Hive supports List<> as a return type (see the links below).
Also, some third-party libraries use it.
https://github.com/kyluka/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBridge.java#L163
https://github.com/l1x/apache-hive/blob/master/hive-0.8.1/src/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java#L113
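The linked Hive sources are not reproduced here. As background, the JDK does keep a method's declared generic signature, and reflection can read it through `Method.getGenericReturnType()`. A minimal sketch of recovering the element type of a `List<...>` return type this way, using a hypothetical `ToListString` class that stands in for the test UDF and does not extend Hive's `UDF`:

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout; this is not code from Hive or from the patch.
final class GenericReturnTypeDemo {
    static final class ToListString {
        public List<String> evaluate(Object o) {
            return Arrays.asList("data1", "data2", "data3");
        }
    }

    // Returns the declared element type when the method's return type is
    // written as List<X>, or null when it is anything else (including raw List).
    static Type listElementType(Class<?> udfClass, String methodName, Class<?>... argTypes) {
        try {
            Method m = udfClass.getMethod(methodName, argTypes);
            Type declared = m.getGenericReturnType();
            if (declared instanceof ParameterizedType) {
                ParameterizedType p = (ParameterizedType) declared;
                if (p.getRawType() == List.class) {
                    return p.getActualTypeArguments()[0];
                }
            }
            return null;
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(listElementType(ToListString.class, "evaluate", Object.class));
    }
}
```

This only works for the declared signature. A method declared to return a raw `List` or `List<?>` gives no concrete element class, so a fallback is still needed in that case.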
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-93870531
I missed that; it is fixed now. Does this fix address your point?
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-93870726
[Test build #30445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30445/consoleFull) for PR 5395 at commit [`ee56a0a`](https://github.com/apache/spark/commit/ee56a0a10ebb784debea3962419a61f85d9bd870).
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-90689379
ok to test
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/5395#discussion_r28198699
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ---
@@ -213,8 +215,16 @@ private[hive] trait HiveInspectors {
case c: Class[_] if c.isArray => ArrayType(javaClassToDataType(c.getComponentType))
+ // list type
+ case c: Class[_] if c == classOf[java.util.List[java.lang.Object]] =>
+ logWarning("Failed to catch a correct component type in List<> because of type erasure," +
+ " so you need to handle it correctly by yourself")
--- End diff --
Maybe specify how they would handle it?
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-92977037
[Test build #30253 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30253/consoleFull) for PR 5395 at commit [`3a8d952`](https://github.com/apache/spark/commit/3a8d952aacf7a9d31ff6db6d4c9a609ddc66654f).
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-93003697
[Test build #30265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30265/consoleFull) for PR 5395 at commit [`8e333c7`](https://github.com/apache/spark/commit/8e333c7a05964e34cad8e9eb274aa746ab23e13a).
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/5395#discussion_r27900712
--- Diff: sql/hive/src/test/java/org/apache/spark/sql/hive/execution/UDFToListString.java ---
@@ -0,0 +1,29 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution;
+
+import org.apache.hadoop.hive.ql.exec.UDF;
+
+import java.util.Arrays;
+import java.util.List;
+
+public class UDFToListString extends UDF {
+ public List<String> evaluate(Object o) {
+ return Arrays.asList("data1", "data2", "data3");
+ }
+}
--- End diff --
Add a blank line at the end of the file.
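For context on why a `List<String>` return type is awkward to map: the `MatchError` in the pull request description is raised on `interface java.util.List`, which is all that the `Class` of the return type exposes. The sketch below illustrates that with plain Java reflection. `ListReturnTypeDemo` and `ToListString` are hypothetical names; the stand-in class does not extend Hive's `UDF`, so the example has no external dependencies. It is an illustration of erasure only, not a description of how this patch resolves the element type.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.List;

public class ListReturnTypeDemo {
    // Hypothetical stand-in for the test UDF; same evaluate signature, no Hive dependency.
    public static class ToListString {
        public List<String> evaluate(Object o) {
            return Arrays.asList("data1", "data2", "data3");
        }
    }

    private static Method evaluateMethod() {
        try {
            return ToListString.class.getMethod("evaluate", Object.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // The erased return class: a lookup keyed on Class sees only java.util.List.
    public static Class<?> erasedReturnType() {
        return evaluateMethod().getReturnType();
    }

    // The declared element type, read from the generic signature of the method.
    public static Type declaredElementType() {
        Type generic = evaluateMethod().getGenericReturnType();
        if (generic instanceof ParameterizedType) {
            return ((ParameterizedType) generic).getActualTypeArguments()[0];
        }
        return Object.class; // raw List: no element type is recorded
    }

    public static void main(String[] args) {
        System.out.println(erasedReturnType());    // interface java.util.List
        System.out.println(declaredElementType()); // class java.lang.String
    }
}
```

The element type survives only in the method's generic signature, so code that receives just a `Class` object has no way to recover it.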
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-90783406
[Test build #29825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29825/consoleFull) for PR 5395 at commit [`02b3a91`](https://github.com/apache/spark/commit/02b3a9105b9403517a2a842ce6591e845f135bce).
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-92983310
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30253/
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-90784532
Ok, I will look into the implementation and the documentation of Hive for that.
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-90653250
Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-90713372
[Test build #29807 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29807/consoleFull) for PR 5395 at commit [`bd165b9`](https://github.com/apache/spark/commit/bd165b9f72ef8c30508423688c38b8bacc734884).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-90795164
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29825/
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/5395#discussion_r28198691
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ---
@@ -213,8 +215,16 @@ private[hive] trait HiveInspectors {
case c: Class[_] if c.isArray => ArrayType(javaClassToDataType(c.getComponentType))
+ // list type
+ case c: Class[_] if c == classOf[java.util.List[java.lang.Object]] =>
+ logWarning("Failed to catch a correct component type in List<> because of type erasure," +
+ " so you need to handle it correctly by yourself")
+ ArrayType(ErasedType)
+
// Hive seems to return this for struct types?
case c: Class[_] if c == classOf[java.lang.Object] => NullType
+
+ case c => throw new HiveDataTypeException("Unknown java type: " + c)
--- End diff --
This should just be an `AnalysisException`.
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-90690245
[Test build #29807 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29807/consoleFull) for PR 5395 at commit [`bd165b9`](https://github.com/apache/spark/commit/bd165b9f72ef8c30508423688c38b8bacc734884).
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/5395#discussion_r28198701
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ---
@@ -213,8 +215,16 @@ private[hive] trait HiveInspectors {
case c: Class[_] if c.isArray => ArrayType(javaClassToDataType(c.getComponentType))
+ // list type
+ case c: Class[_] if c == classOf[java.util.List[java.lang.Object]] =>
+ logWarning("Failed to catch a correct component type in List<> because of type erasure," +
+ " so you need to handle it correctly by yourself")
+ ArrayType(ErasedType)
+
// Hive seems to return this for struct types?
case c: Class[_] if c == classOf[java.lang.Object] => NullType
+
+ case c => throw new HiveDataTypeException("Unknown java type: " + c)
--- End diff --
Also, prefer string interpolation to `+`, e.g. `s"Unknown UDF input type $c"`.
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/5395#discussion_r28346473
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ---
@@ -213,8 +215,16 @@ private[hive] trait HiveInspectors {
case c: Class[_] if c.isArray => ArrayType(javaClassToDataType(c.getComponentType))
+ // list type
+ case c: Class[_] if c == classOf[java.util.List[java.lang.Object]] =>
+ logWarning("Failed to catch a correct component type in List<> because of type erasure," +
+ " so you need to handle it correctly by yourself")
+ ArrayType(ErasedType)
+
// Hive seems to return this for struct types?
case c: Class[_] if c == classOf[java.lang.Object] => NullType
+
+ case c => throw new HiveDataTypeException("Unknown java type: " + c)
--- End diff --
`s"Unsupported java type $c"` seems like a better error message, because this method is not designed only for UDFs.
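A rough Java analogue of the fallback case under discussion, as a sketch only. `JavaTypeMapping`, `toTypeName`, and `UnsupportedJavaTypeException` are hypothetical names rather than Spark or Hive APIs, and the string type names are simplified placeholders. The point is that an explicit final branch reports the offending class in a descriptive message instead of failing with a generic match error.

```java
public class JavaTypeMapping {
    // Hypothetical exception type, standing in for whichever exception the patch settles on.
    public static class UnsupportedJavaTypeException extends RuntimeException {
        public UnsupportedJavaTypeException(String message) {
            super(message);
        }
    }

    // Maps a Java class to a simplified type name; unknown classes hit the explicit fallback.
    public static String toTypeName(Class<?> c) {
        if (c == String.class) {
            return "string";
        }
        if (c == Integer.class || c == int.class) {
            return "int";
        }
        if (c.isArray()) {
            return "array<" + toTypeName(c.getComponentType()) + ">";
        }
        // Fallback: name the class in the message, built with a format string
        // rather than concatenation.
        throw new UnsupportedJavaTypeException(String.format("Unsupported java type %s", c));
    }

    public static void main(String[] args) {
        System.out.println(toTypeName(String[].class)); // array<string>
        try {
            toTypeName(java.util.Map.class);
        } catch (UnsupportedJavaTypeException e) {
            System.out.println(e.getMessage()); // Unsupported java type interface java.util.Map
        }
    }
}
```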
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-91932878
Thanks for researching this. Can you address the final comments about avoiding the creation of a new type?
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-92981451
Sorry for the delay. I have fixed these; please re-check them.
[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...
Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/5395#issuecomment-93871894
Yes, LGTM