You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by maropu <gi...@git.apache.org> on 2015/04/07 19:17:55 UTC

[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

GitHub user maropu opened a pull request:

    https://github.com/apache/spark/pull/5395

    [SPARK-6747][SQL] Support List<> as a return type in Hive UDF

    This patch supports List<> as a return type in Hive UDF.
    
    We assume an UDF below;
    public class UDFToListString extends UDF {
    public List<String> evaluate(Object o)
    { return Arrays.asList("xxx", "yyy", "zzz"); }
    }
    An exception of scala.MatchError is thrown as follows when the UDF used in the current implementation.
    scala.MatchError: interface java.util.List (of class java.lang.Class)
    at org.apache.spark.sql.hive.HiveInspectors$class.javaClassToDataType(HiveInspectors.scala:174)
    at org.apache.spark.sql.hive.HiveSimpleUdf.javaClassToDataType(hiveUdfs.scala:76)
    at org.apache.spark.sql.hive.HiveSimpleUdf.dataType$lzycompute(hiveUdfs.scala:106)
    at org.apache.spark.sql.hive.HiveSimpleUdf.dataType(hiveUdfs.scala:106)
    at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:131)
    at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:95)
    at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:94)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
    at scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
    ...

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark FixBugInHiveInspectors

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5395.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5395
    
----
commit bd165b9f72ef8c30508423688c38b8bacc734884
Author: Takeshi YAMAMURO <li...@gmail.com>
Date:   2015-04-07T16:41:17Z

    Support List as a return type in Hive UDF

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5395#discussion_r27938425
  
    --- Diff: sql/hive/src/test/java/org/apache/spark/sql/hive/execution/UDFToListString.java ---
    @@ -0,0 +1,29 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive.execution;
    +
    +import org.apache.hadoop.hive.ql.exec.UDF;
    +
    +import java.util.Arrays;
    +import java.util.List;
    +
    +public class UDFToListString extends UDF {
    +    public List<String> evaluate(Object o) {
    +        return Arrays.asList("data1", "data2", "data3");
    +    }
    +}
    --- End diff --
    
    Fixed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-96462276
  
    cc @marmbrus Could you merge into master? I'll make a PR of SPARK-6912, but it depends on this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-92983297
  
      [Test build #30253 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30253/consoleFull) for   PR 5395 at commit [`3a8d952`](https://github.com/apache/spark/commit/3a8d952aacf7a9d31ff6db6d4c9a609ddc66654f).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch **adds the following new dependencies:**
       * `RoaringBitmap-0.4.5.jar`
       * `activation-1.1.jar`
       * `akka-actor_2.10-2.3.4-spark.jar`
       * `akka-remote_2.10-2.3.4-spark.jar`
       * `akka-slf4j_2.10-2.3.4-spark.jar`
       * `aopalliance-1.0.jar`
       * `arpack_combined_all-0.1.jar`
       * `avro-1.7.7.jar`
       * `breeze-macros_2.10-0.11.2.jar`
       * `breeze_2.10-0.11.2.jar`
       * `chill-java-0.5.0.jar`
       * `chill_2.10-0.5.0.jar`
       * `commons-beanutils-1.7.0.jar`
       * `commons-beanutils-core-1.8.0.jar`
       * `commons-cli-1.2.jar`
       * `commons-codec-1.10.jar`
       * `commons-collections-3.2.1.jar`
       * `commons-compress-1.4.1.jar`
       * `commons-configuration-1.6.jar`
       * `commons-digester-1.8.jar`
       * `commons-httpclient-3.1.jar`
       * `commons-io-2.1.jar`
       * `commons-lang-2.5.jar`
       * `commons-lang3-3.3.2.jar`
       * `commons-math-2.1.jar`
       * `commons-math3-3.4.1.jar`
       * `commons-net-2.2.jar`
       * `compress-lzf-1.0.0.jar`
       * `config-1.2.1.jar`
       * `core-1.1.2.jar`
       * `curator-client-2.4.0.jar`
       * `curator-framework-2.4.0.jar`
       * `curator-recipes-2.4.0.jar`
       * `gmbal-api-only-3.0.0-b023.jar`
       * `grizzly-framework-2.1.2.jar`
       * `grizzly-http-2.1.2.jar`
       * `grizzly-http-server-2.1.2.jar`
       * `grizzly-http-servlet-2.1.2.jar`
       * `grizzly-rcm-2.1.2.jar`
       * `groovy-all-2.3.7.jar`
       * `guava-14.0.1.jar`
       * `guice-3.0.jar`
       * `hadoop-annotations-2.2.0.jar`
       * `hadoop-auth-2.2.0.jar`
       * `hadoop-client-2.2.0.jar`
       * `hadoop-common-2.2.0.jar`
       * `hadoop-hdfs-2.2.0.jar`
       * `hadoop-mapreduce-client-app-2.2.0.jar`
       * `hadoop-mapreduce-client-common-2.2.0.jar`
       * `hadoop-mapreduce-client-core-2.2.0.jar`
       * `hadoop-mapreduce-client-jobclient-2.2.0.jar`
       * `hadoop-mapreduce-client-shuffle-2.2.0.jar`
       * `hadoop-yarn-api-2.2.0.jar`
       * `hadoop-yarn-client-2.2.0.jar`
       * `hadoop-yarn-common-2.2.0.jar`
       * `hadoop-yarn-server-common-2.2.0.jar`
       * `ivy-2.4.0.jar`
       * `jackson-annotations-2.4.0.jar`
       * `jackson-core-2.4.4.jar`
       * `jackson-core-asl-1.8.8.jar`
       * `jackson-databind-2.4.4.jar`
       * `jackson-jaxrs-1.8.8.jar`
       * `jackson-mapper-asl-1.8.8.jar`
       * `jackson-module-scala_2.10-2.4.4.jar`
       * `jackson-xc-1.8.8.jar`
       * `jansi-1.4.jar`
       * `javax.inject-1.jar`
       * `javax.servlet-3.0.0.v201112011016.jar`
       * `javax.servlet-3.1.jar`
       * `javax.servlet-api-3.0.1.jar`
       * `jaxb-api-2.2.2.jar`
       * `jaxb-impl-2.2.3-1.jar`
       * `jcl-over-slf4j-1.7.10.jar`
       * `jersey-client-1.9.jar`
       * `jersey-core-1.9.jar`
       * `jersey-grizzly2-1.9.jar`
       * `jersey-guice-1.9.jar`
       * `jersey-json-1.9.jar`
       * `jersey-server-1.9.jar`
       * `jersey-test-framework-core-1.9.jar`
       * `jersey-test-framework-grizzly2-1.9.jar`
       * `jets3t-0.7.1.jar`
       * `jettison-1.1.jar`
       * `jetty-util-6.1.26.jar`
       * `jline-0.9.94.jar`
       * `jline-2.10.4.jar`
       * `jodd-core-3.6.3.jar`
       * `json4s-ast_2.10-3.2.10.jar`
       * `json4s-core_2.10-3.2.10.jar`
       * `json4s-jackson_2.10-3.2.10.jar`
       * `jsr305-1.3.9.jar`
       * `jtransforms-2.4.0.jar`
       * `jul-to-slf4j-1.7.10.jar`
       * `kryo-2.21.jar`
       * `log4j-1.2.17.jar`
       * `lz4-1.2.0.jar`
       * `management-api-3.0.0-b012.jar`
       * `mesos-0.21.0-shaded-protobuf.jar`
       * `metrics-core-3.1.0.jar`
       * `metrics-graphite-3.1.0.jar`
       * `metrics-json-3.1.0.jar`
       * `metrics-jvm-3.1.0.jar`
       * `minlog-1.2.jar`
       * `netty-3.8.0.Final.jar`
       * `netty-all-4.0.23.Final.jar`
       * `objenesis-1.2.jar`
       * `opencsv-2.3.jar`
       * `oro-2.0.8.jar`
       * `paranamer-2.6.jar`
       * `parquet-column-1.6.0rc3.jar`
       * `parquet-common-1.6.0rc3.jar`
       * `parquet-encoding-1.6.0rc3.jar`
       * `parquet-format-2.2.0-rc1.jar`
       * `parquet-generator-1.6.0rc3.jar`
       * `parquet-hadoop-1.6.0rc3.jar`
       * `parquet-jackson-1.6.0rc3.jar`
       * `protobuf-java-2.4.1.jar`
       * `protobuf-java-2.5.0-spark.jar`
       * `py4j-0.8.2.1.jar`
       * `pyrolite-2.0.1.jar`
       * `quasiquotes_2.10-2.0.1.jar`
       * `reflectasm-1.07-shaded.jar`
       * `scala-compiler-2.10.4.jar`
       * `scala-library-2.10.4.jar`
       * `scala-reflect-2.10.4.jar`
       * `scalap-2.10.4.jar`
       * `scalatest_2.10-2.2.1.jar`
       * `slf4j-api-1.7.10.jar`
       * `slf4j-log4j12-1.7.10.jar`
       * `snappy-java-1.1.1.6.jar`
       * `spark-bagel_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-catalyst_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-core_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-graphx_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-launcher_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-mllib_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-common_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-repl_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-sql_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-streaming_2.10-1.4.0-SNAPSHOT.jar`
       * `spire-macros_2.10-0.7.4.jar`
       * `spire_2.10-0.7.4.jar`
       * `stax-api-1.0.1.jar`
       * `stream-2.7.0.jar`
       * `tachyon-0.5.0.jar`
       * `tachyon-client-0.5.0.jar`
       * `uncommons-maths-1.2.2a.jar`
       * `unused-1.0.0.jar`
       * `xmlenc-0.52.jar`
       * `xz-1.0.jar`
       * `zookeeper-3.4.5.jar`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-90664889
  
    @maropu my concern is does Hive support the UDF which return type is `List`? Can you confirm that? Or  can you provide a Hive comparison unit test?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5395#discussion_r27905941
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ---
    @@ -666,3 +676,18 @@ private[hive] trait HiveInspectors {
         }
       }
     }
    +
    +/**
    + * :: DeveloperApi ::
    + * This represents an erased type because of type erasure in JVM.
    + */
    +@DeveloperApi
    +class ErasedType private() extends DataType {
    --- End diff --
    
    I agree with @chenghao-intel that we should confirm this is supported with hive using a compatibility test.  If so, we should probably just use `NullType` here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-90713388
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29807/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-93877056
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30445/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-93863177
  
    This is still creating a new type.  Can we use `NullType` instead?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-93032427
  
      [Test build #30265 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30265/consoleFull) for   PR 5395 at commit [`8e333c7`](https://github.com/apache/spark/commit/8e333c7a05964e34cad8e9eb274aa746ab23e13a).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-96767679
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-93877052
  
      [Test build #30445 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30445/consoleFull) for   PR 5395 at commit [`ee56a0a`](https://github.com/apache/spark/commit/ee56a0a10ebb784debea3962419a61f85d9bd870).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-93032455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30265/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5395#discussion_r28346669
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ---
    @@ -213,8 +215,16 @@ private[hive] trait HiveInspectors {
     
         case c: Class[_] if c.isArray => ArrayType(javaClassToDataType(c.getComponentType))
     
    +    // list type
    +    case c: Class[_] if c == classOf[java.util.List[java.lang.Object]] =>
    +      logWarning("Failed to catch a correct component type in List<> because of type erasure," +
    +        " so you need to handle it correctly by yourself")
    --- End diff --
    
    I'm not sure the best one as this warning message.
    Is " so you need to cast it into the correct type by yourself" ok?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5395#discussion_r27901030
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ---
    @@ -666,3 +676,18 @@ private[hive] trait HiveInspectors {
         }
       }
     }
    +
    +/**
    + * :: DeveloperApi ::
    + * This represents an erased type because of type erasure in JVM.
    + */
    +@DeveloperApi
    +class ErasedType private() extends DataType {
    --- End diff --
    
    Does the type `List of Object` supported by Hive? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-90795154
  
      [Test build #29825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29825/consoleFull) for   PR 5395 at commit [`02b3a91`](https://github.com/apache/spark/commit/02b3a9105b9403517a2a842ce6591e845f135bce).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-91777942
  
    ISTM hive supports list<> as a return type (see the links below).
    Also, some thrid-party libraries use it.
    
    https://github.com/kyluka/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBridge.java#L163
    https://github.com/l1x/apache-hive/blob/master/hive-0.8.1/src/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java#L113


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-93870531
  
    Missed and fixed. This fix satisfies your point?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-93870726
  
      [Test build #30445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30445/consoleFull) for   PR 5395 at commit [`ee56a0a`](https://github.com/apache/spark/commit/ee56a0a10ebb784debea3962419a61f85d9bd870).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-90689379
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5395#discussion_r28198699
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ---
    @@ -213,8 +215,16 @@ private[hive] trait HiveInspectors {
     
         case c: Class[_] if c.isArray => ArrayType(javaClassToDataType(c.getComponentType))
     
    +    // list type
    +    case c: Class[_] if c == classOf[java.util.List[java.lang.Object]] =>
    +      logWarning("Failed to catch a correct component type in List<> because of type erasure," +
    +        " so you need to handle it correctly by yourself")
    --- End diff --
    
    Maybe specify how they would handle it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-92977037
  
      [Test build #30253 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30253/consoleFull) for   PR 5395 at commit [`3a8d952`](https://github.com/apache/spark/commit/3a8d952aacf7a9d31ff6db6d4c9a609ddc66654f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-93003697
  
      [Test build #30265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30265/consoleFull) for   PR 5395 at commit [`8e333c7`](https://github.com/apache/spark/commit/8e333c7a05964e34cad8e9eb274aa746ab23e13a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5395#discussion_r27900712
  
    --- Diff: sql/hive/src/test/java/org/apache/spark/sql/hive/execution/UDFToListString.java ---
    @@ -0,0 +1,29 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive.execution;
    +
    +import org.apache.hadoop.hive.ql.exec.UDF;
    +
    +import java.util.Arrays;
    +import java.util.List;
    +
    +public class UDFToListString extends UDF {
    +    public List<String> evaluate(Object o) {
    +        return Arrays.asList("data1", "data2", "data3");
    +    }
    +}
    --- End diff --
    
    Add a blank line at the end of file.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-90783406
  
      [Test build #29825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29825/consoleFull) for   PR 5395 at commit [`02b3a91`](https://github.com/apache/spark/commit/02b3a9105b9403517a2a842ce6591e845f135bce).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-92983310
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30253/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-90784532
  
    Ok, I will look into the implementation and the documentation of Hive for that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-90653250
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-90713372
  
      [Test build #29807 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29807/consoleFull) for   PR 5395 at commit [`bd165b9`](https://github.com/apache/spark/commit/bd165b9f72ef8c30508423688c38b8bacc734884).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-90795164
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29825/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5395#discussion_r28198691
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ---
    @@ -213,8 +215,16 @@ private[hive] trait HiveInspectors {
     
         case c: Class[_] if c.isArray => ArrayType(javaClassToDataType(c.getComponentType))
     
    +    // list type
    +    case c: Class[_] if c == classOf[java.util.List[java.lang.Object]] =>
    +      logWarning("Failed to catch a correct component type in List<> because of type erasure," +
    +        " so you need to handle it correctly by yourself")
    +      ArrayType(ErasedType)
    +
         // Hive seems to return this for struct types?
         case c: Class[_] if c == classOf[java.lang.Object] => NullType
    +
    +    case c => throw new HiveDataTypeException("Unknown java type: " + c)
    --- End diff --
    
    This should just be an `AnalysisException`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-90690245
  
      [Test build #29807 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29807/consoleFull) for   PR 5395 at commit [`bd165b9`](https://github.com/apache/spark/commit/bd165b9f72ef8c30508423688c38b8bacc734884).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5395#discussion_r28198701
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ---
    @@ -213,8 +215,16 @@ private[hive] trait HiveInspectors {
     
         case c: Class[_] if c.isArray => ArrayType(javaClassToDataType(c.getComponentType))
     
    +    // list type
    +    case c: Class[_] if c == classOf[java.util.List[java.lang.Object]] =>
    +      logWarning("Failed to catch a correct component type in List<> because of type erasure," +
    +        " so you need to handle it correctly by yourself")
    +      ArrayType(ErasedType)
    +
         // Hive seems to return this for struct types?
         case c: Class[_] if c == classOf[java.lang.Object] => NullType
    +
    +    case c => throw new HiveDataTypeException("Unknown java type: " + c)
    --- End diff --
    
    Also prefer string interpolation to `+`, `s"Unknown UDF input type $c"`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5395#discussion_r28346473
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ---
    @@ -213,8 +215,16 @@ private[hive] trait HiveInspectors {
     
         case c: Class[_] if c.isArray => ArrayType(javaClassToDataType(c.getComponentType))
     
    +    // list type
    +    case c: Class[_] if c == classOf[java.util.List[java.lang.Object]] =>
    +      logWarning("Failed to catch a correct component type in List<> because of type erasure," +
    +        " so you need to handle it correctly by yourself")
    +      ArrayType(ErasedType)
    +
         // Hive seems to return this for struct types?
         case c: Class[_] if c == classOf[java.lang.Object] => NullType
    +
    +    case c => throw new HiveDataTypeException("Unknown java type: " + c)
    --- End diff --
    
    s"Unsupported java type $c" seems to be better in this error message because this method is not only designed for UDF.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-91932878
  
    Thanks for researching this.  Can you address the final comments about avoiding the creation of a new type?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-92981451
  
    Sorry for the delay. Fixed and plz re-check them.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6747][SQL] Support List<> as a return t...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/5395#issuecomment-93871894
  
    Yes, LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org