Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2015/02/23 18:20:10 UTC

[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/4729

    [SPARK-5950][SQL] Enable inserting array into Hive table saved as Parquet using DataSource API

    Currently `ParquetConversions` in `HiveMetastoreCatalog` does not really work. One reason is that the table is not one of the child nodes of `InsertIntoTable`, so the relation is never replaced.
    
    When we create a Parquet table in Hive with an ARRAY field, `ArrayType` has `containsNull` set to true by default, and this affects the table's schema. But when data is inserted into the table later, the schema of the inserted data can have `containsNull` as either true or false. That mismatch makes the insert/read fail.
    
    A similar problem is reported in https://issues.apache.org/jira/browse/SPARK-5508.
    
    Hive seems to support only arrays that may contain null elements, so this PR enables the same behavior.
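
A minimal sketch of why a rule that only walks child nodes never reaches the destination table. The classes below are simplified, hypothetical stand-ins written for illustration, not Catalyst's real plan nodes; only the names `InsertIntoTable`, `MetastoreRelation` and `relationMap` come from the discussion, and `ParquetRelation`, `naive` and `explicit` are assumed names.

```scala
// Hypothetical, simplified stand-ins for plan nodes; not Spark's real classes.
object PlanRewriteSketch {
  sealed trait Plan {
    def children: Seq[Plan]
    // Rebuild this node with new children (the only parts a generic transform sees).
    def withChildren(newChildren: Seq[Plan]): Plan

    // Bottom-up rewrite: children first, then this node.
    def transformUp(rule: PartialFunction[Plan, Plan]): Plan = {
      val rewritten = withChildren(children.map(_.transformUp(rule)))
      rule.applyOrElse(rewritten, (p: Plan) => p)
    }
  }

  case class MetastoreRelation(name: String) extends Plan {
    val children: Seq[Plan] = Nil
    def withChildren(newChildren: Seq[Plan]): Plan = this
  }

  case class ParquetRelation(name: String) extends Plan {
    val children: Seq[Plan] = Nil
    def withChildren(newChildren: Seq[Plan]): Plan = this
  }

  // `table` is a plain field; only `child` (the query producing rows) is a child node.
  case class InsertIntoTable(table: Plan, child: Plan) extends Plan {
    val children: Seq[Plan] = Seq(child)
    def withChildren(newChildren: Seq[Plan]): Plan = copy(child = newChildren.head)
  }

  val relationMap: Map[Plan, Plan] =
    Map(MetastoreRelation("dst") -> ParquetRelation("dst"))

  val plan: Plan = InsertIntoTable(MetastoreRelation("dst"), MetastoreRelation("src"))

  // A rule that matches relation nodes only: the destination table is never visited.
  val naive: Plan = plan.transformUp {
    case r: MetastoreRelation if relationMap.contains(r) => relationMap(r)
  }

  // Adding an explicit case for the insert node replaces the destination table.
  val explicit: Plan = plan.transformUp {
    case InsertIntoTable(r: MetastoreRelation, c) if relationMap.contains(r) =>
      InsertIntoTable(relationMap(r), c)
    case r: MetastoreRelation if relationMap.contains(r) => relationMap(r)
  }
}
```

In this model `naive` is identical to the input plan, while `explicit` carries `ParquetRelation("dst")` as its table.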
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 hive_parquet

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4729.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4729
    
----
commit 4e3bd5568e644bc81e2539a917329486ea968a92
Author: Liang-Chi Hsieh <vi...@gmail.com>
Date:   2015-02-23T17:03:30Z

    Enable inserting array into hive table saved as parquet using datasource.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4729#discussion_r25362155
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -458,6 +458,9 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
     
               withAlias
             }
    +        case InsertIntoHiveTable(r: MetastoreRelation, p, c, o) if relationMap.contains(r) =>
    +          val parquetRelation = relationMap(r)
    +          InsertIntoHiveTable(parquetRelation, p, c, o) 
    --- End diff --
    
    Same here.




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-76302646
  
    @yhuai I see. That issue was first fixed by this PR; you can see the earlier commits. Even when the destination table is replaced, the issue with array (or map) is still there.




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-76296385
  
    @viirya I think the issue here is that data written by the Hive Parquet SerDe may not be readable by our own data source Parquet. I have changed the title of the JIRA. It would be great if you could change your PR title.




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4729#discussion_r25480454
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
    @@ -299,6 +301,37 @@ class ParquetDataSourceOnSourceSuite extends ParquetSourceSuiteBase {
         super.afterAll()
         setConf(SQLConf.PARQUET_USE_DATA_SOURCE_API, originalConf.toString)
       }
    +
    +  test("insert array into parquet hive table using data source api") {
    --- End diff --
    
    I just tried this test with our master, and it did not fail. I think you need to first turn off the conversion for the write path and then turn on the conversion for the read path. You can use `spark.sql.parquet.useDataSourceApi` to control it.
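
The write-then-read toggling suggested above could be structured roughly as follows. This is only a sketch: `conf` is a hypothetical in-memory stand-in for the session configuration, `withConf` and `steps` are assumed helper names, and the real test would run actual insert and select statements where the sketch just records the active setting.

```scala
import scala.collection.mutable

object ConfToggleSketch {
  val key = "spark.sql.parquet.useDataSourceApi"

  // Hypothetical stand-in for the session configuration; not Spark's SQLConf.
  val conf: mutable.Map[String, String] = mutable.Map(key -> "true")

  // Records which setting was active at each step.
  val steps: mutable.ArrayBuffer[String] = mutable.ArrayBuffer.empty[String]

  // Runs `body` with `k` set to `v`, then restores the previous value.
  def withConf[T](k: String, v: String)(body: => T): T = {
    val original = conf.get(k)
    conf(k) = v
    try body
    finally {
      original match {
        case Some(old) => conf(k) = old
        case None      => conf -= k
      }
    }
  }

  // Write with the flag off, then read back with the flag on.
  withConf(key, "false") { steps += s"write with ${key}=${conf(key)}" }
  withConf(key, "true") { steps += s"read with ${key}=${conf(key)}" }
}
```

Restoring the original value in `finally` keeps one test from leaking its setting into the next, which mirrors the `afterAll` reset visible in the diff context.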




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-76512714
  
    @viirya Thank you for working on it! Our discussions helped me clearly understand the problem. After discussions with @liancheng, I am proposing a different approach to address this issue in https://github.com/apache/spark/pull/4826. Please feel free to leave comments there.




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4729#discussion_r25395555
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -424,7 +424,7 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
           // Collects all `MetastoreRelation`s which should be replaced
           val toBeReplaced = plan.collect {
             // Write path
    -        case InsertIntoTable(relation: MetastoreRelation, _, _, _)
    +        case InsertIntoHiveTable(relation: MetastoreRelation, _, _, _)
    --- End diff --
    
    `InsertIntoHiveTable` is a `LogicalPlan` defined in `HiveMetastoreCatalog`.




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-76301583
  
    When the JIRA was created, we did not correctly replace the destination table of an insert-into with our data source table. We were actually calling InsertIntoHive to do the work. f02394d06473889d0d7897c4583239e6e136ff46 fixed this problem. Now, you need to turn off our metastore conversion to see the problem.




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-76303227
  
    Can you try your unit test (without any other change) with master? Thanks!




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya closed the pull request at:

    https://github.com/apache/spark/pull/4729




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-75758412
  
    cc @marmbrus.




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-76295694
  
    @liancheng Unlike the `ParquetConversions` issue, I think the array insertion issue may not be Hive-specific. The problem is that when we create a Parquet table that includes an array (or map, struct), by default we use a schema that sets `containsNull` to true. But when we later insert data, the data schema could have `containsNull` as true or false. Hive seems to solve this problem by only supporting fields that may contain null elements. So whether or not the inserted data contains nulls, we set its schema to have `containsNull` as true before writing to the Parquet file. Since I think we don't want to explicitly change the data schema and affect other parts, doing it in `RowWriteSupport` should be OK, unless you have other concerns.
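
The normalization described above (force every nested type to allow nulls before writing) can be sketched as a recursive function. The types below are simplified stand-ins that only mirror the shape of Spark SQL's `ArrayType`, `MapType` and `StructType`; `asNullable` is an assumed helper name used for illustration and is not the change made in this PR's `RowWriteSupport`.

```scala
object NullableSchemaSketch {
  // Simplified stand-ins mirroring the shape of Spark SQL's data types.
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType
  case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
  case class MapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean)
    extends DataType
  case class StructField(name: String, dataType: DataType, nullable: Boolean)
  case class StructType(fields: Seq[StructField]) extends DataType

  // Recursively mark arrays, map values and struct fields as nullable.
  def asNullable(dt: DataType): DataType = dt match {
    case ArrayType(elem, _) =>
      ArrayType(asNullable(elem), containsNull = true)
    case MapType(k, v, _) =>
      MapType(asNullable(k), asNullable(v), valueContainsNull = true)
    case StructType(fields) =>
      StructType(fields.map(f => f.copy(dataType = asNullable(f.dataType), nullable = true)))
    case other => other
  }

  // Schema of data whose arrays happen to contain no nulls.
  val dataSchema: DataType = StructType(Seq(
    StructField("a", ArrayType(IntegerType, containsNull = false), nullable = false),
    StructField("m", MapType(StringType, IntegerType, valueContainsNull = false), nullable = true)))

  val normalized: DataType = asNullable(dataSchema)
}
```

Applying the function to both the table schema and the incoming data schema would make the two agree regardless of how each was originally declared.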




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-76315917
  
    I can't tell whether SPARK-5508 is using InsertIntoHive or not. That JIRA doesn't say whether `spark.sql.parquet.useDataSourceApi` is turned on or off.
    
    If you simply replace `InsertIntoTable`'s relation in `ParquetConversions`, you will get `org.apache.spark.sql.AnalysisException`. So I don't know why you said the test passes.
    
    For SPARK-5950, there are a few issues:
    
    1 The `ParquetConversions` problem, which you addressed in #4782: `InsertIntoTable`'s table is never replaced.
    2 `AnalysisException`. That is why I use `InsertIntoHiveTable` to replace `InsertIntoTable` in `ParquetConversions`, because `InsertIntoHiveTable` doesn't check the equality of `containsNull`.
    3 Since `containsNull` of `ArrayType`, `MapType`, `StructType` is set to true by default, the schema of the created Parquet table always has `containsNull` as true. Later, when you try to insert data that has the same schema except for a different `containsNull` value, the Parquet library will complain that the schema is different, so reading will fail.
    
    This PR solves all three problems (I will update it for `MapType` and `StructType`). #4782 only addresses the first one.
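
To illustrate the second point, here is a small sketch of how a strict schema equality check rejects two schemas that differ only in `containsNull`, while a comparison that ignores nullability accepts them. The types are simplified stand-ins, and `sameIgnoringNullability` is an assumed helper name, not the check Spark's analyzer actually performs.

```scala
object SchemaCompareSketch {
  // Simplified stand-ins mirroring the shape of Spark SQL's data types.
  sealed trait DataType
  case object IntegerType extends DataType
  case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
  case class StructField(name: String, dataType: DataType, nullable: Boolean)
  case class StructType(fields: Seq[StructField]) extends DataType

  // Structural comparison that skips every nullability flag.
  def sameIgnoringNullability(a: DataType, b: DataType): Boolean = (a, b) match {
    case (ArrayType(ea, _), ArrayType(eb, _)) =>
      sameIgnoringNullability(ea, eb)
    case (StructType(fa), StructType(fb)) =>
      fa.length == fb.length && fa.zip(fb).forall { case (x, y) =>
        x.name == y.name && sameIgnoringNullability(x.dataType, y.dataType)
      }
    case _ => a == b
  }

  // Table schema as created by default, and data whose array has no nulls.
  val tableSchema: DataType = StructType(Seq(
    StructField("a", ArrayType(IntegerType, containsNull = true), nullable = true)))
  val dataSchema: DataType = StructType(Seq(
    StructField("a", ArrayType(IntegerType, containsNull = false), nullable = true)))

  val strictMatch: Boolean = tableSchema == dataSchema
  val lenientMatch: Boolean = sameIgnoringNullability(tableSchema, dataSchema)
}
```

In this model `strictMatch` is false and `lenientMatch` is true, which is the gap between the check that raises the exception and the behavior the PR wants.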





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4729#discussion_r25400611
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -424,7 +424,7 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
           // Collects all `MetastoreRelation`s which should be replaced
           val toBeReplaced = plan.collect {
             // Write path
    -        case InsertIntoTable(relation: MetastoreRelation, _, _, _)
    +        case InsertIntoHiveTable(relation: MetastoreRelation, _, _, _)
    --- End diff --
    
    Oh sorry, I mistook this for the physical plan with the same name...




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-76310964
  
    @yhuai Yes, I know that. I know there are two bugs. I reported them in this PR and fixed them in the commits. You should read the description of this PR and my commits first.
    
    You solved only part of the first issue. As I said, the unit test I added still fails on master now. That is because your commit is just part of my commits in this PR. Because of that, I don't know why you want to open another PR instead of just using my commits.
    
    As I have said, the second issue is not caused by "Hive's parquet serde may not be able to read by data source parquet table", because I create the Parquet table using the data source API, not the Hive Parquet SerDe.




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4729#discussion_r25362135
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -424,7 +424,7 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
           // Collects all `MetastoreRelation`s which should be replaced
           val toBeReplaced = plan.collect {
             // Write path
    -        case InsertIntoTable(relation: MetastoreRelation, _, _, _)
    +        case InsertIntoHiveTable(relation: MetastoreRelation, _, _, _)
    --- End diff --
    
    I don't think this is right here. `ParquetConversions` is an analysis rule, which only processes logical plans. However, `InsertIntoHiveTable` is a physical plan node.




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-76301308
  
    @yhuai That problem is not caused by hive parquet serde. You can see the unit test I added. The table is created using data source api.




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-76256448
  
    Hey @viirya, this PR actually fixes two issues, the `ParquetConversions` one and the array insertion one. However, both fixes need some tweaks. As the 1.3 release is really close, @yhuai opened #4782 based on your work to fix the first issue. As for the array insertion issue, I feel hesitant to add the fix in `RowWriteSupport`, since this should be a Hive-specific issue. Also, map and struct should suffer from the same issue, right?




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-76309465
  
    Maybe I did not explain it clearly. SPARK-6023 and SPARK-5950 are two bugs. The first one is that we failed to replace the destination MetastoreRelation in InsertIntoTable even when we ask Spark SQL to convert all MetastoreRelations associated with Parquet tables to our data source Parquet tables. The root cause for this one was clear and the fix is pretty simple. The second bug is that arrays (maybe maps and structs?) written by Hive's Parquet SerDe may not be readable by a data source Parquet table. SPARK-5950 is for this bug. Since this PR is not ready (I will leave comments later), I made #4782 and we checked it in first to fix SPARK-6023.




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-75852468
  
    /cc @liancheng




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-76310259
  
    It seems Hive's Parquet SerDe always treats values as nullable. Can you double check it? Also, we need to check if `StructType` and `MapType` are affected by this bug.




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-75686837
  
      [Test build #27870 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27870/consoleFull) for   PR 4729 at commit [`0e07bb8`](https://github.com/apache/spark/commit/0e07bb879d4d804b3c3f7823f8f7d19fdd71d83f).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class Params(`





[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-76306767
  
    @liancheng @yhuai Actually I don't know why you opened #4782 to fix the first issue, because as far as I can see, the commits of #4782 are just part of my commits.




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-76420679
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28070/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4729#discussion_r25480833
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
    @@ -299,6 +301,37 @@ class ParquetDataSourceOnSourceSuite extends ParquetSourceSuiteBase {
         super.afterAll()
         setConf(SQLConf.PARQUET_USE_DATA_SOURCE_API, originalConf.toString)
       }
    +
    +  test("insert array into parquet hive table using data source api") {
    --- End diff --
    
    `spark.sql.parquet.useDataSourceApi` is already turned on in the unit test I added. It failed on the master I just pulled.




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-75676366
  
      [Test build #27870 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27870/consoleFull) for   PR 4729 at commit [`0e07bb8`](https://github.com/apache/spark/commit/0e07bb879d4d804b3c3f7823f8f7d19fdd71d83f).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-76304040
  
    In fact, even #4782 doesn't solve the issue I reported in this PR. The unit test fails before hitting the data insertion issue...




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-75709019
  
      [Test build #27883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27883/consoleFull) for   PR 4729 at commit [`175966f`](https://github.com/apache/spark/commit/175966f4e275beaf21363db196102dcb1a4b1d3e).
     * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-75597588
  
      [Test build #27853 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27853/consoleFull) for   PR 4729 at commit [`4e3bd55`](https://github.com/apache/spark/commit/4e3bd5568e644bc81e2539a917329486ea968a92).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class Params(`




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-76420655
  
      [Test build #28070 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28070/consoleFull) for   PR 4729 at commit [`2949324`](https://github.com/apache/spark/commit/2949324222c0e37d6291f9a6b95a383676408ce9).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `                logError("User class threw exception: " + cause.getMessage, cause)`




[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-75588235
  
      [Test build #27853 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27853/consoleFull) for   PR 4729 at commit [`4e3bd55`](https://github.com/apache/spark/commit/4e3bd5568e644bc81e2539a917329486ea968a92).
     * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-75718250
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27883/
    Test PASSed.



[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4729#discussion_r25480542
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
    @@ -254,4 +254,13 @@ private[hive] trait HiveStrategies {
           case _ => Nil
         }
       }
    +
    +  object HiveDataSourceStrategy extends Strategy {
    --- End diff --
    
    Seems we do not need it. When we want to insert into a data source table, `logical.InsertIntoTable` will be used instead of `logical.InsertIntoHiveTable`.



[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-76313868
  
    OK. Now I understand what's going on. For SPARK-5950, we cannot do the insert because `InsertIntoTable` will not be resolved, and you saw an `org.apache.spark.sql.AnalysisException`, right? For SPARK-5508, the problem is that the data is inserted through InsertIntoHive and we cannot read it from our data source API write path. Are you trying to resolve both in this PR?



[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-76519493
  
    I don't see any difference. `DataType.alwaysNullable` just does the same thing as in this PR...
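
For readers following the nullability discussion, here is a minimal sketch of the idea being compared: relaxing every nullability flag in a nested type so that an array column declared with `containsNull = true` and incoming data with `containsNull = false` are treated as the same type. The types and function names below (`SimpleType`, `ArrayT`, `relaxNullability`, and so on) are hypothetical stand-ins written for this illustration. They are not Spark's classes, and this is not the code from this PR or from #4782.

```scala
// Hypothetical, simplified model of nested SQL types (NOT Spark's DataType classes).
sealed trait SimpleType
case object IntT extends SimpleType
case object StringT extends SimpleType
final case class ArrayT(elementType: SimpleType, containsNull: Boolean) extends SimpleType
final case class FieldT(name: String, dataType: SimpleType, nullable: Boolean)
final case class StructT(fields: List[FieldT]) extends SimpleType

object NullabilitySketch {
  // Recursively set every nullability flag to true, leaving the shape of the type unchanged.
  def relaxNullability(dt: SimpleType): SimpleType = dt match {
    case ArrayT(elem, _) =>
      ArrayT(relaxNullability(elem), containsNull = true)
    case StructT(fields) =>
      StructT(fields.map(f => f.copy(dataType = relaxNullability(f.dataType), nullable = true)))
    case other =>
      other // atomic types carry no nullability flag in this model
  }

  // Two types are considered compatible here if they are equal once nullability is ignored.
  def sameIgnoringNullability(a: SimpleType, b: SimpleType): Boolean =
    relaxNullability(a) == relaxNullability(b)
}
```

In this model, a table column of type `ArrayT(IntT, containsNull = true)` and inserted data of type `ArrayT(IntT, containsNull = false)` compare as unequal directly, but as equal after relaxation, which is the behavior the thread describes as matching Hive.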



[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-75718241
  
      [Test build #27883 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27883/consoleFull) for   PR 4729 at commit [`175966f`](https://github.com/apache/spark/commit/175966f4e275beaf21363db196102dcb1a4b1d3e).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-76403689
  
      [Test build #28070 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28070/consoleFull) for   PR 4729 at commit [`2949324`](https://github.com/apache/spark/commit/2949324222c0e37d6291f9a6b95a383676408ce9).
     * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-75597599
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27853/
    Test FAILed.



[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4729#issuecomment-75686842
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27870/
    Test PASSed.

