Posted to reviews@spark.apache.org by anselmevignon <gi...@git.apache.org> on 2015/02/19 19:46:55 UTC

[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

GitHub user anselmevignon opened a pull request:

    https://github.com/apache/spark/pull/4697

    [SPARK-5775] BugFix: GenericRow cannot be cast to SpecificMutableRow when nested data and partitioned table

    The bug fixed here was caused by a change in PartitionTableScan that affects reading a partitioned table.
    
    - When the partition column is requested from a Parquet table, the table scan needs to add the column back to the output Rows.
    - To update the Row object created by PartitionTableScan, the Row was first cast to SpecificMutableRow and then updated.
    - This cast was unsafe, since there is no guarantee that the newHadoopRDD used internally will instantiate the output Rows as MutableRow.
    
    In particular, when reading a table with complex types (e.g. struct or array), the newHadoopRDD uses a parquet.io.api.RecordMaterializer produced by org.apache.spark.sql.parquet.RowReadSupport. When complex types are involved, the org.apache.spark.sql.parquet.CatalystConverter.createRootConverter factory creates this consumer as an org.apache.spark.sql.parquet.CatalystGroupConverter (a) rather than an org.apache.spark.sql.parquet.CatalystPrimitiveRowConverter (b).
    
    Consumer (a) outputs GenericRow, while consumer (b) produces SpecificMutableRow.
    
    Therefore any query that selects a partition column plus a complex-type column gets GenericRows back, and fails on the unsafe cast (see https://issues.apache.org/jira/browse/SPARK-5775 for an example).
    
    The fix proposed here replaces the unsafe class cast with a pattern match on the Row type: it updates the Row in place if it is of a mutable type, and recreates the Row if it is not.
    
    This fix is unit-tested in  sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala
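
The approach described above can be sketched in isolation. The snippet below is a minimal illustration, not the code from this pull request: `Row`, `ImmutableRow`, `MutableRow`, and `withPartitionValue` are hypothetical stand-ins defined here, with `ImmutableRow` playing the role of GenericRow and `MutableRow` the role of SpecificMutableRow.

```scala
// Stand-in types for illustration only; these are NOT Spark's Row classes.
sealed trait Row { def values: Seq[Any] }

// Immutable row, playing the role of GenericRow.
final case class ImmutableRow(values: Vector[Any]) extends Row

// Mutable row, playing the role of SpecificMutableRow.
final class MutableRow(private val cells: Array[Any]) extends Row {
  def values: Seq[Any] = cells.toVector
  def update(i: Int, v: Any): Unit = cells(i) = v
}

object PartitionRowSketch {
  // Write the partition value at `ordinal`: mutate in place when the row
  // supports it, otherwise build a new row. No blind cast is involved.
  def withPartitionValue(row: Row, ordinal: Int, value: Any): Row = row match {
    case m: MutableRow =>
      m.update(ordinal, value)
      m
    case ImmutableRow(vs) =>
      ImmutableRow(vs.updated(ordinal, value))
  }
}
```

Matching on the runtime type means a row from either converter is handled, at the cost of one copy for rows that cannot be updated in place.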

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/flaminem/spark local_dev

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4697.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4697
    
----
commit 4eb04e971244d2c49085e17ae4685a31e6808066
Author: Anselme Vignon <an...@flaminem.com>
Date:   2015-02-18T09:51:52Z

    bugfix SPARK-5775

commit dbceaa308921f298b3cd9cc98fae66e1271c7f1c
Author: Anselme Vignon <an...@flaminem.com>
Date:   2015-02-18T11:17:38Z

    cutting lines

commit f876dea96d50f9df0c4d9992e82b00d3a4a7968f
Author: Anselme Vignon <an...@flaminem.com>
Date:   2015-02-18T11:17:55Z

    starting to write tests

commit ae48f7c98410d320b128ed23fb5c6cdbcb8b504c
Author: Anselme Vignon <an...@flaminem.com>
Date:   2015-02-19T18:08:48Z

    unittesting SPARK-5775

commit 22cec5206091580e9922f997ef8052ded393d225
Author: Anselme Vignon <an...@flaminem.com>
Date:   2015-02-19T18:18:02Z

    lint compatible changes

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-84936098
  
      [Test build #28985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28985/consoleFull) for   PR 4697 at commit [`6a4c53d`](https://github.com/apache/spark/commit/6a4c53d9491d182cc90c3160c7418b58f3b3062a).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-75151319
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27734/




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4697#discussion_r25457449
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
    @@ -263,6 +356,22 @@ abstract class ParquetTest extends QueryTest with BeforeAndAfterAll {
         }
       }
     
    +  Seq("partitioned_parquet_with_key_and_complextypes", "partitioned_parquet_with_complextypes").foreach { table =>
    +    test(s"SPARK-5775 read struct from $table") {
    +      checkAnswer(
    +        sql(s"SELECT p,  structField.intStructField , structField.stringStructField FROM $table WHERE p = 1"),
    +        (1 to 10).map { i => ((1, i, f"${i}_string"))}
    +      )
    +    }
    +
    +    test (s"SPARK-5775 read array from $table") {
    +              checkAnswer(
    +                sql(s"SELECT arrayField, p FROM $table WHERE p = 1"),
    +                (1 to 10).map { i => ((1 to i,1))}
    --- End diff --
    
    Please remove the redundant parentheses, and add a space before `}`.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-84936129
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28985/




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-84969583
  
      [Test build #28994 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28994/consoleFull) for   PR 4697 at commit [`6a4c53d`](https://github.com/apache/spark/commit/6a4c53d9491d182cc90c3160c7418b58f3b3062a).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by anselmevignon <gi...@git.apache.org>.
Github user anselmevignon commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-75259832
  
    Hi Ayioub,
    
    When did you pull? Unfortunately I pushed a broken version in my first PR... basically there was an infinite loop in the "fix".
    
    I updated the PR with the correct fix, but only 4 hours ago (commit 8fc6a8ccdf0f232ecfdf1916111b538e1fb6bfab). Would you mind checking which version you are using?
    
    I actually had a similar, but not identical, issue, so I unit-tested against my own problem. Please tell me if the final fix does not solve yours. There are indeed no tests on arrays of structs, at least not in the Hive unit test suite (only in Catalyst).




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4697#discussion_r25457393
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
    @@ -263,6 +356,22 @@ abstract class ParquetTest extends QueryTest with BeforeAndAfterAll {
         }
       }
     
    +  Seq("partitioned_parquet_with_key_and_complextypes", "partitioned_parquet_with_complextypes").foreach { table =>
    --- End diff --
    
    For indentation and whitespace related styling, please refer to equivalent changes in #4792.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4697#discussion_r25457259
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
    @@ -70,6 +72,38 @@ class ParquetMetastoreSuite extends ParquetTest {
         """)
     
         sql(s"""
    +      create external table partitioned_parquet_with_complextypes
    +      (
    +        intField INT,
    +        stringField STRING,
    +        structField STRUCT<intStructField :INT, stringStructField :STRING>,
    --- End diff --
    
    Please move the space after `:`.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-75128074
  
    /cc @liancheng




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by anirudhcelebal <gi...@git.apache.org>.
Github user anirudhcelebal commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-192877517
  
    root 
     |-- adultbasefare: long (nullable = true) 
     |-- adultcommission: long (nullable = true) 
     |-- adultservicetax: long (nullable = true) 
     |-- adultsurcharge: long (nullable = true) 
     |-- airline: string (nullable = true) 
     |-- arrdate: string (nullable = true) 
     |-- arrtime: string (nullable = true) 
     |-- cafecommission: long (nullable = true) 
     |-- carrierid: string (nullable = true) 
     |-- class: string (nullable = true) 
     |-- depdate: string (nullable = true) 
     |-- deptime: string (nullable = true) 
     |-- destination: string (nullable = true) 
     |-- discount: long (nullable = true) 
     |-- duration: string (nullable = true) 
     |-- fare: struct (nullable = true) 
     |    |-- A: long (nullable = true) 
     |    |-- C: long (nullable = true) 
     |    |-- I: long (nullable = true) 
     |    |-- adultairlinetxncharge: long (nullable = true) 
     |    |-- adultairporttax: long (nullable = true) 
     |    |-- adultbasefare: long (nullable = true) 
     |    |-- adultcommission: double (nullable = true) 
     |    |-- adultsurcharge: long (nullable = true) 
     |    |-- adulttotalfare: long (nullable = true) 
     |    |-- childairlinetxncharge: long (nullable = true) 
     |    |-- childairporttax: long (nullable = true) 
     |    |-- childbasefare: long (nullable = true) 
     |    |-- childcommission: double (nullable = true) 
     |    |-- childsurcharge: long (nullable = true) 
     |    |-- childtotalfare: long (nullable = true) 
     |    |-- discount: long (nullable = true) 
     |    |-- infantairlinetxncharge: long (nullable = true) 
     |    |-- infantairporttax: long (nullable = true) 
     |    |-- infantbasefare: long (nullable = true) 
     |    |-- infantcommission: long (nullable = true) 
     |    |-- infantsurcharge: long (nullable = true) 
     |    |-- infanttotalfare: long (nullable = true) 
     |    |-- servicetax: long (nullable = true) 
     |    |-- totalbasefare: long (nullable = true) 
     |    |-- totalcommission: double (nullable = true) 
     |    |-- totalfare: long (nullable = true) 
     |    |-- totalsurcharge: long (nullable = true) 
     |    |-- transactionfee: long (nullable = true) 
     |-- farebasis: string (nullable = true) 
     |-- farerule: string (nullable = true) 
     |-- flightcode: string (nullable = true) 
     |-- flightno: string (nullable = true) 
     |-- k: string (nullable = true) 
     |-- onwardflights: array (nullable = true) 
     |    |-- element: string (containsNull = true) 
     |-- origin: string (nullable = true) 
     |-- promocode: string (nullable = true) 
     |-- promodiscount: long (nullable = true) 
     |-- promotionText: string (nullable = true) 
     |-- stops: string (nullable = true) 
     |-- tickettype: string (nullable = true) 
     |-- totalbasefare: long (nullable = true) 
     |-- totalcommission: long (nullable = true) 
     |-- totalfare: long (nullable = true) 
     |-- totalpriceamount: long (nullable = true) 
     |-- totalsurcharge: long (nullable = true) 
     |-- transactionfee: long (nullable = true) 
     |-- viacharges: long (nullable = true) 
     |-- warnings: string (nullable = true) 
    
    
    
    Now I want to flatten it, so that the fare field is removed and everything is flattened.
    
    For this I used explode, but I am getting an error:
    
    org.apache.spark.sql.AnalysisException: cannot resolve 'explode(fare)' due to data type mismatch: input to function explode should be array or map type, not StructType(StructField(A,LongType,true), StructField(C,LongType,true), StructField(I,LongType,true), StructField(adultairlinetxncharge,LongType,true), StructField(adultairporttax,LongType,true), StructField(adultbasefare,LongType,true), StructField(adultcommission,DoubleType,true), StructField(adultsurcharge,LongType,true), StructField(adulttotalfare,LongType,true), StructField(childairlinetxncharge,LongType,true), StructField(childairporttax,LongType,true), StructField(childbasefare,LongType,true), StructField(childcommission,DoubleType,true), StructField(childsurcharge,LongType,true), StructField(childtotalfare,LongType,true), StructField(discount,LongType,true), StructField(infantairlinetxncharge,LongType,true), StructField(infantairporttax,LongType,true), StructField(infantbasefare,LongType,true), StructField(infantcommission,LongType,true), StructField(infantsurcharge,LongType,true), StructField(infanttotalfare,LongType,true), StructField(servicetax,LongType,true), StructField(totalbasefare,LongType,true), StructField(totalcommission,DoubleType,true), StructField(totalfare,LongType,true), StructField(totalsurcharge,LongType,true), StructField(transactionfee,LongType,true));
    
    If not explode, how can I flatten it? Your help will be appreciated. Thanks
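
As the error message says, explode accepts only array or map input, so it does not apply to a struct such as `fare`. Struct fields can instead be addressed by dotted path (the tests in this pull request select `structField.intStructField` that way), which suggests selecting each leaf explicitly. The sketch below is one possible way to derive those selections, under assumptions: it uses a simplified, hypothetical schema model (`FieldType`, `Leaf`, `Struct`, `FlattenSketch`) rather than Spark's own types, and it prefixes nested names because several names in the schema above (for example `discount` and `adultbasefare`) exist both at the top level and inside `fare`.

```scala
// Hypothetical, simplified schema model; Spark's own StructType is not used here.
sealed trait FieldType
case object Leaf extends FieldType
final case class Struct(fields: Seq[(String, FieldType)]) extends FieldType

object FlattenSketch {
  // For every leaf field, return (dotted path, flat alias),
  // e.g. ("fare.discount", "fare_discount"). Top-level leaves keep their name.
  def flatten(
      fields: Seq[(String, FieldType)],
      prefix: Seq[String] = Nil): Seq[(String, String)] =
    fields.flatMap {
      case (name, Leaf) =>
        val path = prefix :+ name
        Seq((path.mkString("."), path.mkString("_")))
      case (name, Struct(children)) =>
        // Recurse into the struct, carrying its name as a prefix.
        flatten(children, prefix :+ name)
    }
}
```

With Spark, each pair would presumably become something like `col(path).as(alias)` inside a `select`; that step is not shown here and has not been verified against any particular Spark version.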





[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-84451800
  
    Looks like there are only two small style comments before this can be merged.  Thanks for working on it!  Would you mind also updating the description?  I believe it still describes the original solution and not the newest version that has been backported.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by anselmevignon <gi...@git.apache.org>.
Github user anselmevignon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4697#discussion_r25587002
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
    @@ -263,6 +356,22 @@ abstract class ParquetTest extends QueryTest with BeforeAndAfterAll {
         }
       }
     
    +  Seq("partitioned_parquet_with_key_and_complextypes", "partitioned_parquet_with_complextypes").foreach { table =>
    +    test(s"SPARK-5775 read struct from $table") {
    +      checkAnswer(
    +        sql(s"SELECT p,  structField.intStructField , structField.stringStructField FROM $table WHERE p = 1"),
    +        (1 to 10).map { i => ((1, i, f"${i}_string"))}
    +      )
    +    }
    +
    +    test (s"SPARK-5775 read array from $table") {
    +              checkAnswer(
    +                sql(s"SELECT arrayField, p FROM $table WHERE p = 1"),
    +                (1 to 10).map { i => ((1 to i,1))}
    --- End diff --
    
    Thanks for the review!
    
    I will be checking further commits against https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide . Would you kindly indicate whether there is a tool to automatically check for the type of issues you raised?
    
    A.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-75261678
  
      [Test build #27777 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27777/consoleFull) for   PR 4697 at commit [`8fc6a8c`](https://github.com/apache/spark/commit/8fc6a8ccdf0f232ecfdf1916111b538e1fb6bfab).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4697#discussion_r25457280
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
    @@ -70,6 +72,38 @@ class ParquetMetastoreSuite extends ParquetTest {
         """)
     
         sql(s"""
    +      create external table partitioned_parquet_with_complextypes
    +      (
    +        intField INT,
    +        stringField STRING,
    +        structField STRUCT<intStructField :INT, stringStructField :STRING>,
    +        arrayField ARRAY<INT>
    +      )
    +      PARTITIONED BY (p int)
    +      ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    +       STORED AS
    +       INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
    +       OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
    +      location '${partitionedTableDirWithComplexTypes.getCanonicalPath}'
    +    """)
    +    
    +    sql(s"""
    +      create external table partitioned_parquet_with_key_and_complextypes
    +      (
    +        intField INT,
    +        stringField STRING,
    +        structField STRUCT<intStructField :INT, stringStructField :STRING>,
    --- End diff --
    
    Same as above.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-75151308
  
    **[Test build #27734 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27734/consoleFull)**     for PR 4697 at commit [`22cec52`](https://github.com/apache/spark/commit/22cec5206091580e9922f997ef8052ded393d225)     after a configured wait of `120m`.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-84940364
  
      [Test build #28994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28994/consoleFull) for   PR 4697 at commit [`6a4c53d`](https://github.com/apache/spark/commit/6a4c53d9491d182cc90c3160c7418b58f3b3062a).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-75242649
  
      [Test build #27777 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27777/consoleFull) for   PR 4697 at commit [`8fc6a8c`](https://github.com/apache/spark/commit/8fc6a8ccdf0f232ecfdf1916111b538e1fb6bfab).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-85147635
  
    Mind closing this now?  PRs to branches other than master do not auto close.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-85147251
  
    Thanks!  Merged to branch-1.2




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-75111236
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by anselmevignon <gi...@git.apache.org>.
Github user anselmevignon closed the pull request at:

    https://github.com/apache/spark/pull/4697




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4697#discussion_r25457193
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
    @@ -278,10 +278,18 @@ case class ParquetRelation2(path: String)(@transient val sqlContext: SQLContext)
               }.toMap
     
             val currentValue = partValues.values.head.toInt
    -        iter.map { pair =>
    -          val res = pair._2.asInstanceOf[SpecificMutableRow]
    -          res.setInt(partitionKeyLocation, currentValue)
    -          res
    +        iter.map { _._2 match {
    +          case row: SpecificMutableRow => {
    +            val res = row.asInstanceOf[SpecificMutableRow]
    +            res.setInt(partitionKeyLocation, currentValue)
    +            res
    +          }
    +          case row: Row => {
    +            val rowContent = row.to[Array]
    +            rowContent.update(partitionKeyLocation, currentValue)
    +            Row.fromSeq(rowContent)
    +          }
    +        }
    --- End diff --
    
    Same as above.
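
The diff above replaces an unconditional `asInstanceOf[SpecificMutableRow]` with a match on the runtime row type. A minimal, Spark-free sketch of that idea is shown below. `SimpleRow`, `MutableSimpleRow`, `ImmutableSimpleRow`, and `withPartitionValue` are hypothetical stand-ins introduced only for illustration; they are not Spark classes. Note that a typed pattern such as `case m: MutableSimpleRow` already binds `m` with the narrower type, so no further cast is needed inside the branch.

```scala
object RowPatchSketch {
  // Hypothetical stand-ins for a row hierarchy (assumption, not Spark API).
  sealed trait SimpleRow { def values: Seq[Any] }

  // Mutable variant: a field can be overwritten in place.
  final class MutableSimpleRow(private val buf: Array[Any]) extends SimpleRow {
    def values: Seq[Any] = buf.toSeq
    def update(i: Int, v: Any): Unit = { buf(i) = v }
  }

  // Immutable variant: changing a field requires building a new row.
  final case class ImmutableSimpleRow(values: Seq[Any]) extends SimpleRow

  // Write the partition value at `location`, mutating when the row allows it
  // and rebuilding the row otherwise. No unchecked cast is involved.
  def withPartitionValue(row: SimpleRow, location: Int, value: Int): SimpleRow =
    row match {
      case m: MutableSimpleRow =>
        m.update(location, value)
        m
      case ImmutableSimpleRow(vs) =>
        ImmutableSimpleRow(vs.updated(location, value))
    }
}
```

Because `SimpleRow` is sealed in this sketch, the compiler can warn if a new row variant is added without a matching case.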




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-76623990
  
    Since it has been fixed in master and branch-1.3, it would be great if we could apply the same changes as aa39460d4bb4c41084d350ccb1c5a56cd61239b7 to branch-1.2.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4697#discussion_r25457373
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
    @@ -171,9 +231,40 @@ abstract class ParquetTest extends QueryTest with BeforeAndAfterAll {
             .map(i => ParquetDataWithKey(p, i, s"part-$p"))
             .saveAsParquetFile(partDir.getCanonicalPath)
         }
    +
    +    partitionedTableDirWithKeyAndComplexTypes = File.createTempFile("parquettests", "sparksql")
    +    partitionedTableDirWithKeyAndComplexTypes.delete()
    +    partitionedTableDirWithKeyAndComplexTypes.mkdir()
    +
    +    (1 to 10).foreach { p =>
    +      val partDir = new File(partitionedTableDirWithKeyAndComplexTypes, s"p=$p")
    +      sparkContext.makeRDD(1 to 10)
    +        .map(i => ParquetDataWithKeyAndComplexTypes(p, i,s"part-$p", StructContainer(i,f"${i}_string"), (1 to i)))
    +        .saveAsParquetFile(partDir.getCanonicalPath)
    +    }
    +
    +    partitionedTableDirWithComplexTypes = File.createTempFile("parquettests", "sparksql")
    +    partitionedTableDirWithComplexTypes.delete()
    +    partitionedTableDirWithComplexTypes.mkdir()
    +
    +    (1 to 10).foreach { p =>
    +      val partDir = new File(partitionedTableDirWithComplexTypes, s"p=$p")
    +      sparkContext.makeRDD(1 to 10)
    +        .map(i => ParquetDataWithComplexTypes(i,s"part-$p", StructContainer(i,f"${i}_string"), (1 to i)))
    +        .saveAsParquetFile(partDir.getCanonicalPath)
    +    }
    +
    +  }
    +
    +  override def afterAll(): Unit = {
    +    //delete temporary files
    +    partitionedTableDir.delete()
    +    partitionedTableDirWithKey.delete()
    +    partitionedTableDirWithKeyAndComplexTypes.delete()
    +    partitionedTableDirWithComplexTypes.delete()
       }
     
    -  Seq("partitioned_parquet", "partitioned_parquet_with_key").foreach { table =>
    +  Seq("partitioned_parquet", "partitioned_parquet_with_key", "partitioned_parquet_with_key_and_complextypes","partitioned_parquet_with_complextypes").foreach { table =>
    --- End diff --
    
    100 columns exceeded.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-75261697
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27777/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-193380455
  
    @anirudhcelebal please ask questions on the spark-user list instead of PRs.
    
    `explode` only works with arrays. You probably want something like `SELECT fare.* ...`




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-76250548
  
    Hey @anselmevignon, because the 1.3 release is really close, I just made #4697 based on your work, but targeting master and branch-1.3. We can still polish this PR (mainly minor styling issues) and merge it into branch-1.2. Thanks for working on this!




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-75128452
  
      [Test build #27734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27734/consoleFull) for   PR 4697 at commit [`22cec52`](https://github.com/apache/spark/commit/22cec5206091580e9922f997ef8052ded393d225).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-76759222
  
      [Test build #28176 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28176/consoleFull) for   PR 4697 at commit [`52f73fc`](https://github.com/apache/spark/commit/52f73fca6e99baa7777c5402e6cdeefb68958464).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-75128095
  
    ok to test




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-76741937
  
      [Test build #28176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28176/consoleFull) for   PR 4697 at commit [`52f73fc`](https://github.com/apache/spark/commit/52f73fca6e99baa7777c5402e6cdeefb68958464).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4697#discussion_r25457332
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
    @@ -171,9 +231,40 @@ abstract class ParquetTest extends QueryTest with BeforeAndAfterAll {
             .map(i => ParquetDataWithKey(p, i, s"part-$p"))
             .saveAsParquetFile(partDir.getCanonicalPath)
         }
    +
    +    partitionedTableDirWithKeyAndComplexTypes = File.createTempFile("parquettests", "sparksql")
    +    partitionedTableDirWithKeyAndComplexTypes.delete()
    +    partitionedTableDirWithKeyAndComplexTypes.mkdir()
    +
    +    (1 to 10).foreach { p =>
    +      val partDir = new File(partitionedTableDirWithKeyAndComplexTypes, s"p=$p")
    +      sparkContext.makeRDD(1 to 10)
    +        .map(i => ParquetDataWithKeyAndComplexTypes(p, i,s"part-$p", StructContainer(i,f"${i}_string"), (1 to i)))
    --- End diff --
    
    100 columns exceeded.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by anselmevignon <gi...@git.apache.org>.
Github user anselmevignon commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-85158386
  
    I am closing it. Thanks a lot, everyone, for the merge, the review, and the patience with a newbie :)
    
    Anselme




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by anselmevignon <gi...@git.apache.org>.
Github user anselmevignon commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-76680120
  
    Hi @ayoub-benali,
    
    Sorry for the delay, I was OOO during the end of the week.
    
    I will correct the style and remove the dynamic type checking (following the changes @yhuai and @liancheng made on master and branch-1.3).
    
    That said, I am also giving you write access to the branch; if anything else comes up, feel free to handle it whichever way you prefer (comment and let me update, or make the change yourself).
    
    cheers,
    
    Anselme
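
One possible reading of "getting rid of the dynamic type checking" is to choose the row-update strategy once, from the schema, rather than inspecting each row's runtime class inside the per-row loop. The sketch below illustrates that reading only; it does not reproduce the actual change made on master. `Field`, `isPrimitive`, and `partitionPatcher` are hypothetical names, and rows are modelled as plain `Array[Any]` for simplicity.

```scala
object StrategySketch {
  // Hypothetical, simplified field descriptor (assumption, not Spark's StructField).
  final case class Field(name: String, isPrimitive: Boolean)

  // Pick the per-row update function once, based on the schema.
  def partitionPatcher(
      schema: Seq[Field],
      location: Int,
      value: Int): Array[Any] => Array[Any] =
    if (schema.forall(_.isPrimitive)) {
      // All-primitive schema: rows are assumed safe to mutate in place.
      row => { row(location) = value; row }
    } else {
      // Complex types present: copy the row instead of assuming mutability.
      row => {
        val copy = row.clone()
        copy(location) = value
        copy
      }
    }
}
```

The returned function can then be passed to something like `iter.map(...)`, so the branch on the schema is evaluated once per scan rather than once per row.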
    
    
      




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by anselmevignon <gi...@git.apache.org>.
Github user anselmevignon commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-75384490
  
    @ayoub-benali thanks for the review!
    
    About SPARK-5508: the stack trace looks very similar to one I saw when spark.sql.hive.convertMetastoreParquet was set incorrectly. Could that be related? Is it possible that the insert is writing Parquet in an "old format"? (I have never read this part of the code, so I have no idea about the specifics, sorry...)
    





[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4697#discussion_r25457406
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
    @@ -263,6 +356,22 @@ abstract class ParquetTest extends QueryTest with BeforeAndAfterAll {
         }
       }
     
    +  Seq("partitioned_parquet_with_key_and_complextypes", "partitioned_parquet_with_complextypes").foreach { table =>
    +    test(s"SPARK-5775 read struct from $table") {
    +      checkAnswer(
    +        sql(s"SELECT p,  structField.intStructField , structField.stringStructField FROM $table WHERE p = 1"),
    +        (1 to 10).map { i => ((1, i, f"${i}_string"))}
    +      )
    +    }
    +
    +    test (s"SPARK-5775 read array from $table") {
    +              checkAnswer(
    +                sql(s"SELECT arrayField, p FROM $table WHERE p = 1"),
    +                (1 to 10).map { i => ((1 to i,1))}
    +              )
    --- End diff --
    
    Indentation is off.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-84891332
  
      [Test build #28985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28985/consoleFull) for   PR 4697 at commit [`6a4c53d`](https://github.com/apache/spark/commit/6a4c53d9491d182cc90c3160c7418b58f3b3062a).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-84939871
  
    retest this please




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4697#discussion_r25457221
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
    @@ -31,7 +31,9 @@ import org.apache.spark.sql.hive.test.TestHive._
     case class ParquetData(intField: Int, stringField: String)
     // The data that also includes the partitioning key
     case class ParquetDataWithKey(p: Int, intField: Int, stringField: String)
    -
    +case class StructContainer(intStructField :Int, stringStructField: String )
    +case class ParquetDataWithComplexTypes(intField :Int, stringField: String ,structField: StructContainer, arrayField: Seq[Int])
    +case class ParquetDataWithKeyAndComplexTypes(p: Int,intField :Int, stringField: String , structField: StructContainer, arrayField: Seq[Int])
    --- End diff --
    
    100 columns exceeded.
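
For reference, one common way to keep such declarations under 100 columns is to put each constructor parameter on its own line. The sketch below rewraps the case classes from the diff above; the enclosing `ParquetTestData` object is a hypothetical wrapper added only so the snippet is self-contained, and the exact indentation is an assumption about the project's preferred style.

```scala
object ParquetTestData {
  case class StructContainer(intStructField: Int, stringStructField: String)

  // One parameter per line keeps every line well under 100 columns.
  case class ParquetDataWithComplexTypes(
      intField: Int,
      stringField: String,
      structField: StructContainer,
      arrayField: Seq[Int])

  // Same layout, with the partitioning key `p` as the first field.
  case class ParquetDataWithKeyAndComplexTypes(
      p: Int,
      intField: Int,
      stringField: String,
      structField: StructContainer,
      arrayField: Seq[Int])
}
```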




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by ayoub-benali <gi...@git.apache.org>.
Github user ayoub-benali commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-75258468
  
    I just tried to reproduce the example in SPARK-5775 with the Spark shell, and now it hangs forever at query time.
    Maybe that is because the tests don't reproduce the same case as in the issue: an array of structs.
    
    ```scala
    scala> hiveContext.sql("select data.field1 from test_table LATERAL VIEW explode(data_array) nestedStuff AS data").collect
    15/02/20 16:32:55 INFO ParseDriver: Parsing command: select data.field1 from test_table LATERAL VIEW explode(data_array) nestedStuff AS data
    15/02/20 16:32:55 INFO ParseDriver: Parse Completed
    15/02/20 16:32:55 INFO MemoryStore: ensureFreeSpace(260309) called with curMem=97368, maxMem=280248975
    15/02/20 16:32:55 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 254.2 KB, free 266.9 MB)
    15/02/20 16:32:55 INFO MemoryStore: ensureFreeSpace(28517) called with curMem=357677, maxMem=280248975
    15/02/20 16:32:55 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 27.8 KB, free 266.9 MB)
    15/02/20 16:32:55 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on ****:54658 (size: 27.8 KB, free: 267.2 MB)
    15/02/20 16:32:55 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
    15/02/20 16:32:55 INFO SparkContext: Created broadcast 2 from NewHadoopRDD at ParquetTableOperations.scala:119
    15/02/20 16:32:55 INFO FileInputFormat: Total input paths to process : 3
    15/02/20 16:32:55 INFO ParquetInputFormat: Total input paths to process : 3
    15/02/20 16:32:55 INFO ParquetFileReader: Initiating action with parallelism: 5
    15/02/20 16:32:55 INFO ParquetFileReader: reading summary file: hdfs://****:8020/path/test_table/date=2015-02-12/_metadata
    15/02/20 16:32:55 INFO ParquetFileReader: reading another 1 footers
    15/02/20 16:32:55 INFO ParquetFileReader: Initiating action with parallelism: 5
    15/02/20 16:32:55 INFO FilteringParquetRowInputFormat: Fetched [LocatedFileStatus{path=hdfs://****:8020/path/test_table/date=2015-02-12/part-r-1.parquet; isDirectory=false; length=463; replication=3; blocksize=134217728; modification_time=1424446345899; access_time=1424446344501; owner=rptn_deploy; group=supergroup; permission=rw-r--r--; isSymlink=false}, LocatedFileStatus{path=hdfs://****:8020/path/test_table/date=2015-02-12/part-r-2.parquet; isDirectory=false; length=731; replication=3; blocksize=134217728; modification_time=1424446346655; access_time=1424446345540; owner=rptn_deploy; group=supergroup; permission=rw-r--r--; isSymlink=false}, LocatedFileStatus{path=hdfs://****:8020/path/test_table/date=2015-02-12/part-r-3.parquet; isDirectory=false; length=727; replication=3; blocksize=134217728; modification_time=1424446346773; access_time=1424446345628; owner=rptn_deploy; group=supergroup; permission=rw-r--r--; isSymlink=false}] footers in 31 ms
    15/02/20 16:32:55 INFO deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
    15/02/20 16:32:55 INFO deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
    15/02/20 16:32:55 INFO FilteringParquetRowInputFormat: Using Task Side Metadata Split Strategy
    15/02/20 16:32:55 INFO SparkContext: Starting job: collect at SparkPlan.scala:84
    15/02/20 16:32:55 INFO DAGScheduler: Got job 2 (collect at SparkPlan.scala:84) with 3 output partitions (allowLocal=false)
    15/02/20 16:32:55 INFO DAGScheduler: Final stage: Stage 2(collect at SparkPlan.scala:84)
    15/02/20 16:32:55 INFO DAGScheduler: Parents of final stage: List()
    15/02/20 16:32:55 INFO DAGScheduler: Missing parents: List()
    15/02/20 16:32:55 INFO DAGScheduler: Submitting Stage 2 (MappedRDD[26] at map at SparkPlan.scala:84), which has no missing parents
    15/02/20 16:32:56 INFO MemoryStore: ensureFreeSpace(7616) called with curMem=386194, maxMem=280248975
    15/02/20 16:32:56 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 7.4 KB, free 266.9 MB)
    15/02/20 16:32:56 INFO MemoryStore: ensureFreeSpace(4225) called with curMem=393810, maxMem=280248975
    15/02/20 16:32:56 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 4.1 KB, free 266.9 MB)
    15/02/20 16:32:56 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on ****:54658 (size: 4.1 KB, free: 267.2 MB)
    15/02/20 16:32:56 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
    15/02/20 16:32:56 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:838
    15/02/20 16:32:56 INFO DAGScheduler: Submitting 3 missing tasks from Stage 2 (MappedRDD[26] at map at SparkPlan.scala:84)
    15/02/20 16:32:56 INFO TaskSchedulerImpl: Adding task set 2.0 with 3 tasks
    15/02/20 16:32:56 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 6, ****, NODE_LOCAL, 1639 bytes)
    15/02/20 16:32:56 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 7, ****, NODE_LOCAL, 1638 bytes)
    15/02/20 16:32:56 INFO TaskSetManager: Starting task 2.0 in stage 2.0 (TID 8, ****, NODE_LOCAL, 1639 bytes)
    15/02/20 16:32:56 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on ****:45208 (size: 4.1 KB, free: 133.6 MB)
    15/02/20 16:32:56 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on ****:52420 (size: 4.1 KB, free: 133.6 MB)
    15/02/20 16:32:56 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on ****:43309 (size: 4.1 KB, free: 133.6 MB)
    15/02/20 16:32:56 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on ****:43309 (size: 27.8 KB, free: 133.6 MB)
    15/02/20 16:32:56 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on ****:52420 (size: 27.8 KB, free: 133.6 MB)
    15/02/20 16:32:56 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on ****:45208 (size: 27.8 KB, free: 133.6 MB)
    15/02/20 16:32:56 INFO TaskSetManager: Finished task 2.0 in stage 2.0 (TID 8) in 490 ms on **** (1/3)
    15/02/20 16:36:01 INFO BlockManager: Removing broadcast 1
    15/02/20 16:36:01 INFO BlockManager: Removing block broadcast_1_piece0
    15/02/20 16:36:01 INFO MemoryStore: Block broadcast_1_piece0 of size 31176 dropped from memory (free 279882116)
    15/02/20 16:36:01 INFO BlockManagerInfo: Removed broadcast_1_piece0 on ****:54658 in memory (size: 30.4 KB, free: 267.2 MB)
    15/02/20 16:36:01 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
    15/02/20 16:36:01 INFO BlockManager: Removing block broadcast_1
    15/02/20 16:36:01 INFO MemoryStore: Block broadcast_1 of size 66192 dropped from memory (free 279948308)
    15/02/20 16:36:01 INFO BlockManagerInfo: Removed broadcast_1_piece0 on ****:52420 in memory (size: 30.4 KB, free: 133.6 MB)
    15/02/20 16:36:01 INFO BlockManagerInfo: Removed broadcast_1_piece0 on ****:45208 in memory (size: 30.4 KB, free: 133.6 MB)
    15/02/20 16:36:01 INFO BlockManagerInfo: Removed broadcast_1_piece0 on ****:43309 in memory (size: 30.4 KB, free: 133.6 MB)
    15/02/20 16:36:01 INFO ContextCleaner: Cleaned broadcast 1
    
    ```




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by ayoub-benali <gi...@git.apache.org>.
Github user ayoub-benali commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-75371721
  
    @anselmevignon I just re-tested and it worked. Thanks :+1:
    
    A little off topic: I also checked whether this pull request would solve [SPARK-5508](https://issues.apache.org/jira/browse/SPARK-5508), but it didn't. It seems that the other issue is linked to the writing part (INSERT) and not the reading part (SELECT).




[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-76759240
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28176/


[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-84969598
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28994/


[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by anselmevignon <gi...@git.apache.org>.
Github user anselmevignon commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-84908406
  
    Should be better now. Would you mind commenting here if further updates are needed? I seem to have trouble receiving notifications for the inline comments.
    Thanks for the review.



[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by ayoub-benali <gi...@git.apache.org>.
Github user ayoub-benali commented on the pull request:

    https://github.com/apache/spark/pull/4697#issuecomment-76593597
  
    Hi @anselmevignon, would you mind fixing the styling issues so that this PR can be merged into the 1.2 branch?
    If you don't plan to work on it anymore, could you allow me to commit to your branch so that I can update this PR? :)


[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4697#discussion_r25457171
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala ---
    @@ -144,19 +144,33 @@ case class ParquetTableScan(
             new Iterator[Row] {
               def hasNext = iter.hasNext
               def next() = {
    -            val row = iter.next()._2.asInstanceOf[SpecificMutableRow]
    +            iter.next()._2 match {
    +              case row: SpecificMutableRow => {
     
    -            // Parquet will leave partitioning columns empty, so we fill them in here.
    +              // Parquet will leave partitioning columns empty, so we fill them in here.
                 var i = 0
                 while (i < requestedPartitionOrdinals.size) {
    -              row(requestedPartitionOrdinals(i)._2) =
    -                partitionRowValues(requestedPartitionOrdinals(i)._1)
    +              row (requestedPartitionOrdinals (i)._2) =
    +              partitionRowValues (requestedPartitionOrdinals (i)._1)
                   i += 1
                 }
                 row
    +            }
    +              case row : Row => {
    +                val rVals = row.to[Array]
    +                var i = 0
    +                while (i < requestedPartitionOrdinals.size) {
    +                  rVals
    +                    .update(
    +                      requestedPartitionOrdinals (i)._2,
    +                      partitionRowValues (requestedPartitionOrdinals (i)._1))
    +                  i += 1
    +                }
    +                Row.fromSeq(rVals)
    +              }
               }
             }
    -      }
    +      }}
    --- End diff --
    
    For indentation and whitespace-related styling, please refer to the equivalent changes in #4792.
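
The diff quoted above replaces an unchecked `asInstanceOf[SpecificMutableRow]` cast with a match on the row type: a mutable row is updated in place, and any other row is copied and rebuilt. Below is a minimal, Spark-free sketch of that pattern in conventional Scala formatting. `SketchRow`, `MutableSketchRow`, `ImmutableSketchRow`, and `PartitionFill` are hypothetical stand-ins invented for this illustration; they are not Spark classes and this is not the code that was merged.

```scala
// Hypothetical stand-ins for the row classes discussed in the diff; NOT the Spark API.
sealed trait SketchRow {
  def values: Seq[Any]
}

// Plays the role of SpecificMutableRow: fields can be overwritten in place.
final class MutableSketchRow(private val data: Array[Any]) extends SketchRow {
  def values: Seq[Any] = data.toSeq
  def update(ordinal: Int, value: Any): Unit = {
    data(ordinal) = value
  }
}

// Plays the role of GenericRow: fixed once constructed.
final case class ImmutableSketchRow(values: Seq[Any]) extends SketchRow

object PartitionFill {
  // `ordinals` pairs (index into partitionValues, index into the row),
  // mirroring requestedPartitionOrdinals in the quoted diff.
  def fillPartitionColumns(
      row: SketchRow,
      ordinals: Seq[(Int, Int)],
      partitionValues: Seq[Any]): SketchRow = row match {
    case mutable: MutableSketchRow =>
      // Mutable rows are patched in place and returned unchanged in identity.
      ordinals.foreach { case (partitionIdx, rowIdx) =>
        mutable(rowIdx) = partitionValues(partitionIdx)
      }
      mutable

    case other =>
      // Any other row is copied, patched, and rebuilt as a new row.
      val copy = other.values.toArray
      ordinals.foreach { case (partitionIdx, rowIdx) =>
        copy(rowIdx) = partitionValues(partitionIdx)
      }
      ImmutableSketchRow(copy.toSeq)
  }
}
```

Matching on the type keeps the in-place update for rows that support it and pays for a copy only when the row cannot be mutated, which is the trade-off the pull request description outlines.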

