Posted to reviews@spark.apache.org by liancheng <gi...@git.apache.org> on 2015/09/01 12:50:57 UTC

[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/8553

    [SPARK-10395] [SQL] Simplifies CatalystReadSupport

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark spark-10395/simplify-parquet-read-support

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8553.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8553
    
----
commit d7173a906b830726e7856de7bf1376410d96de51
Author: Cheng Lian <li...@databricks.com>
Date:   2015-09-01T10:49:11Z

    Simplifies CatalystReadSupport

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8553#discussion_r38617544
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala ---
    @@ -32,31 +32,56 @@ import org.apache.spark.Logging
     import org.apache.spark.sql.catalyst.InternalRow
     import org.apache.spark.sql.types._
     
    +/**
    + * A Parquet [[ReadSupport]] implementation for reading Parquet records as Catalyst
    + * [[InternalRow]]s.
    + *
    + * The API interface of [[ReadSupport]] is a little bit overcomplicated because of historical
    + * reasons.  In older versions of parquet-mr (say 1.6.0rc3 and prior), [[ReadSupport]] needs to be
    + * instantiated and initialized twice on both driver side and executor side.  The [[init()]] method
    + * is for driver side initialization, while [[prepareForRead()]] is for executor side.  However,
    + * starting from parquet-mr 1.6.0, it's no longer the case, and [[ReadSupport]] is only instantiated
    + * and initialized on executor side.  So, theoretically, now it's totally fine to combine these two
    + * methods into a single initialization method.  The only reason (I could think of) to still have
    + * them here is for parquet-mr API backwards-compatibility.
    + *
    + * Due to this reason, we no longer rely on [[ReadContext]] to pass requested schema from [[init()]]
    + * to [[prepareForRead()]], but use a private `var` for simplicity.
    + */
     private[parquet] class CatalystReadSupport extends ReadSupport[InternalRow] with Logging {
    -  // Called after `init()` when initializing Parquet record reader.
    +  private var catalystRequestedSchema: StructType = _
    +
    +  /**
    +   * Called on executor side before [[prepareForRead()]] and instantiating actual Parquet record
    +   * readers.  Responsible for figuring out Parquet requested schema used for column pruning.
    +   */
    +  override def init(context: InitContext): ReadContext = {
    +    catalystRequestedSchema = {
    +      val conf = context.getConfiguration
    +      val schemaString = conf.get(CatalystReadSupport.SPARK_ROW_REQUESTED_SCHEMA)
    +      assert(schemaString != null, "Parquet requested schema not set.")
    +      StructType.fromString(schemaString)
    +    }
    +
    +    val parquetRequestedSchema =
    +      CatalystReadSupport.clipParquetSchema(context.getFileSchema, catalystRequestedSchema)
    +
    +    new ReadContext(parquetRequestedSchema, Map.empty[String, String].asJava)
    +  }
    +
    +  /**
    +   * Called on executor side after [[init()]], before instantiating actual Parquet record readers.
    +   * Responsible for instantiating [[RecordMaterializer]], which is used for converting Parquet
    +   * records to Catalyst [[InternalRow]]s.
    +   */
       override def prepareForRead(
           conf: Configuration,
           keyValueMetaData: JMap[String, String],
           fileSchema: MessageType,
           readContext: ReadContext): RecordMaterializer[InternalRow] = {
         log.debug(s"Preparing for read Parquet file with message type: $fileSchema")
    -
    -    val toCatalyst = new CatalystSchemaConverter(conf)
         val parquetRequestedSchema = readContext.getRequestedSchema
     
    -    val catalystRequestedSchema =
    -      Option(readContext.getReadSupportMetadata).map(_.asScala).flatMap { metadata =>
    -        metadata
    -          // First tries to read requested schema, which may result from projections
    -          .get(CatalystReadSupport.SPARK_ROW_REQUESTED_SCHEMA)
    -          // If not available, tries to read Catalyst schema from file metadata.  It's only
    -          // available if the target file is written by Spark SQL.
    -          .orElse(metadata.get(CatalystReadSupport.SPARK_METADATA_KEY))
    -      }.map(StructType.fromString).getOrElse {
    -        logInfo("Catalyst schema not available, falling back to Parquet schema")
    -        toCatalyst.convert(parquetRequestedSchema)
    -      }
    --- End diff --
    
    This "fallback" logic is removed because we now always set the requested schema properly along the read path. This piece of code was inherited from the old Parquet support, which has already been removed.
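
To make the difference concrete, here is a minimal sketch of the two lookup styles. It uses plain Scala maps rather than the real `Configuration` and `ReadContext` types, and the object name, method names, and key names are all hypothetical; it only illustrates the shape of the change, not the actual Spark code.

```scala
// Hypothetical stand-ins only; none of these names are Spark or parquet-mr APIs.
object RequestedSchemaLookup {
  val RequestedSchemaKey = "example.requested_schema" // assumed key name
  val FileSchemaKey = "example.file_schema"           // assumed key name

  // Old style: try the requested schema, then the schema stored in file
  // metadata, and finally fall back to a schema derived some other way.
  def lookupWithFallback(metadata: Map[String, String], fallback: => String): String =
    metadata.get(RequestedSchemaKey)
      .orElse(metadata.get(FileSchemaKey))
      .getOrElse(fallback)

  // New style: the read path is expected to always set the requested schema,
  // so a missing value is treated as a programming error.
  def lookupRequired(conf: Map[String, String]): String = {
    val schemaString = conf.get(RequestedSchemaKey).orNull
    assert(schemaString != null, "Requested schema not set.")
    schemaString
  }
}
```

With the required lookup, a missing schema fails at the point of the mistake instead of silently reading with a schema the caller did not ask for.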




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-136673209
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-143870541
  
    LGTM




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-139984526
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-136674628
  
      [Test build #41870 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41870/consoleFull) for   PR 8553 at commit [`d7173a9`](https://github.com/apache/spark/commit/d7173a906b830726e7856de7bf1376410d96de51).




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/8553




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-136711772
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41870/
    Test PASSed.




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-139983126
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-139984527
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42412/
    Test FAILed.




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-136673165
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-143870853
  
    Merged into master, thanks!




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-139984523
  
      [Test build #42412 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42412/console) for   PR 8553 at commit [`d8a1ba4`](https://github.com/apache/spark/commit/d8a1ba439de0b9c4ff7a4eeac1b3517cc1c66cc0).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-136711289
  
      [Test build #41870 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41870/console) for   PR 8553 at commit [`d7173a9`](https://github.com/apache/spark/commit/d7173a906b830726e7856de7bf1376410d96de51).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-140918393
  
      [Test build #42543 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42543/console) for   PR 8553 at commit [`4dfab07`](https://github.com/apache/spark/commit/4dfab07d52db2001b41e11c1a2790769c8bcf8e9).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-140891110
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8553#discussion_r40277489
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala ---
    @@ -69,36 +97,6 @@ private[parquet] class CatalystReadSupport extends ReadSupport[InternalRow] with
     
         new CatalystRecordMaterializer(parquetRequestedSchema, catalystRequestedSchema)
       }
    -
    -  // Called before `prepareForRead()` when initializing Parquet record reader.
    -  override def init(context: InitContext): ReadContext = {
    -    val conf = {
    -      // scalastyle:off jobcontext
    -      context.getConfiguration
    -      // scalastyle:on jobcontext
    -    }
    -
    -    // If the target file was written by Spark SQL, we should be able to find a serialized Catalyst
    -    // schema of this file from its metadata.
    -    val maybeRowSchema = Option(conf.get(RowWriteSupport.SPARK_ROW_SCHEMA))
    -
    -    // Optional schema of requested columns, in the form of a string serialized from a Catalyst
    -    // `StructType` containing all requested columns.
    -    val maybeRequestedSchema = Option(conf.get(CatalystReadSupport.SPARK_ROW_REQUESTED_SCHEMA))
    -
    -    val parquetRequestedSchema =
    -      maybeRequestedSchema.fold(context.getFileSchema) { schemaString =>
    -        val catalystRequestedSchema = StructType.fromString(schemaString)
    -        CatalystReadSupport.clipParquetSchema(context.getFileSchema, catalystRequestedSchema)
    -      }
    -
    -    val metadata =
    -      Map.empty[String, String] ++
    -        maybeRequestedSchema.map(CatalystReadSupport.SPARK_ROW_REQUESTED_SCHEMA -> _) ++
    -        maybeRowSchema.map(RowWriteSupport.SPARK_ROW_SCHEMA -> _)
    --- End diff --
    
    Why did we pass in `maybeRowSchema` before? It seems it was not used by `prepareForRead`.




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8553#discussion_r40349377
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala ---
    @@ -69,36 +97,6 @@ private[parquet] class CatalystReadSupport extends ReadSupport[InternalRow] with
     
         new CatalystRecordMaterializer(parquetRequestedSchema, catalystRequestedSchema)
       }
    -
    -  // Called before `prepareForRead()` when initializing Parquet record reader.
    -  override def init(context: InitContext): ReadContext = {
    -    val conf = {
    -      // scalastyle:off jobcontext
    -      context.getConfiguration
    -      // scalastyle:on jobcontext
    -    }
    -
    -    // If the target file was written by Spark SQL, we should be able to find a serialized Catalyst
    -    // schema of this file from its metadata.
    -    val maybeRowSchema = Option(conf.get(RowWriteSupport.SPARK_ROW_SCHEMA))
    -
    -    // Optional schema of requested columns, in the form of a string serialized from a Catalyst
    -    // `StructType` containing all requested columns.
    -    val maybeRequestedSchema = Option(conf.get(CatalystReadSupport.SPARK_ROW_REQUESTED_SCHEMA))
    -
    -    val parquetRequestedSchema =
    -      maybeRequestedSchema.fold(context.getFileSchema) { schemaString =>
    -        val catalystRequestedSchema = StructType.fromString(schemaString)
    -        CatalystReadSupport.clipParquetSchema(context.getFileSchema, catalystRequestedSchema)
    -      }
    -
    -    val metadata =
    -      Map.empty[String, String] ++
    -        maybeRequestedSchema.map(CatalystReadSupport.SPARK_ROW_REQUESTED_SCHEMA -> _) ++
    -        maybeRowSchema.map(RowWriteSupport.SPARK_ROW_SCHEMA -> _)
    --- End diff --
    
    IIRC it was used by the old Parquet support code, which has already been removed.




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8553#discussion_r38504498
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala ---
    @@ -32,31 +32,56 @@ import org.apache.spark.Logging
     import org.apache.spark.sql.catalyst.InternalRow
     import org.apache.spark.sql.types._
     
    +/**
    + * A Parquet [[ReadSupport]] implementation for reading Parquet records as Catalyst
    + * [[InternalRow]]s.
    + *
    + * The API interface of [[ReadSupport]] is a little bit overcomplicated because of historical
    + * reasons.  In older versions of parquet-mr (say 1.6.0rc3 and prior), [[ReadSupport]] needs to be
    + * instantiated and initialized twice on both driver side and executor side.  The [[init()]] method
    + * is for driver side initialization, while [[prepareForRead()]] is for executor side.  However,
    + * starting from parquet-mr 1.6.0, it's no longer the case, and [[ReadSupport]] is only instantiated
    + * and initialized on executor side.  So, theoretically, now it's totally fine to combine these two
    + * methods into a single initialization method.  The only reason (I could think of) to still have
    + * them here is for parquet-mr API backwards-compatibility.
    + *
    + * Due to this reason, we no longer rely on [[ReadContext]] to pass requested schema from [[init()]]
    + * to [[prepareForRead()]], but use a private `var` for simplicity.
    + */
     private[parquet] class CatalystReadSupport extends ReadSupport[InternalRow] with Logging {
    -  // Called after `init()` when initializing Parquet record reader.
    +  private var catalystRequestedSchema: StructType = _
    +
    +  /**
    +   * Called on executor side before [[prepareForRead()]] and instantiating actual Parquet record
    +   * readers.  Responsible for figuring out Parquet requested schema used for column pruning.
    +   */
    +  override def init(context: InitContext): ReadContext = {
    --- End diff --
    
    Moved this method in front of `prepareForRead()` for better readability, since this method is called right before `prepareForRead()`.
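
The Scaladoc in the diff above describes `init()` and `prepareForRead()` being called in sequence on the same executor-side instance, with the requested schema carried between them in a private `var`. A minimal sketch of that pattern follows. The types are simplified, hypothetical stand-ins (flat column lists instead of Parquet and Catalyst schemas, a string instead of a `RecordMaterializer`), so it shows the call order and state hand-off only, not the real parquet-mr API.

```scala
// Hypothetical, simplified stand-ins for the parquet-mr types; not the real API.
final case class InitContext(conf: Map[String, String], fileSchema: Seq[String])
final case class ReadContext(requestedSchema: Seq[String])

class SketchReadSupport {
  // Set by init() and read by prepareForRead(); both run on the same instance.
  private var requestedColumns: Seq[String] = Seq.empty

  // Phase 1: work out which columns to read (a flat stand-in for schema clipping).
  def init(context: InitContext): ReadContext = {
    val schemaString = context.conf.get(SketchReadSupport.RequestedSchemaKey).orNull
    assert(schemaString != null, "Requested schema not set.")
    requestedColumns = schemaString.split(",").toSeq
    ReadContext(context.fileSchema.filter(c => requestedColumns.contains(c)))
  }

  // Phase 2: build the converter, reusing the state stored by init().
  def prepareForRead(readContext: ReadContext): String =
    s"materializer(parquet=${readContext.requestedSchema.mkString(",")}, " +
      s"catalyst=${requestedColumns.mkString(",")})"
}

object SketchReadSupport {
  val RequestedSchemaKey = "example.requested_schema" // assumed key name
}
```

This only works because both calls happen on one instance in a fixed order. If the two phases ran on different instances, as the Scaladoc says older parquet-mr versions did, the state would have to travel through `ReadContext` instead.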




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-139984146
  
      [Test build #42412 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42412/consoleFull) for   PR 8553 at commit [`d8a1ba4`](https://github.com/apache/spark/commit/d8a1ba439de0b9c4ff7a4eeac1b3517cc1c66cc0).




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-136711770
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-140918493
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42543/
    Test PASSed.




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-140918491
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-140891979
  
      [Test build #42543 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42543/consoleFull) for   PR 8553 at commit [`4dfab07`](https://github.com/apache/spark/commit/4dfab07d52db2001b41e11c1a2790769c8bcf8e9).




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-139983150
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10395] [SQL] Simplifies CatalystReadSup...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8553#issuecomment-140891151
  
    Merged build started.

