Posted to reviews@spark.apache.org by dongjoon-hyun <gi...@git.apache.org> on 2017/08/16 01:44:25 UTC

[GitHub] spark pull request #18953: [SPARK-20682][SQL] Implement new ORC data source ...

GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/18953

    [SPARK-20682][SQL] Implement new ORC data source based on Apache ORC

    ## What changes were proposed in this pull request?
    
    Since #17924, #17943, and #17980 are rather large PRs, this is a minimized version for the next review, excluding the following. This PR still includes #18640; I will rebase after #18640 is merged.
    
    - `OrcReadBenchmark.scala`
    - `OrcColumnarBatchReader.scala`
    - New ORC Test suites in `sql/core`
    
    This PR shows that the new ORC data source can replace the old ORC data source completely. After review, I will remove the changes to the old ORC data source. Choosing between the two will be enabled in #17980.
    
    ## How was this patch tested?
    
    Pass the Jenkins.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-20682-3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18953.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18953
    
----
commit 051ed1fd86ee1354d1e650b1cf51a41db2d83619
Author: Dongjoon Hyun <do...@apache.org>
Date:   2017-08-16T01:32:37Z

    [SPARK-20682][SQL] Implement new ORC data source based on Apache ORC

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80827/
    Test FAILed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Hi, @cloud-fan . As you advised, I will replace the old ORC in the current namespace and will try to move it to `sql/core` later. Although we cannot switch between the old ORC and the new ORC, we can bring the old ORC back from the code if needed. Thanks.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Hi, @cloud-fan .
    Could you review this again when you have some time?




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80840/
    Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80710/
    Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    For the reader, there are three parts.
    1. OrcColumnarBatchReader: It's not included here.
    2. **OrcRecordIterator**: It's included here. It does not use Spark vectorization.
    3. **RecordReaderIterator[OrcStruct]**: It's used here.
    
    As with (1), I can exclude (2) from this PR. Is that okay?




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    @dongjoon-hyun can you put more information in the PR description? `update OrcFileFormat` is too vague.



[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Retest this please




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81326/
    Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Retest this please




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Retest this please.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81597/
    Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Hi, @cloud-fan .
    Could you review this PR?




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80771 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80771/testReport)** for PR 18953 at commit [`80c80f3`](https://github.com/apache/spark/commit/80c80f34eb4dfb7c94d7875438effab52c71575d).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80721 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80721/testReport)** for PR 18953 at commit [`07778ed`](https://github.com/apache/spark/commit/07778ed449bbf7ce2f1b5e8258e6ef58475b289c).




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80877 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80877/testReport)** for PR 18953 at commit [`7954d52`](https://github.com/apache/spark/commit/7954d5223eee4bfaf7825ec79eaad36c524362dc).




[GitHub] spark pull request #18953: [SPARK-20682][SQL] Update ORC data source based o...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18953#discussion_r134657662
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcUtils.scala ---
    @@ -0,0 +1,316 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive.orc
    +
    +import java.io.IOException
    +
    +import scala.collection.JavaConverters._
    +
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.{FileSystem, Path}
    +import org.apache.hadoop.io._
    +import org.apache.orc.{OrcFile, TypeDescription}
    +import org.apache.orc.mapred.{OrcList, OrcMap, OrcStruct, OrcTimestamp}
    +import org.apache.orc.storage.common.`type`.HiveDecimal
    +import org.apache.orc.storage.serde2.io.{DateWritable, HiveDecimalWritable}
    +
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.expressions.SpecificInternalRow
    +import org.apache.spark.sql.catalyst.util._
    +import org.apache.spark.sql.types._
    +import org.apache.spark.unsafe.types.UTF8String
    +
    +object OrcUtils {
    +  /**
    +   * Read ORC file schema. This method is used in `inferSchema`.
    +   */
    +  private[orc] def readSchema(file: Path, conf: Configuration): Option[TypeDescription] = {
    +    try {
    +      val options = OrcFile.readerOptions(conf).filesystem(FileSystem.get(conf))
    +      val reader = OrcFile.createReader(file, options)
    +      val schema = reader.getSchema
    +      if (schema.getFieldNames.isEmpty) {
    +        None
    +      } else {
    +        Some(schema)
    +      }
    +    } catch {
    +      case _: IOException => None
    +    }
    +  }
    +
    +  /**
    +   * Return ORC schema with schema field name correction and the total number of rows.
    +   */
    +  private[orc] def getSchemaAndNumberOfRows(
    +      dataSchema: StructType,
    +      filePath: String,
    +      conf: Configuration) = {
    +    val hdfsPath = new Path(filePath)
    +    val fs = hdfsPath.getFileSystem(conf)
    +    val reader = OrcFile.createReader(hdfsPath, OrcFile.readerOptions(conf).filesystem(fs))
    +    val rawSchema = reader.getSchema
    +    val orcSchema = if (!rawSchema.getFieldNames.isEmpty &&
    +        rawSchema.getFieldNames.asScala.forall(_.startsWith("_col"))) {
    +      var schemaString = rawSchema.toString
    +      dataSchema.zipWithIndex.foreach { case (field: StructField, index: Int) =>
    +        schemaString = schemaString.replace(s"_col$index:", s"${field.name}:")
    +      }
    +      TypeDescription.fromString(schemaString)
    +    } else {
    +      rawSchema
    +    }
    +    (orcSchema, reader.getNumberOfRows)
    +  }
    +
    +  /**
    +   * Return a ORC schema string for ORCStruct.
    +   */
    +  private[orc] def getSchemaString(schema: StructType): String = {
    +    schema.fields.map(f => s"${f.name}:${f.dataType.catalogString}").mkString("struct<", ",", ">")
    +  }
    +
    +  private[orc] def getTypeDescription(dataType: DataType) = dataType match {
    +    case st: StructType => TypeDescription.fromString(getSchemaString(st))
    +    case _ => TypeDescription.fromString(dataType.catalogString)
    +  }
    +
    +  /**
    +   * Return a Orc value object for the given Spark schema.
    +   */
    +  private[orc] def createOrcValue(dataType: DataType) =
    +    OrcStruct.createValue(getTypeDescription(dataType))
    +
    +  /**
    +   * Convert Apache ORC OrcStruct to Apache Spark InternalRow.
    +   * If internalRow is not None, fill into it. Otherwise, create a SpecificInternalRow and use it.
    +   */
    +  private[orc] def convertOrcStructToInternalRow(
    +      orcStruct: OrcStruct,
    +      schema: StructType,
    +      valueWrappers: Option[Seq[Any => Any]] = None,
    +      internalRow: Option[InternalRow] = None): InternalRow = {
    +    val mutableRow = internalRow.getOrElse(new SpecificInternalRow(schema.map(_.dataType)))
    +    val wrappers = valueWrappers.getOrElse(schema.fields.map(_.dataType).map(getValueWrapper).toSeq)
    +    for (schemaIndex <- 0 until schema.length) {
    +      val writable = orcStruct.getFieldValue(schema(schemaIndex).name)
    +      if (writable == null) {
    +        mutableRow.setNullAt(schemaIndex)
    +      } else {
    +        mutableRow(schemaIndex) = wrappers(schemaIndex)(writable)
    +      }
    +    }
    +
    +    mutableRow
    +  }
    +
    +  private def withNullSafe(f: Any => Any): Any => Any = {
    +    input => if (input == null) null else f(input)
    +  }
    +
    +
    +  /**
    +   * Builds a catalyst-value return function ahead of time according to DataType
    +   * to avoid pattern matching and branching costs per row.
    +   */
    +  private[orc] def getValueWrapper(dataType: DataType): Any => Any = dataType match {
    +    case NullType => _ => null
    +
    +    case BooleanType => withNullSafe(o => o.asInstanceOf[BooleanWritable].get)
    +
    +    case ByteType => withNullSafe(o => o.asInstanceOf[ByteWritable].get)
    +    case ShortType => withNullSafe(o => o.asInstanceOf[ShortWritable].get)
    +    case IntegerType => withNullSafe(o => o.asInstanceOf[IntWritable].get)
    +    case LongType => withNullSafe(o => o.asInstanceOf[LongWritable].get)
    +
    +    case FloatType => withNullSafe(o => o.asInstanceOf[FloatWritable].get)
    +    case DoubleType => withNullSafe(o => o.asInstanceOf[DoubleWritable].get)
    +
    +    case StringType => withNullSafe(o => UTF8String.fromBytes(o.asInstanceOf[Text].copyBytes))
    +
    +    case BinaryType =>
    +      withNullSafe { o =>
    +        val binary = o.asInstanceOf[BytesWritable]
    +        val bytes = new Array[Byte](binary.getLength)
    +        System.arraycopy(binary.getBytes, 0, bytes, 0, binary.getLength)
    +        bytes
    +      }
    +
    +    case DateType =>
    +      withNullSafe(o => DateTimeUtils.fromJavaDate(o.asInstanceOf[DateWritable].get))
    +    case TimestampType =>
    +      withNullSafe(o => DateTimeUtils.fromJavaTimestamp(o.asInstanceOf[OrcTimestamp]))
    +
    +    case DecimalType.Fixed(precision, scale) =>
    +      withNullSafe { o =>
    +        val decimal = o.asInstanceOf[HiveDecimalWritable].getHiveDecimal()
    +        val v = Decimal(decimal.bigDecimalValue, decimal.precision(), decimal.scale())
    +        v.changePrecision(precision, scale)
    +        v
    +      }
    +
    +    case _: StructType =>
    +      withNullSafe { o =>
    +        val structValue = convertOrcStructToInternalRow(
    +          o.asInstanceOf[OrcStruct],
    +          dataType.asInstanceOf[StructType])
    +        structValue
    +      }
    +
    +    case ArrayType(elementType, _) =>
    +      withNullSafe { o =>
    +        val wrapper = getValueWrapper(elementType)
    +        val data = new scala.collection.mutable.ArrayBuffer[Any]
    +        o.asInstanceOf[OrcList[WritableComparable[_]]].asScala.foreach { x =>
    +          data += wrapper(x)
    +        }
    +        new GenericArrayData(data.toArray)
    +      }
    +
    +    case MapType(keyType, valueType, _) =>
    +      withNullSafe { o =>
    +        val keyWrapper = getValueWrapper(keyType)
    +        val valueWrapper = getValueWrapper(valueType)
    +        val map = new java.util.TreeMap[Any, Any]
    +        o.asInstanceOf[OrcMap[WritableComparable[_], WritableComparable[_]]]
    +          .entrySet().asScala.foreach { entry =>
    +          map.put(keyWrapper(entry.getKey), valueWrapper(entry.getValue))
    +        }
    +        ArrayBasedMapData(map.asScala)
    +      }
    +
    +    case udt: UserDefinedType[_] =>
    +      withNullSafe { o =>
    +        getValueWrapper(udt.sqlType)(o)
    +      }
    +
    +    case _ =>
    +      throw new UnsupportedOperationException(s"$dataType is not supported yet.")
    +  }
    +
    +  /**
    +   * Convert Apache Spark InternalRow to Apache ORC OrcStruct.
    +   */
    +  private[orc] def convertInternalRowToOrcStruct(
    +      row: InternalRow,
    +      schema: StructType,
    +      valueWrappers: Option[Seq[Any => Any]] = None,
    +      struct: Option[OrcStruct] = None): OrcStruct = {
    +    val wrappers =
    +      valueWrappers.getOrElse(schema.fields.map(_.dataType).map(getWritableWrapper).toSeq)
    +    val orcStruct = struct.getOrElse(createOrcValue(schema).asInstanceOf[OrcStruct])
    +
    +    for (schemaIndex <- 0 until schema.length) {
    +      val fieldType = schema(schemaIndex).dataType
    +      if (row.isNullAt(schemaIndex)) {
    +        orcStruct.setFieldValue(schemaIndex, null)
    +      } else {
    +        val field = row.get(schemaIndex, fieldType)
    +        val fieldValue = wrappers(schemaIndex)(field).asInstanceOf[WritableComparable[_]]
    +        orcStruct.setFieldValue(schemaIndex, fieldValue)
    +      }
    +    }
    +    orcStruct
    +  }
    +
    +  /**
    +   * Builds a WritableComparable-return function ahead of time according to DataType
    +   * to avoid pattern matching and branching costs per row.
    +   */
    +  private[orc] def getWritableWrapper(dataType: DataType): Any => Any = dataType match {
    --- End diff --
    
    Hi, @cloud-fan .
    I updated the PR to return functions. Could you review again?
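
For readers following the thread, here is a minimal sketch of the "return functions" idea that `getValueWrapper` in the diff above relies on: match on the data type once, get back an `Any => Any` closure, and reuse that closure for every value. It is not the PR's code. `SimpleType`, `IntT`, `StringT`, `ArrayT`, `WrapperSketch`, and `convertRow` are hypothetical stand-ins so that the example runs without Spark or ORC on the classpath. One deliberate difference is that the sketch resolves the element wrapper for arrays outside the closure, whereas the diff looks it up inside `withNullSafe`.

```scala
// Hypothetical stand-ins for Spark's DataType hierarchy; these are NOT Spark classes.
sealed trait SimpleType
case object IntT extends SimpleType
case object StringT extends SimpleType
final case class ArrayT(element: SimpleType) extends SimpleType

object WrapperSketch {
  // Wrap a converter so that a null input is passed through without calling it.
  def withNullSafe(f: Any => Any): Any => Any =
    input => if (input == null) null else f(input)

  // Pattern match on the type once and return a closure.
  // Only the returned closure runs per value, so the match cost is not paid per row.
  def getValueWrapper(t: SimpleType): Any => Any = t match {
    case IntT =>
      withNullSafe(o => o.asInstanceOf[java.lang.Integer].intValue)
    case StringT =>
      // Trimming is an arbitrary illustrative conversion, not something the PR does.
      withNullSafe(o => o.toString.trim)
    case ArrayT(element) =>
      // Resolved once here (a choice made in this sketch), then reused per element.
      val elementWrapper = getValueWrapper(element)
      withNullSafe(o => o.asInstanceOf[Seq[Any]].map(elementWrapper))
  }

  // Convert one row with wrappers that were built once for the whole schema.
  def convertRow(row: Seq[Any], wrappers: Seq[Any => Any]): Seq[Any] =
    row.zip(wrappers).map { case (value, wrap) => wrap(value) }
}
```

Under these assumptions, a caller would build the wrappers once with `schema.map(WrapperSketch.getValueWrapper)` and then pass the same sequence to `convertRow` for each row, which mirrors the optional `valueWrappers` parameter of `convertOrcStructToInternalRow` in the diff.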




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80832 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80832/testReport)** for PR 18953 at commit [`f8de872`](https://github.com/apache/spark/commit/f8de872106d67239581f495e5df60fe9a6d44257).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #81012 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81012/testReport)** for PR 18953 at commit [`263b3dc`](https://github.com/apache/spark/commit/263b3dc3ca3e6df6107cd70bb8cebd230c0e937d).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80877 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80877/testReport)** for PR 18953 at commit [`7954d52`](https://github.com/apache/spark/commit/7954d5223eee4bfaf7825ec79eaad36c524362dc).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80877/
    Test FAILed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Retest this please.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    @cloud-fan, I'll rethink how to consolidate the old and the new data sources. Thank you for the advice!




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80980 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80980/testReport)** for PR 18953 at commit [`3d602ab`](https://github.com/apache/spark/commit/3d602ab85b2ac0b42e3f66078d2261d62d031867).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Hi, @marmbrus, @liancheng, @yhuai.
    Could you give me some advice about this ORC upgrade PR?
    I tried to minimize the diff of this PR, so I didn't remove the unused old data source.
    Thank you in advance.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #81597 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81597/testReport)** for PR 18953 at commit [`ed43eb7`](https://github.com/apache/spark/commit/ed43eb7fb47a1c65875bbf97a8e108abfb115925).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class OrcSourceSuite extends OrcSuite with SQLTestUtils `




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Thank you, @cloud-fan . Yep. I will!




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    What's the project plan for this ORC work? Shall we first move the old ORC data source to sql/core with ORC 1.4, and then send a new PR for the vectorized reader?




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Retest this please.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80721/
    Test FAILed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80722 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80722/testReport)** for PR 18953 at commit [`07778ed`](https://github.com/apache/spark/commit/07778ed449bbf7ce2f1b5e8258e6ef58475b289c).




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80840 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80840/testReport)** for PR 18953 at commit [`c9321df`](https://github.com/apache/spark/commit/c9321df909a1cb8307d2dc4056e7e0146822053c).




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80777/
    Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Hi, @cloud-fan . I added `SparkOrcNewRecordReader.java` back to reduce the patch size.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    This PR is about 1,100 lines and #17980 is about 3,833 lines.
    I also updated #17980 today. If you want to review that PR, that is great, too!




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #81142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81142/testReport)** for PR 18953 at commit [`b9b348d`](https://github.com/apache/spark/commit/b9b348de40bab16fd43d033f7191e6ee868246af).




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80707/testReport)** for PR 18953 at commit [`051ed1f`](https://github.com/apache/spark/commit/051ed1fd86ee1354d1e650b1cf51a41db2d83619).




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80722 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80722/testReport)** for PR 18953 at commit [`07778ed`](https://github.com/apache/spark/commit/07778ed449bbf7ce2f1b5e8258e6ef58475b289c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80980/testReport)** for PR 18953 at commit [`3d602ab`](https://github.com/apache/spark/commit/3d602ab85b2ac0b42e3f66078d2261d62d031867).



[GitHub] spark pull request #18953: [SPARK-20682][SQL] Update ORC data source based o...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18953#discussion_r134068925
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala ---
    @@ -205,38 +220,53 @@ class OrcQuerySuite extends QueryTest with BeforeAndAfterAll with OrcTest {
           spark.range(0, 10).write
             .option("compression", "ZLIB")
             .orc(file.getCanonicalPath)
    +      val maybeOrcFile = file.listFiles().find(_.getName.endsWith(".zlib.orc"))
    +      assert(maybeOrcFile.isDefined)
    +      val orcFilePath = new Path(maybeOrcFile.get.getAbsolutePath)
    +      val conf = OrcFile.readerOptions(new Configuration())
           val expectedCompressionKind =
    -        OrcFileOperator.getFileReader(file.getCanonicalPath).get.getCompression
    +        OrcFile.createReader(orcFilePath, conf).getCompressionKind
           assert("ZLIB" === expectedCompressionKind.name())
         }
     
         withTempPath { file =>
           spark.range(0, 10).write
             .option("compression", "SNAPPY")
             .orc(file.getCanonicalPath)
    +      val maybeOrcFile = file.listFiles().find(_.getName.endsWith(".snappy.orc"))
    +      assert(maybeOrcFile.isDefined)
    +      val orcFilePath = new Path(maybeOrcFile.get.getAbsolutePath)
    +      val conf = OrcFile.readerOptions(new Configuration())
           val expectedCompressionKind =
    -        OrcFileOperator.getFileReader(file.getCanonicalPath).get.getCompression
    +        OrcFile.createReader(orcFilePath, conf).getCompressionKind
           assert("SNAPPY" === expectedCompressionKind.name())
         }
     
         withTempPath { file =>
           spark.range(0, 10).write
             .option("compression", "NONE")
             .orc(file.getCanonicalPath)
    +      val maybeOrcFile = file.listFiles().find(_.getName.endsWith(".orc"))
    +      assert(maybeOrcFile.isDefined)
    +      val orcFilePath = new Path(maybeOrcFile.get.getAbsolutePath)
    +      val conf = OrcFile.readerOptions(new Configuration())
           val expectedCompressionKind =
    -        OrcFileOperator.getFileReader(file.getCanonicalPath).get.getCompression
    +        OrcFile.createReader(orcFilePath, conf).getCompressionKind
           assert("NONE" === expectedCompressionKind.name())
         }
       }
     
    -  // Following codec is not supported in Hive 1.2.1, ignore it now
    -  ignore("LZO compression options for writing to an ORC file not supported in Hive 1.2.1") {
    --- End diff --
    
    This is a known improvement.
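
The diff above repeats the same lookup three times: list the output directory, pick the file whose name ends with the codec-specific suffix (`.zlib.orc`, `.snappy.orc`, or plain `.orc` for `NONE`), and then open it. As a rough sketch only, that lookup could be factored into a small helper. The names `OrcTestFiles`, `expectedSuffix`, `findByCodec`, and `findOrcFile` are hypothetical and are not part of this PR or of Spark; the suffix rule is inferred solely from the three cases shown in the diff.

```scala
import java.io.File
import java.util.Locale

// Hypothetical helper, not part of the PR: locates the ORC file written for a codec.
object OrcTestFiles {
  // Suffix seen in the diff: ".zlib.orc", ".snappy.orc", and plain ".orc" for NONE.
  def expectedSuffix(codec: String): String =
    codec.toUpperCase(Locale.ROOT) match {
      case "NONE" => ".orc"
      case other  => "." + other.toLowerCase(Locale.ROOT) + ".orc"
    }

  // Pure variant over file names, convenient for testing without touching disk.
  def findByCodec(names: Seq[String], codec: String): Option[String] =
    names.find(_.endsWith(expectedSuffix(codec)))

  // Directory variant; listFiles() returns null when `dir` is not a directory.
  def findOrcFile(dir: File, codec: String): Option[File] =
    Option(dir.listFiles())
      .getOrElse(Array.empty[File])
      .find(f => f.getName.endsWith(expectedSuffix(codec)))
}
```

With such a helper, each test block would call `findOrcFile(file, "ZLIB")` and assert that the result is defined before passing its path to `OrcFile.createReader`, as the diff already does. Note that the `NONE` suffix `.orc` also matches compressed file names, so this only works when the directory holds output from a single write, as in the tests above.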




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Now the diff is `+432 −98`.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80858/testReport)** for PR 18953 at commit [`8548b73`](https://github.com/apache/spark/commit/8548b73d971ef5751594f5204aea83a3ead8bd4b).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Hi, @cloud-fan.
    In my email, I listed the work in the following order:
    
        1. SPARK-21422: Depend on Apache ORC 1.4.0
        2. SPARK-20682: Add a new faster ORC data source based on Apache ORC
        3. SPARK-20728: Make ORCFileFormat configurable between sql/hive and sql/core
        4. SPARK-16060: Vectorized Orc Reader
    
    For Apache Spark 2.3, I thought we would need to keep both, selectable via the option `spark.sql.orc.enabled`.
    
    Do you mean removing the old ORC data source from `sql/hive`?
    
    In this PR, I replace the `sql/hive` ORC data source to reduce the burden of reviewing test code. The new test code is in #17980.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #81013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81013/testReport)** for PR 18953 at commit [`8507aef`](https://github.com/apache/spark/commit/8507aefbb976594e89bba554ea5beed77c49390c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80832 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80832/testReport)** for PR 18953 at commit [`f8de872`](https://github.com/apache/spark/commit/f8de872106d67239581f495e5df60fe9a6d44257).




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Hi, all.
    Although ORC does not seem to be a preferred storage format in Apache Spark, ORC is very important to me. Could anyone review this again?




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Hi, @cloud-fan and @gatorsmile .
    Could you review this PR when you have some time?
    If it needs more refactoring or a spin-off, please let me know.
    Thank you always.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80869/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80707/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #81326 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81326/testReport)** for PR 18953 at commit [`6548cf8`](https://github.com/apache/spark/commit/6548cf877cf71eccf7cc6c4e14072b7d478c4e74).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #81597 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81597/testReport)** for PR 18953 at commit [`ed43eb7`](https://github.com/apache/spark/commit/ed43eb7fb47a1c65875bbf97a8e108abfb115925).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80875/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80832/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80858/testReport)** for PR 18953 at commit [`8548b73`](https://github.com/apache/spark/commit/8548b73d971ef5751594f5204aea83a3ead8bd4b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81148/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80858/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    I updated the PR description, @cloud-fan .


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80869 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80869/testReport)** for PR 18953 at commit [`7954d52`](https://github.com/apache/spark/commit/7954d5223eee4bfaf7825ec79eaad36c524362dc).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Jenkins failed twice, each time in a different R test suite. Those errors are unrelated to this PR.
    
    - test_mllib_tree.R (Test build #80875)
    ```
    1. Error: spark.gbt (@test_mllib_tree.R#120) -----------------------------------
    java.lang.IllegalArgumentException: requirement failed: The input column stridx_f91a4b086385 should have at least two distinct values.
    ```
    
    - test_mllib_classification.R (Test build #80877)
    ```
    1. Error: spark.svmLinear (@test_mllib_classification.R#80) --------------------
    java.lang.IllegalArgumentException: requirement failed: LinearSVC only supports binary classification. 1 classes detected in linearsvc_ec743395c7af__labelCol
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Did the ORC APIs change a lot in 1.4? I was expecting a small patch that upgrades the current ORC data source without moving it to sql/core.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    @cloud-fan , the PR is updated. It is now minimized to +493 and −247 lines.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    The goal is to use ORC without `-Phive`. You can build Spark and use the ORC data source.
    
    Previously, `org.apache.spark.sql.hive.orc.OrcFileFormat` was tightly coupled with Hive code outside the `org.apache.spark.sql.hive.orc` package, for example `org.apache.spark.sql.hive.HiveInspectors`. It also uses the following imports.
    ```
    import org.apache.hadoop.hive.conf.HiveConf.ConfVars
    import org.apache.hadoop.hive.ql.io.orc._
    import org.apache.hadoop.hive.serde2.objectinspector.{SettableStructObjectInspector, StructObjectInspector}
    import org.apache.hadoop.hive.serde2.typeinfo.{StructTypeInfo, TypeInfoUtils}
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #81513 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81513/testReport)** for PR 18953 at commit [`014f2f3`](https://github.com/apache/spark/commit/014f2f3139fb1ea5efe5930cdd4a2e7d64172a94).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81268/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #81142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81142/testReport)** for PR 18953 at commit [`b9b348d`](https://github.com/apache/spark/commit/b9b348de40bab16fd43d033f7191e6ee868246af).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80875 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80875/testReport)** for PR 18953 at commit [`7954d52`](https://github.com/apache/spark/commit/7954d5223eee4bfaf7825ec79eaad36c524362dc).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80909/testReport)** for PR 18953 at commit [`63cf876`](https://github.com/apache/spark/commit/63cf87688ae1b47e6adcad4d9ff1784ac321eb12).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Hi, @cloud-fan , @gatorsmile , @rxin , @sameeragarwal , and @viirya .
    Could you review this ORC PR? I narrowed the focus and reduced the size of the PR.
    For review purposes, I replaced the old ORC data source with the new one.
    Thank you always!
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Hi, @cloud-fan and @gatorsmile .
    I know that you have been spending a lot of time reviewing my PRs (including this one).
    Thank you always. If you have something in mind, please let me know. I'll try to improve it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81012/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80771/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    This is resolved via https://github.com/apache/spark/pull/19651 .


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #81513 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81513/testReport)** for PR 18953 at commit [`014f2f3`](https://github.com/apache/spark/commit/014f2f3139fb1ea5efe5930cdd4a2e7d64172a94).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80827/testReport)** for PR 18953 at commit [`f8de872`](https://github.com/apache/spark/commit/f8de872106d67239581f495e5df60fe9a6d44257).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Hi, @cloud-fan , @gatorsmile , @rxin , @sameeragarwal , @hvanhovell , @mridulm and @viirya .
    Could you give me your opinion on this ORC PR when you have some time?
    Following @cloud-fan 's advice, I'm trying to replace the old Hive ORC here. The PR is now much smaller than before.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80827 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80827/testReport)** for PR 18953 at commit [`f8de872`](https://github.com/apache/spark/commit/f8de872106d67239581f495e5df60fe9a6d44257).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #81148 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81148/testReport)** for PR 18953 at commit [`6548cf8`](https://github.com/apache/spark/commit/6548cf877cf71eccf7cc6c4e14072b7d478c4e74).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18953: [SPARK-20682][SQL] Implement new ORC data source ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18953#discussion_r133368613
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
    @@ -47,11 +47,11 @@ import org.apache.spark.util.SerializableConfiguration
      * `FileFormat` for reading ORC files. If this is moved or renamed, please update
      * `DataSource`'s backwardCompatibilityMap.
      */
    -class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable {
    +class OrcFileFormatOld extends FileFormat with DataSourceRegister with Serializable {
    --- End diff --
    
    This change of name will be reverted after review.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #81148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81148/testReport)** for PR 18953 at commit [`6548cf8`](https://github.com/apache/spark/commit/6548cf877cf71eccf7cc6c4e14072b7d478c4e74).




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80710/testReport)** for PR 18953 at commit [`22dbe35`](https://github.com/apache/spark/commit/22dbe358041605d6afc9d510f29802ce1c0fb7b3).




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #81268 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81268/testReport)** for PR 18953 at commit [`6548cf8`](https://github.com/apache/spark/commit/6548cf877cf71eccf7cc6c4e14072b7d478c4e74).




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    The current ORC-related code looks quite old and is tightly coupled with both `hive-exec-1.2.1.spark2.jar` and the `hive` module.
    The patch also needs to touch every part because everything has changed, especially `OrcInputFormat.createReader` (in hive-exec), `Filter`, `SearchArgument`, and `HiveInspectors`.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80777 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80777/testReport)** for PR 18953 at commit [`80c80f3`](https://github.com/apache/spark/commit/80c80f34eb4dfb7c94d7875438effab52c71575d).




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81513/
    Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80881/
    Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #81569 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81569/testReport)** for PR 18953 at commit [`014f2f3`](https://github.com/apache/spark/commit/014f2f3139fb1ea5efe5930cdd4a2e7d64172a94).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #81012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81012/testReport)** for PR 18953 at commit [`263b3dc`](https://github.com/apache/spark/commit/263b3dc3ca3e6df6107cd70bb8cebd230c0e937d).




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80861/testReport)** for PR 18953 at commit [`8548b73`](https://github.com/apache/spark/commit/8548b73d971ef5751594f5204aea83a3ead8bd4b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Hi, @cloud-fan .
    Could you review this PR?




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    These days, the `fails due to an unknown error code, -9` failure seems to be occurring more frequently.
    
    Retest this please.




[GitHub] spark pull request #18953: [SPARK-20682][SQL] Implement new ORC data source ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18953#discussion_r133368561
  
    --- Diff: sql/hive/src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister ---
    @@ -1,2 +1,2 @@
    -org.apache.spark.sql.hive.orc.OrcFileFormat
    +org.apache.spark.sql.hive.orc.OrcFileFormatOld
    --- End diff --
    
    This will be reverted after review.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81569/
    Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #81013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81013/testReport)** for PR 18953 at commit [`8507aef`](https://github.com/apache/spark/commit/8507aefbb976594e89bba554ea5beed77c49390c).




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Retest this please.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #81569 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81569/testReport)** for PR 18953 at commit [`014f2f3`](https://github.com/apache/spark/commit/014f2f3139fb1ea5efe5930cdd4a2e7d64172a94).




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80881 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80881/testReport)** for PR 18953 at commit [`7954d52`](https://github.com/apache/spark/commit/7954d5223eee4bfaf7825ec79eaad36c524362dc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80777 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80777/testReport)** for PR 18953 at commit [`80c80f3`](https://github.com/apache/spark/commit/80c80f34eb4dfb7c94d7875438effab52c71575d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80861/testReport)** for PR 18953 at commit [`8548b73`](https://github.com/apache/spark/commit/8548b73d971ef5751594f5204aea83a3ead8bd4b).




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80980/
    Test PASSed.



[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80875 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80875/testReport)** for PR 18953 at commit [`7954d52`](https://github.com/apache/spark/commit/7954d5223eee4bfaf7825ec79eaad36c524362dc).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81013/
    Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81142/
    Test FAILed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80881 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80881/testReport)** for PR 18953 at commit [`7954d52`](https://github.com/apache/spark/commit/7954d5223eee4bfaf7825ec79eaad36c524362dc).




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80909 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80909/testReport)** for PR 18953 at commit [`63cf876`](https://github.com/apache/spark/commit/63cf87688ae1b47e6adcad4d9ff1784ac321eb12).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80722/
    Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #81326 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81326/testReport)** for PR 18953 at commit [`6548cf8`](https://github.com/apache/spark/commit/6548cf877cf71eccf7cc6c4e14072b7d478c4e74).




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80707/testReport)** for PR 18953 at commit [`051ed1f`](https://github.com/apache/spark/commit/051ed1fd86ee1354d1e650b1cf51a41db2d83619).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class OrcFileFormatOld extends FileFormat with DataSourceRegister with Serializable `




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80909/
    Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80869/testReport)** for PR 18953 at commit [`7954d52`](https://github.com/apache/spark/commit/7954d5223eee4bfaf7825ec79eaad36c524362dc).




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Rebased onto master now that #18640 is merged.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    In `OrcFilters.scala`, the builder API changes as follows: predicates such as `isNull` now take the column type as a second argument.
    ```
    - Some(builder.startAnd().isNull(attribute).end())
    + Some(builder.startAnd().isNull(attribute, getType(attribute)).end())
    ```
    
    You can see the remaining differences by running `diff sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala`.
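
As an illustration of the typed `isNull` call described above, here is a minimal sketch. It assumes the `nohive` ORC artifact is on the classpath, so the search-argument classes live under `org.apache.orc.storage.ql.io.sarg`. The names `IsNullFilterSketch`, `leafTypes`, and `isNullFilter` are hypothetical; `leafTypes` only stands in for the PR's `getType(attribute)` helper and is not the PR's implementation.

```scala
import org.apache.orc.storage.ql.io.sarg.{PredicateLeaf, SearchArgument, SearchArgumentFactory}

object IsNullFilterSketch {
  // Hypothetical stand-in for getType(attribute): maps a column name
  // to the ORC predicate type that the builder now requires.
  val leafTypes: Map[String, PredicateLeaf.Type] = Map(
    "id" -> PredicateLeaf.Type.LONG,
    "name" -> PredicateLeaf.Type.STRING)

  // Build an IS NULL search argument for a known column.
  // Returns None when the column has no registered type.
  def isNullFilter(attribute: String): Option[SearchArgument] =
    leafTypes.get(attribute).map { tpe =>
      SearchArgumentFactory.newBuilder()
        .startAnd()
        .isNull(attribute, tpe) // the column type is passed explicitly
        .end()
        .build()
    }
}
```

Calling `IsNullFilterSketch.isNullFilter("id")` would yield a search argument with a single `IS_NULL` leaf typed as `LONG`, while an unregistered column yields `None` rather than an untyped predicate.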




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80721 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80721/testReport)** for PR 18953 at commit [`07778ed`](https://github.com/apache/spark/commit/07778ed449bbf7ce2f1b5e8258e6ef58475b289c).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80710 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80710/testReport)** for PR 18953 at commit [`22dbe35`](https://github.com/apache/spark/commit/22dbe358041605d6afc9d510f29802ce1c0fb7b3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80840 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80840/testReport)** for PR 18953 at commit [`c9321df`](https://github.com/apache/spark/commit/c9321df909a1cb8307d2dc4056e7e0146822053c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80861/
    Test PASSed.




[GitHub] spark pull request #18953: [SPARK-20682][SQL] Implement new ORC data source ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18953#discussion_r133368809
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala ---
    @@ -343,7 +343,7 @@ class OrcQuerySuite extends QueryTest with BeforeAndAfterAll with OrcTest {
         }
       }
     
    -  test("SPARK-8501: Avoids discovery schema from empty ORC files") {
    +  ignore("SPARK-8501: Avoids discovery schema from empty ORC files") {
    --- End diff --
    
    This only happens on old Hive.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Hi, @cloud-fan, @gatorsmile, @sameeragarwal, @rxin, @viirya.
    Could you review this ORC PR again? Following the earlier advice, I'm now replacing the existing ORC data source inside `sql/hive`. We can move it into `sql/core` and remove the unused ORC-related code later.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Retest this please.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #80771 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80771/testReport)** for PR 18953 at commit [`80c80f3`](https://github.com/apache/spark/commit/80c80f34eb4dfb7c94d7875438effab52c71575d).




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Hi, @cloud-fan.
    The PR is ready for review again. Thank you!




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Retest this please.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    Retest this please.




[GitHub] spark pull request #18953: [SPARK-20682][SQL] Update ORC data source based o...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun closed the pull request at:

    https://github.com/apache/spark/pull/18953




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18953
  
    **[Test build #81268 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81268/testReport)** for PR 18953 at commit [`6548cf8`](https://github.com/apache/spark/commit/6548cf877cf71eccf7cc6c4e14072b7d478c4e74).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18953: [SPARK-20682][SQL] Update ORC data source based o...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18953#discussion_r134398990
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcUtils.scala ---
    @@ -0,0 +1,288 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive.orc
    +
    +import java.io.IOException
    +
    +import scala.collection.JavaConverters._
    +
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.{FileSystem, Path}
    +import org.apache.hadoop.io._
    +import org.apache.orc.{OrcFile, TypeDescription}
    +import org.apache.orc.mapred.{OrcList, OrcMap, OrcStruct, OrcTimestamp}
    +import org.apache.orc.storage.common.`type`.HiveDecimal
    +import org.apache.orc.storage.serde2.io.{DateWritable, HiveDecimalWritable}
    +
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.expressions.SpecificInternalRow
    +import org.apache.spark.sql.catalyst.util._
    +import org.apache.spark.sql.types._
    +import org.apache.spark.unsafe.types.UTF8String
    +
    +object OrcUtils {
    +  /**
    +   * Read ORC file schema. This method is used in `inferSchema`.
    +   */
    +  private[orc] def readSchema(file: Path, conf: Configuration): Option[TypeDescription] = {
    +    try {
    +      val options = OrcFile.readerOptions(conf).filesystem(FileSystem.get(conf))
    +      val reader = OrcFile.createReader(file, options)
    +      val schema = reader.getSchema
    +      if (schema.getFieldNames.isEmpty) {
    +        None
    +      } else {
    +        Some(schema)
    +      }
    +    } catch {
    +      case _: IOException => None
    +    }
    +  }
    +
    +  /**
    +   * Return ORC schema with schema field name correction and the total number of rows.
    +   */
    +  private[orc] def getSchemaAndNumberOfRows(
    +      dataSchema: StructType,
    +      filePath: String,
    +      conf: Configuration) = {
    +    val hdfsPath = new Path(filePath)
    +    val fs = hdfsPath.getFileSystem(conf)
    +    val reader = OrcFile.createReader(hdfsPath, OrcFile.readerOptions(conf).filesystem(fs))
    +    val rawSchema = reader.getSchema
    +    val orcSchema = if (!rawSchema.getFieldNames.isEmpty &&
    +        rawSchema.getFieldNames.asScala.forall(_.startsWith("_col"))) {
    +      var schemaString = rawSchema.toString
    +      dataSchema.zipWithIndex.foreach { case (field: StructField, index: Int) =>
    +        schemaString = schemaString.replace(s"_col$index:", s"${field.name}:")
    +      }
    +      TypeDescription.fromString(schemaString)
    +    } else {
    +      rawSchema
    +    }
    +    (orcSchema, reader.getNumberOfRows)
    +  }
    +
    +  /**
    +   * Return a ORC schema string for ORCStruct.
    +   */
    +  private[orc] def getSchemaString(schema: StructType): String = {
    +    schema.fields.map(f => s"${f.name}:${f.dataType.catalogString}").mkString("struct<", ",", ">")
    +  }
    +
    +  private[orc] def getTypeDescription(dataType: DataType) = dataType match {
    +    case st: StructType => TypeDescription.fromString(getSchemaString(st))
    +    case _ => TypeDescription.fromString(dataType.catalogString)
    +  }
    +
    +  /**
    +   * Return a Orc value object for the given Spark schema.
    +   */
    +  private[orc] def createOrcValue(dataType: DataType) =
    +    OrcStruct.createValue(getTypeDescription(dataType))
    +
    +  /**
    +   * Convert Apache ORC OrcStruct to Apache Spark InternalRow.
    +   * If internalRow is not None, fill into it. Otherwise, create a SpecificInternalRow and use it.
    +   */
    +  private[orc] def convertOrcStructToInternalRow(
    +      orcStruct: OrcStruct,
    +      schema: StructType,
    +      internalRow: Option[InternalRow] = None): InternalRow = {
    +    val mutableRow = internalRow.getOrElse(new SpecificInternalRow(schema.map(_.dataType)))
    +
    +    for (schemaIndex <- 0 until schema.length) {
    +      val writable = orcStruct.getFieldValue(schema(schemaIndex).name)
    +      if (writable == null) {
    +        mutableRow.setNullAt(schemaIndex)
    +      } else {
    +        mutableRow(schemaIndex) = getCatalystValue(writable, schema(schemaIndex).dataType)
    +      }
    +    }
    +
    +    mutableRow
    +  }
    +
    +  /**
    +   * Convert Apache Spark InternalRow to Apache ORC OrcStruct.
    +   */
    +  private[orc] def convertInternalRowToOrcStruct(
    +      row: InternalRow,
    +      schema: StructType,
    +      struct: Option[OrcStruct] = None): OrcStruct = {
    +    val orcStruct = struct.getOrElse(createOrcValue(schema).asInstanceOf[OrcStruct])
    +
    +    for (schemaIndex <- 0 until schema.length) {
    +      val fieldType = schema(schemaIndex).dataType
    +      val fieldValue = if (row.isNullAt(schemaIndex)) {
    +        null
    +      } else {
    +        getWritable(row.get(schemaIndex, fieldType), fieldType)
    +      }
    +      orcStruct.setFieldValue(schemaIndex, fieldValue)
    +    }
    +    orcStruct
    +  }
    +
    +  /**
    +   * Return WritableComparable from Spark catalyst values.
    +   */
    +  private[orc] def getWritable(value: Object, dataType: DataType): WritableComparable[_] = {
    +    if (value == null) {
    +      null
    +    } else {
    +      dataType match {
    +        case NullType => null
    +
    +        case BooleanType => new BooleanWritable(value.asInstanceOf[Boolean])
    +
    +        case ByteType => new ByteWritable(value.asInstanceOf[Byte])
    +        case ShortType => new ShortWritable(value.asInstanceOf[Short])
    +        case IntegerType => new IntWritable(value.asInstanceOf[Int])
    +        case LongType => new LongWritable(value.asInstanceOf[Long])
    +
    +        case FloatType => new FloatWritable(value.asInstanceOf[Float])
    +        case DoubleType => new DoubleWritable(value.asInstanceOf[Double])
    +
    +        case StringType => new Text(value.asInstanceOf[UTF8String].getBytes)
    +
    +        case BinaryType => new BytesWritable(value.asInstanceOf[Array[Byte]])
    +
    +        case DateType => new DateWritable(DateTimeUtils.toJavaDate(value.asInstanceOf[Int]))
    +
    +        case TimestampType =>
    +          val us = value.asInstanceOf[Long]
    +          var seconds = us / DateTimeUtils.MICROS_PER_SECOND
    +          var micros = us % DateTimeUtils.MICROS_PER_SECOND
    +          if (micros < 0) {
    +            micros += DateTimeUtils.MICROS_PER_SECOND
    +            seconds -= 1
    +          }
    +          val t = new OrcTimestamp(seconds * 1000)
    +          t.setNanos(micros.toInt * 1000)
    +          t
    +
    +        case _: DecimalType =>
    +          new HiveDecimalWritable(HiveDecimal.create(value.asInstanceOf[Decimal].toJavaBigDecimal))
    +
    +        case st: StructType =>
    +          convertInternalRowToOrcStruct(value.asInstanceOf[InternalRow], st)
    +
    +        case ArrayType(et, _) =>
    +          val data = value.asInstanceOf[ArrayData]
    +          val list = createOrcValue(dataType)
    +          for (i <- 0 until data.numElements()) {
    +            list.asInstanceOf[OrcList[WritableComparable[_]]]
    +              .add(getWritable(data.get(i, et), et))
    +          }
    +          list
    +
    +        case MapType(keyType, valueType, _) =>
    +          val data = value.asInstanceOf[MapData]
    +          val map = createOrcValue(dataType)
    +            .asInstanceOf[OrcMap[WritableComparable[_], WritableComparable[_]]]
    +          data.foreach(keyType, valueType, { case (k, v) =>
    +            map.put(
    +              getWritable(k.asInstanceOf[Object], keyType),
    +              getWritable(v.asInstanceOf[Object], valueType))
    +          })
    +          map
    +
    +        case udt: UserDefinedType[_] =>
    +          val udtRow = new SpecificInternalRow(Seq(udt.sqlType))
    +          udtRow(0) = value
    +          convertInternalRowToOrcStruct(udtRow,
    +            StructType(Seq(StructField("tmp", udt.sqlType)))).getFieldValue(0)
    +
    +        case _ =>
    +          throw new UnsupportedOperationException(s"$dataType is not supported yet.")
    +      }
    +    }
    +
    +  }
    +
    +  /**
    +   * Return Spark Catalyst value from WritableComparable object.
    +   */
    +  private[orc] def getCatalystValue(value: WritableComparable[_], dataType: DataType): Any = {
    --- End diff --
    
    We should return a converter function here to avoid pattern matching on the data type for every row. cc @HyukjinKwon, who has fixed similar problems many times.
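
    A minimal sketch of the idea, assuming simplified stand-in types rather than Spark's real `DataType` and ORC writable classes (`FieldType`, `makeConverter`, and `makeRowConverter` are hypothetical names used only for illustration): the match on the type runs once per schema and yields a closure, so the per-row path is only array indexing and function calls.

```scala
// Hypothetical stand-ins for a schema's field types; these are NOT Spark classes.
sealed trait FieldType
case object IntField extends FieldType
case object StringField extends FieldType
final case class ArrayField(element: FieldType) extends FieldType

object ConverterSketch {
  type Converter = Any => Any

  // Pattern match on the field type once and return a closure for that type.
  // The conversions themselves are placeholders chosen for the example.
  def makeConverter(fieldType: FieldType): Converter = fieldType match {
    case IntField => (v: Any) => v.asInstanceOf[Int].toLong
    case StringField => (v: Any) => v.toString.toUpperCase
    case ArrayField(element) =>
      // The element converter is also resolved once, not once per element.
      val convertElement = makeConverter(element)
      (v: Any) => v.asInstanceOf[Seq[Any]].map(convertElement)
  }

  // Build all field converters once per schema and reuse them for every row.
  def makeRowConverter(schema: Seq[FieldType]): Seq[Any] => Seq[Any] = {
    val converters = schema.map(makeConverter).toArray
    (row: Seq[Any]) => row.zipWithIndex.map { case (value, i) =>
      if (value == null) null else converters(i)(value)
    }
  }
}
```

    A caller would build the row converter once per file or task, for example `val convert = ConverterSketch.makeRowConverter(schema)`, and then call `convert(row)` inside the read loop.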


