Posted to issues@spark.apache.org by "Zhan Zhang (JIRA)" <ji...@apache.org> on 2015/04/02 21:54:54 UTC
[jira] [Comment Edited] (SPARK-2883) Spark Support for ORCFile format
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393141#comment-14393141 ]
Zhan Zhang edited comment on SPARK-2883 at 4/2/15 7:54 PM:
-----------------------------------------------------------
@climberus The following examples demonstrate how to use the ORC support:
import org.apache.spark.sql.hive.orc._
import org.apache.spark.sql._
//schema
case class AllDataTypes(
stringField: String,
intField: Int,
longField: Long,
floatField: Float,
doubleField: Double,
shortField: Short,
byteField: Byte,
booleanField: Boolean)
//saveAsOrcFile
val range = (0 to 255)
val data = sc.parallelize(range).map(x => AllDataTypes(s"$x", x, x.toLong, x.toFloat, x.toDouble, x.toShort, x.toByte, x % 2 == 0))
data.toDF().saveAsOrcFile("orcTest")
//read orcFile
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
//orcFile
val orcTest = hiveContext.orcFile("orcTest")
orcTest.registerTempTable("orcTest")
hiveContext.sql("SELECT * FROM orcTest WHERE intField > 185").collect.foreach(println)
//new data source API, read
hiveContext.sql("create temporary table orc using org.apache.spark.sql.hive.orc OPTIONS (path \"orcTest\")")
hiveContext.sql("select * from orc").collect.foreach(println)
val table = hiveContext.sql("select * from orc")
// new data source API write
table.saveAsTable("table", "org.apache.spark.sql.hive.orc")
val hiveOrc = hiveContext.orcFile("/user/hive/warehouse/table")
hiveOrc.registerTempTable("hiveOrc")
hiveContext.sql("select * from hiveOrc").collect.foreach(println)
table.saveAsOrcFile("/user/ambari-qa/table")
hiveContext.sql("create temporary table normal_orc_as_source USING org.apache.spark.sql.hive.orc OPTIONS (path 'saveTable') as select * from table")
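For comparison, later Spark releases expose the same read/write flow through the generic DataFrame reader/writer API (SparkSession in 2.x). A minimal sketch, assuming Spark 2.x and illustrative path/table names:

```scala
import org.apache.spark.sql.SparkSession

// Entry point in Spark 2.x; enableHiveSupport() is only needed for Hive tables.
val spark = SparkSession.builder().appName("OrcExample").getOrCreate()
import spark.implicits._

case class Person(name: String, age: Int)
val df = Seq(Person("a", 1), Person("b", 2)).toDF()

// Write ORC via the generic data source API ("people_orc" is an illustrative path).
df.write.format("orc").save("people_orc")

// Read it back and query through SQL.
val back = spark.read.format("orc").load("people_orc")
back.createOrReplaceTempView("people")
spark.sql("SELECT * FROM people WHERE age > 1").collect.foreach(println)
```

This replaces the orcFile/saveAsOrcFile helpers shown above with the uniform format("orc") reader/writer path.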
> Spark Support for ORCFile format
> --------------------------------
>
> Key: SPARK-2883
> URL: https://issues.apache.org/jira/browse/SPARK-2883
> Project: Spark
> Issue Type: Bug
> Components: Input/Output, SQL
> Reporter: Zhan Zhang
> Priority: Blocker
> Attachments: 2014-09-12 07.05.24 pm Spark UI.png, 2014-09-12 07.07.19 pm jobtracker.png, orc.diff
>
>
> Verify the support of OrcInputFormat in Spark, fix any issues found, and add documentation of its usage.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org