Posted to issues@spark.apache.org by "swetha k (JIRA)" <ji...@apache.org> on 2015/12/01 22:51:10 UTC
[jira] [Comment Edited] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException
[ https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034650#comment-15034650 ]
swetha k edited comment on SPARK-11620 at 12/1/15 9:50 PM:
-----------------------------------------------------------
[~hyukjin.kwon]
I have the following code that saves Parquet files to HDFS in my hourly batch job. It is based on the GitHub example linked at the end. The WARNING message I get is the one shown in the previous comments. Any idea why this is happening?
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.mapreduce.Job
import parquet.avro.{AvroParquetOutputFormat, AvroWriteSupport}
import parquet.hadoop.ParquetOutputFormat
// Metrics (Avro-generated), SerializableMetrics and the input RDD `metrics`
// are defined elsewhere in my application.
val job = Job.getInstance()
val filePath = "path"
val metricsPath: Path = new Path(filePath)
// Delete the output directory if it already exists
val fs: FileSystem = FileSystem.get(job.getConfiguration)
if (fs.exists(metricsPath)) {
  fs.delete(metricsPath, true)
}
// Configure the ParquetOutputFormat to use Avro as the serialization format
ParquetOutputFormat.setWriteSupportClass(job, classOf[AvroWriteSupport])
// You need to pass the schema to AvroParquet when you are writing objects but not when you
// are reading them. The schema is saved in the Parquet file for future readers to use.
AvroParquetOutputFormat.setSchema(job, Metrics.SCHEMA$)
// Create a PairRDD with all keys set to null and wrap each Metrics in a serializable object.
// coalesce returns a new RDD, so its result is kept here instead of being discarded.
val metricsToBeSaved = metrics
  .map(metricRecord => (null, new SerializableMetrics(
    new Metrics(metricRecord._1, metricRecord._2._1, metricRecord._2._2))))
  .coalesce(1500)
// Save the RDD to a Parquet file in our temporary output directory
metricsToBeSaved.saveAsNewAPIHadoopFile(filePath, classOf[Void], classOf[Metrics],
  classOf[ParquetOutputFormat[Metrics]], job.getConfiguration)
https://github.com/massie/spark-parquet-example/blob/master/src/main/scala/com/zenfractal/SparkParquetExample.scala
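Note that `coalesce` does not modify an RDD in place: like other RDD transformations it returns a new RDD, so a bare `metricsToBeSaved.coalesce(1500)` statement has no effect unless its result is used. The sketch below demonstrates this in Spark local mode. It assumes only `spark-core` on the classpath; the object name `CoalesceDemo`, the data, and the partition counts are hypothetical and chosen for illustration, not taken from the job above.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo: RDD transformations such as coalesce return a NEW RDD
// and leave the receiver untouched.
object CoalesceDemo {
  // Returns (partition count of the original RDD after a discarded coalesce,
  //          partition count of the RDD actually returned by coalesce).
  def partitionCounts(sc: SparkContext): (Int, Int) = {
    val rdd = sc.parallelize(1 to 100, numSlices = 10)

    rdd.coalesce(2)                 // result discarded: rdd still has 10 partitions
    val coalesced = rdd.coalesce(2) // keep the returned RDD instead

    (rdd.partitions.length, coalesced.partitions.length)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("coalesce-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (original, reduced) = partitionCounts(sc)
      // prints "original: 10 partitions, coalesced: 2 partitions"
      println(s"original: $original partitions, coalesced: $reduced partitions")
    } finally {
      sc.stop()
    }
  }
}
```

The same applies to `map`, `filter`, `repartition` and the rest: chaining the call (or assigning it to a new `val`) is what makes the reduced partition count take effect before `saveAsNewAPIHadoopFile` runs.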
> parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException
> --------------------------------------------------------------------------------------------
>
> Key: SPARK-11620
> URL: https://issues.apache.org/jira/browse/SPARK-11620
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: swetha k
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)