Posted to dev@pig.apache.org by "Koji Noguchi (Jira)" <ji...@apache.org> on 2021/06/15 19:31:00 UTC
[jira] [Commented] (PIG-5319) Investigate why TestStoreInstances fails with Spark 2.2
[ https://issues.apache.org/jira/browse/PIG-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363875#comment-17363875 ]
Koji Noguchi commented on PIG-5319:
-----------------------------------
I do see OutputFormat created twice (*** below)
Using Spark-2.4
{code:java|title=SparkHadoopWriter.scala}
117 committer.setupTask(taskContext) ***
118
119 // Initiate the writer.
120 config.initWriter(taskContext, sparkPartitionId) ***
{code}
setupTask and initWriter each create a separate OutputFormat instance.
Stack traces for both below.
{noformat}
SparkHadoopWriter.scala:117 committer.setupTask(taskContext)
--> HadoopMapReduceCommitProtocol.scala:217 setupCommitter(taskContext)
--> --> HadoopMapReduceCommitProtocol.scala:94 val format = context.getOutputFormatClass.newInstance()
{noformat}
and
{noformat}
SparkHadoopWriter.scala:120 config.initWriter(taskContext, sparkPartitionId)
--> SparkHadoopWriter.scala:343 val taskFormat = getOutputFormat()
--> --> SparkHadoopWriter.scala:384 outputFormat.newInstance()
{noformat}
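Since both paths go through reflective {{newInstance()}} calls, the committer and the writer end up holding two distinct OutputFormat objects, so any per-instance state is not shared between them. A minimal Java sketch of that consequence ({{FakeOutputFormat}} is a hypothetical stand-in for PigOutputFormat, not Pig or Spark code):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class InstanceDemo {
    // Hypothetical stand-in for PigOutputFormat that counts how many
    // instances are created via reflection.
    public static class FakeOutputFormat {
        static final AtomicInteger CREATED = new AtomicInteger();
        final int id = CREATED.incrementAndGet();
        // Per-instance state, analogous to anything a StoreFunc caches
        // on the format; it is invisible to the other instance.
        Object cachedCommitter;
    }

    public static void main(String[] args) throws Exception {
        Class<FakeOutputFormat> cls = FakeOutputFormat.class;
        // setupTask path: HadoopMapReduceCommitProtocol calls newInstance()
        FakeOutputFormat fromCommitter = cls.getDeclaredConstructor().newInstance();
        // initWriter path: SparkHadoopWriter.getOutputFormat() calls it again
        FakeOutputFormat fromWriter = cls.getDeclaredConstructor().newInstance();

        System.out.println(fromCommitter == fromWriter); // false: two objects
        System.out.println(FakeOutputFormat.CREATED.get()); // 2 instances created
    }
}
```

Running this prints {{false}} and {{2}}: state set on the committer's instance never reaches the writer's instance, which matches the TestStoreInstances symptom described below.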
> Investigate why TestStoreInstances fails with Spark 2.2
> -------------------------------------------------------
>
> Key: PIG-5319
> URL: https://issues.apache.org/jira/browse/PIG-5319
> Project: Pig
> Issue Type: Bug
> Components: spark
> Reporter: Nándor Kollár
> Priority: Major
>
> The TestStoreInstances unit test fails with Spark 2.2.x. The job and task commit logic appears to have changed significantly since Spark 2.1.x: Spark now seems to use one PigOutputFormat instance when writing to files and a different one when getting the OutputCommitters.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)