Posted to dev@pig.apache.org by "Koji Noguchi (Jira)" <ji...@apache.org> on 2021/06/15 19:31:00 UTC
[jira] [Commented] (PIG-5319) Investigate why TestStoreInstances fails with Spark 2.2
[ https://issues.apache.org/jira/browse/PIG-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363875#comment-17363875 ]
Koji Noguchi commented on PIG-5319:
-----------------------------------
I do see OutputFormat created twice (*** below)
Using Spark-2.4
{code:java|title=SparkHadoopWriter.scala}
117 committer.setupTask(taskContext) ***
118
119 // Initiate the writer.
120 config.initWriter(taskContext, sparkPartitionId) ***
{code}
setupTask and initWriter each create a separate OutputFormat instance.
Stack traces for both below.
{noformat}
SparkHadoopWriter.scala:117 committer.setupTask(taskContext)
--> HadoopMapReduceCommitProtocol.scala:217 setupCommitter(taskContext)
--> --> HadoopMapReduceCommitProtocol.scala:94 val format = context.getOutputFormatClass.newInstance()
{noformat}
and
{noformat}
SparkHadoopWriter.scala:120 config.initWriter(taskContext, sparkPartitionId)
--> SparkHadoopWriter.scala:343 val taskFormat = getOutputFormat()
--> --> SparkHadoopWriter.scala:384 outputFormat.newInstance()
{noformat}
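Since both paths go through reflective {{newInstance()}} calls, the committer and the writer end up holding two distinct OutputFormat objects, so any per-instance state is not shared between them. A minimal Java sketch of that consequence ({{FakeOutputFormat}} is a hypothetical stand-in for PigOutputFormat, not Pig or Spark code):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class InstanceDemo {
    // Hypothetical stand-in for PigOutputFormat that counts how many
    // instances are created via reflection.
    public static class FakeOutputFormat {
        static final AtomicInteger CREATED = new AtomicInteger();
        final int id = CREATED.incrementAndGet();
        // Per-instance state, analogous to anything a StoreFunc caches
        // on the format; it is invisible to the other instance.
        Object cachedCommitter;
    }

    public static void main(String[] args) throws Exception {
        Class<FakeOutputFormat> cls = FakeOutputFormat.class;
        // setupTask path: HadoopMapReduceCommitProtocol calls newInstance()
        FakeOutputFormat fromCommitter = cls.getDeclaredConstructor().newInstance();
        // initWriter path: SparkHadoopWriter.getOutputFormat() calls it again
        FakeOutputFormat fromWriter = cls.getDeclaredConstructor().newInstance();

        System.out.println(fromCommitter == fromWriter); // false: two objects
        System.out.println(FakeOutputFormat.CREATED.get()); // 2 instances created
    }
}
```

Running this prints {{false}} and {{2}}: state set on the committer's instance never reaches the writer's instance, which matches the TestStoreInstances symptom described below.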
> Investigate why TestStoreInstances fails with Spark 2.2
> -------------------------------------------------------
>
> Key: PIG-5319
> URL: https://issues.apache.org/jira/browse/PIG-5319
> Project: Pig
> Issue Type: Bug
> Components: spark
> Reporter: Nándor Kollár
> Priority: Major
>
> The TestStoreInstances unit test fails with Spark 2.2.x. The job and task commit logic appears to have changed significantly since Spark 2.1.x: Spark now seems to use one PigOutputFormat instance when writing to files and a different one when getting the OutputCommitters.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)