Posted to issues@spark.apache.org by "Serhii (Jira)" <ji...@apache.org> on 2020/05/27 07:25:00 UTC

[jira] [Commented] (SPARK-31718) DataSourceV2 unexpected behavior with partition data distribution

    [ https://issues.apache.org/jira/browse/SPARK-31718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117463#comment-17117463 ] 

Serhii commented on SPARK-31718:
--------------------------------

Hi Team,

Is there any chance of an update on this issue?

>  DataSourceV2 unexpected behavior with partition data distribution
> ------------------------------------------------------------------
>
>                 Key: SPARK-31718
>                 URL: https://issues.apache.org/jira/browse/SPARK-31718
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.4.0
>            Reporter: Serhii
>            Priority: Major
>
> Hi team,
>   
>  We are using DataSourceV2.
>   
>  We have a question regarding the interface org.apache.spark.sql.sources.v2.writer.DataWriter<T>.
>   
>  We have encountered the following unexpected behavior.
>  When we repartition a dataframe, we expect Spark to create a new DataWriter instance for each partition and to send each partition's data to the corresponding instance. Sometimes, however, we observe Spark sending data from different partitions to the same DataWriter instance.
>  This behavior occurs intermittently on a YARN cluster.
>   
>  If we run the Spark job locally, Spark does create a new DataWriter instance for each partition after the repartition and publishes each partition's data to the corresponding instance.
>   
> Is it possible that Spark limits the number of DataWriter instances?
>  Can you explain whether this is a bug or expected behavior?
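
For reference, a minimal diagnostic sketch against the Spark 2.4.x DataSourceV2 writer API described above. The class names LoggingWriterFactory, LoggingWriter, and LoggingCommitMessage are hypothetical, added only for illustration. Spark calls DataWriterFactory.createDataWriter(partitionId, taskId, epochId) once per write task on the executors, so logging in the factory and again in commit() shows how many writer instances are created and which partition each one serves:

import java.io.IOException;
import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.sources.v2.writer.DataWriter;
import org.apache.spark.sql.sources.v2.writer.DataWriterFactory;
import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage;

// Hypothetical diagnostic factory; not part of the original report.
public class LoggingWriterFactory implements DataWriterFactory<InternalRow> {
    @Override
    public DataWriter<InternalRow> createDataWriter(int partitionId, long taskId, long epochId) {
        // Spark invokes this once per write task; the log line reveals how many
        // writer instances exist and which partition each one was created for.
        System.out.println("createDataWriter: partitionId=" + partitionId
                + " taskId=" + taskId + " epochId=" + epochId);
        return new LoggingWriter(partitionId);
    }

    static class LoggingWriter implements DataWriter<InternalRow> {
        private final int partitionId;
        private long count = 0;

        LoggingWriter(int partitionId) {
            this.partitionId = partitionId;
        }

        @Override
        public void write(InternalRow record) throws IOException {
            count++;  // every record arriving here was routed to this task's writer
        }

        @Override
        public WriterCommitMessage commit() throws IOException {
            System.out.println("commit: partitionId=" + partitionId + " records=" + count);
            return new LoggingCommitMessage(partitionId, count);
        }

        @Override
        public void abort() throws IOException {
            System.out.println("abort: partitionId=" + partitionId);
        }
    }

    // WriterCommitMessage is just a Serializable marker, so a plain data holder suffices.
    static class LoggingCommitMessage implements WriterCommitMessage {
        final int partitionId;
        final long records;

        LoggingCommitMessage(int partitionId, long records) {
            this.partitionId = partitionId;
            this.records = records;
        }
    }
}

Comparing the logged partitionId values against df.rdd.getNumPartitions() after the repartition, together with the per-writer record counts logged at commit, should make it visible whether rows from two different partitions ever reach a single writer instance.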



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
