Posted to issues@spark.apache.org by "Erik Krogen (Jira)" <ji...@apache.org> on 2021/09/20 17:59:00 UTC

[jira] [Updated] (SPARK-36810) Handle HDFS read inconsistencies on Spark when Observer NameNode is used

     [ https://issues.apache.org/jira/browse/SPARK-36810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated SPARK-36810:
--------------------------------
    Description: 
In short, with HDFS HA and the use of the [Observer NameNode|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html], read-after-write consistency is only guaranteed when the write and the read happen through the same client.

But if the write happens on an executor and the read happens on the driver, the driver's read can be stale, causing application failures. This can be fixed by calling `FileSystem.msync` before making any read call for which the client believes the write may have happened elsewhere.

This issue is discussed in greater detail in this [discussion|https://mail-archives.apache.org/mod_mbox/spark-dev/202108.mbox/browser] 
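The fix described above can be sketched as follows. This is a minimal driver-side illustration, assuming Hadoop 3.x (where `FileSystem.msync` is available); the output path is hypothetical:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Driver-side sketch: before reading files that executors may have
// written, force this client's view of the namespace to catch up with
// the Active NameNode. Without this, a read served by an Observer
// NameNode can miss writes committed through a different client.
val fs = FileSystem.get(new Configuration())

// msync() blocks until this client's state id reaches the Active
// NameNode's latest transaction id, so subsequent reads routed to the
// Observer reflect all previously committed writes.
fs.msync()

// Hypothetical path written by executors earlier in the job.
val status = fs.getFileStatus(new Path("/user/spark/output/_SUCCESS"))
```

The proposal here is for Spark itself to issue such `msync` calls where appropriate, rather than requiring every application to do so manually.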

  was:
In short, with HDFS HA and with the use of [Observer Namenode|[https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html],] the read-after-write consistency is only available when both the write and the read happens from the same client.

But if the write happens on executor and the read happens on the driver, then the reads would be inconsistent causing application failure issues. This can be fixed by calling `FileSystem.msync` before making any read calls where the client thinks the write could have possibly happened elsewhere.

This issue is discussed in greater detail in this [discussion|https://mail-archives.apache.org/mod_mbox/spark-dev/202108.mbox/browser] 


> Handle HDFS read inconsistencies on Spark when Observer NameNode is used
> ------------------------------------------------------------------------
>
>                 Key: SPARK-36810
>                 URL: https://issues.apache.org/jira/browse/SPARK-36810
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 3.2.0
>            Reporter: Venkata krishnan Sowrirajan
>            Priority: Major
>
> In short, with HDFS HA and the use of the [Observer NameNode|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html], read-after-write consistency is only guaranteed when the write and the read happen through the same client.
> But if the write happens on an executor and the read happens on the driver, the driver's read can be stale, causing application failures. This can be fixed by calling `FileSystem.msync` before making any read call for which the client believes the write may have happened elsewhere.
> This issue is discussed in greater detail in this [discussion|https://mail-archives.apache.org/mod_mbox/spark-dev/202108.mbox/browser] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org