Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2022/12/09 14:02:00 UTC

[jira] [Commented] (HADOOP-18521) ABFS ReadBufferManager buffer sharing across concurrent HTTP requests

    [ https://issues.apache.org/jira/browse/HADOOP-18521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645309#comment-17645309 ] 

Steve Loughran commented on HADOOP-18521:
-----------------------------------------

We have a nice, minimal fix which can easily be backported wherever it is needed.

That larger patch of mine was intended to
# avoid adding the prefetches of closed streams to the completed list (impact: buffers will be retained until timeout)
# add validation against accidental buffer reuse
# handle failures in the async read caused by the close()
# add IOStatistics for production code and testing; these are used in asserts and will allow us to assess the value of prefetching in production
# pull the methods invoked on the ABFS input stream out into their own interface, again for testing
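
To make items 1 and 2 concrete, here is a minimal sketch of a buffer pool that tracks the state of every buffer. It is not the ReadBufferManager code and not the patch itself: the class TrackedBufferPool, its State enum and the methods acquire(), readCompleted() and streamClosed() are all hypothetical names invented for illustration. The idea it shows is that closing a stream only marks its in-flight buffers as orphaned; the reader thread hands them back when its read finishes, and any unexpected state transition is rejected.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical model of a read-ahead buffer pool; NOT the ABFS ReadBufferManager. */
final class TrackedBufferPool {
    /** States a pooled buffer moves through in this sketch. */
    enum State { FREE, IN_PROGRESS, COMPLETED, ORPHANED }

    /** Bookkeeping for one buffer: its state and the stream that owns it. */
    private static final class Entry {
        State state = State.FREE;
        Object owner;
    }

    private final Deque<byte[]> free = new ArrayDeque<>();
    // Identity map: two distinct arrays with equal contents are different buffers.
    private final Map<byte[], Entry> entries = new IdentityHashMap<>();

    TrackedBufferPool(int count, int size) {
        for (int i = 0; i < count; i++) {
            byte[] b = new byte[size];
            entries.put(b, new Entry());
            free.push(b);
        }
    }

    /** Hand a free buffer to a prefetch for the given stream, or null if none is free. */
    synchronized byte[] acquire(Object owner) {
        byte[] b = free.poll();
        if (b == null) {
            return null;
        }
        Entry e = entries.get(b);
        if (e.state != State.FREE) {
            throw new IllegalStateException("free list held a " + e.state + " buffer");
        }
        e.state = State.IN_PROGRESS;
        e.owner = owner;
        return b;
    }

    /** Called by the reader thread once its read has finished, successfully or not. */
    synchronized void readCompleted(byte[] b) {
        Entry e = entries.get(b);
        if (e == null) {
            throw new IllegalArgumentException("buffer is not from this pool");
        }
        if (e.state == State.ORPHANED) {
            recycle(b, e);              // owner already closed: skip the completed list
        } else if (e.state == State.IN_PROGRESS) {
            e.state = State.COMPLETED;  // data stays available to the owning stream
        } else {
            throw new IllegalStateException("unexpected completion of a " + e.state + " buffer");
        }
    }

    /** Called when a stream closes: free its completed buffers, orphan its in-flight ones. */
    synchronized void streamClosed(Object owner) {
        for (Map.Entry<byte[], Entry> me : entries.entrySet()) {
            Entry e = me.getValue();
            if (e.owner != owner) {
                continue;
            }
            if (e.state == State.COMPLETED) {
                recycle(me.getKey(), e);
            } else if (e.state == State.IN_PROGRESS) {
                e.state = State.ORPHANED;   // still being written to: do not free yet
            }
        }
    }

    synchronized int freeCount() {
        return free.size();
    }

    private void recycle(byte[] b, Entry e) {
        e.state = State.FREE;
        e.owner = null;
        free.push(b);
    }
}
```

With a pool of one buffer, closing stream A while its prefetch is still in flight leaves freeCount() at 0, so an acquire() by stream B returns null until A's reader calls readCompleted(). B can therefore never be handed a buffer that A's request is still writing into.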

We don't need that larger patch as much any more. I would still like most of its features (1, 3, 4, 5) in, but we could look at that as part of a broader review of the readbuffer feature.

I'm going to
# rebase my PR
# create a new JIRA for "extend readbuffer for testing/statistics" and move the PR to that

> ABFS ReadBufferManager buffer sharing across concurrent HTTP requests
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-18521
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18521
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>    Affects Versions: 3.3.2, 3.3.3, 3.3.4
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Critical
>              Labels: pull-request-available
>
> AbfsInputStream.close() can trigger the return of buffers used for active prefetch GET requests into the ReadBufferManager free buffer pool.
> A subsequent prefetch by a different stream in the same process may acquire this same buffer. This risks corrupting that stream's own prefetched data, which may then be returned to the thread reading from that stream.
> On releases without the fix for this (3.3.2+), the bug can be avoided by disabling all prefetching:
> {code}
> fs.azure.readaheadqueue.depth = 0
> {code}
