Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2018/06/25 14:30:00 UTC

[jira] [Commented] (HADOOP-15557) CryptoInputStream can't handle concurrent access; inconsistent with HDFS

    [ https://issues.apache.org/jira/browse/HADOOP-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522341#comment-16522341 ] 

Steve Loughran commented on HADOOP-15557:
-----------------------------------------

Changed the title to remove the "when misused" point. I don't think HBase's concurrent use of the Hadoop FS APIs was a wilful decision, just something the authors did which *works on hdfs*. We have to consider it a requirement that if a filesystem wants to support applications that work on HDFS, then it has to support concurrent IO, even if that is through some concurrent wrapper atop its normal stream.
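
To make the "concurrent wrapper" idea concrete, here is a minimal sketch of what such a wrapper could look like. It assumes a plain java.io.InputStream rather than Hadoop's own stream interfaces, and SynchronizedInputStream is a hypothetical name used for illustration, not an existing Hadoop class. A real wrapper would presumably also need to cover seek and the positional read methods.

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical wrapper (not a Hadoop class): serializes every call on the
 * wrapped stream so that concurrent callers cannot interleave inside it.
 */
class SynchronizedInputStream extends InputStream {
    // Underlying stream; assumed NOT to be safe for concurrent use.
    private final InputStream in;

    SynchronizedInputStream(InputStream in) {
        this.in = in;
    }

    @Override
    public synchronized int read() throws IOException {
        return in.read();
    }

    @Override
    public synchronized int read(byte[] b, int off, int len) throws IOException {
        return in.read(b, off, len);
    }

    @Override
    public synchronized long skip(long n) throws IOException {
        return in.skip(n);
    }

    @Override
    public synchronized int available() throws IOException {
        return in.available();
    }

    @Override
    public synchronized void close() throws IOException {
        in.close();
    }
}
```

With this approach two threads sharing the stream still cannot predict which bytes each of them gets, but neither should see the wrapped stream in a half-updated state.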

> CryptoInputStream can't handle concurrent access; inconsistent with HDFS
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-15557
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15557
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 3.2.0
>            Reporter: Todd Lipcon
>            Priority: Major
>
> In general, the non-positional read APIs for streams in Hadoop Common are meant to be used by only a single thread at a time. It would not make much sense to have concurrent multi-threaded access to seek+read because they modify the stream's file position. Multi-threaded access on input streams can be done using positional read APIs. Multi-threaded access on output streams probably never makes sense.
> In the case of DFSInputStream, the non-positional read APIs are marked synchronized, so that even when misused, no strange exceptions are thrown. The results are just somewhat undefined in that it's hard for a thread to know which position was read from. However, when running on an encrypted file system, the results are much worse: since CryptoInputStream's read methods are not marked synchronized, the caller can get strange ByteBuffer exceptions or even a JVM crash due to concurrent use and free of underlying OpenSSL Cipher buffers.
> The crypto stream wrappers should be made more resilient to such misuse, for example by:
> (a) making the read methods safer by making them synchronized (so they have the same behavior as DFSInputStream)
> or
> (b) trying to detect concurrent access to these methods and throwing ConcurrentModificationException so that the user is alerted to their probable misuse.
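
Option (b) from the description above could be sketched roughly as follows. This is an illustration only, assuming a plain java.io.InputStream; ConcurrentAccessDetectingInputStream is a hypothetical name and this is not how CryptoInputStream is implemented. The check is best effort: it only fires when two calls actually overlap in time.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ConcurrentModificationException;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical wrapper (not a Hadoop class): fails fast when a second
 * thread enters a read method while another call is still in progress.
 */
class ConcurrentAccessDetectingInputStream extends InputStream {
    private final InputStream in;
    // True while some thread is inside one of the read methods.
    private final AtomicBoolean inUse = new AtomicBoolean(false);

    ConcurrentAccessDetectingInputStream(InputStream in) {
        this.in = in;
    }

    // Claim the stream, or report the overlap to the caller.
    private void enter() {
        if (!inUse.compareAndSet(false, true)) {
            throw new ConcurrentModificationException(
                "stream is already being read by another thread");
        }
    }

    // Release the claim taken by enter().
    private void exit() {
        inUse.set(false);
    }

    @Override
    public int read() throws IOException {
        enter();
        try {
            return in.read();
        } finally {
            exit();
        }
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        enter();
        try {
            return in.read(b, off, len);
        } finally {
            exit();
        }
    }
}
```

Compared with option (a), this trades blocking for an immediate error, which surfaces the probable misuse to the caller instead of quietly returning data from an unknown position.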


