You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Thomas Marquardt (JIRA)" <ji...@apache.org> on 2017/08/25 19:31:00 UTC

[jira] [Comment Edited] (HADOOP-13327) Add OutputStream + Syncable to the Filesystem Specification

    [ https://issues.apache.org/jira/browse/HADOOP-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142064#comment-16142064 ] 

Thomas Marquardt edited comment on HADOOP-13327 at 8/25/17 7:30 PM:
--------------------------------------------------------------------

Thanks, cleaning up the specification and requirements for Filesystem is very helpful!

Some general feedback:

1) Is it too late to combine StreamCapabilities and Syncable?  StreamCapabilities and Syncable should be one interface.  It should have methods canHFlush, canHSync, hFlush, and hSync.  Users would then check if a stream implements the interface, call canHFlush/canHSync, and then hFlush/hSync if supported.  The fact that we're creating a helper for this (FSImplementationUtils.supportsSyncable) indicates an issue with the interface design, so I'm wondering if it is too late to update existing interfaces, maybe a new interface should be defined (Syncable2) and all the implementations gradually updated.

2) FSImplementationUtils.CloseChecker.  This class provides validation and synchronization for managing the stream state. The synchronization support is lacking though, for example, if one thread is in a flush or write call and another thread calls close(). I'm not sure how helpful it is to have a common class like this versus everyone
doing it their own way, but for a helper class, I'd prefer something along the lines of:


{code:java}
// Stream has three states: open, error, close
class StreamState {
  // Change state to close
  // @return - true iff state transitions from open or error to close
  boolean setStateClosed();

  // Change state to error and stores first error so it can be re-thrown.
  // @return - null if state transitions from open to error
  // @return - non-null if state is error or close.
  IOException setStateError(IOException);

  // Validate state is open.  Throws if not in open state.
  void checkState() throws IOException;

  // acquire exclusive lock
  void acquireLock() throws IOException;

  // release lock
  void releaseLock();
}
{code}


Specific feedback:

FSImplementationUtils.java:
 L26 - import specific classes instead of using '*'
 L45-46 - Preconditions.checkNotNull(capability) or reverse the comparisons
 L102 - I recommend removing isOpen since it is redundant next to IsClosed.

filesystem.md
 L612 - Object stores can create (overwrite=false) an empty file as a
 marker. However, object stores which have overwrite=true sematics
 cannot implement this easily.

outputstream.md
 L23 - "sp" should be "so"
 L53-142 - Generally speaking, I think the message is better delivered
 thru clear descriptions.  The notation is somewhat ambiguous to me.
 L149 - If close() fails, subsequent calls should re-throw the IOException.

AbstractContractCreateTest.java
 L350-352 This could be reduced to out.canFlush() with some
 refactoring of the interfaces.  It is too much code as-is.

S3ABlockOutputStream.java
 L320 - Since S3ABlockOutputStream.close is not synchronized, it is
 possible for a thread to be in a call to flush or write at the same
time the stream is being closed by another thread.


was (Author: tmarquardt):
Thanks, cleaning up the specification and requirements for Filesystem is very helpful!

Some general feedback:

1) Is it too late to combine StreamCapabilities and Syncable?  StreamCapabilities and Syncable should be one interface.  It should have methods canHFlush, canHSync, hFlush, and hSync.  Users would then check if a stream implements the interface, call canHFlush/canHSync, and then hFlush/hSync if supported.  The fact that we're creating a helper for this (FSImplementationUtils.supportsSyncable) indicates an issue with the interface design, so I'm wondering if it is too late to update existing interfaces, maybe a new interface should be defined (Syncable2) and all the implementations gradually updated.

2) FSImplementationUtils.CloseChecker.  This class provides validation and synchronization for managing the stream state. The synchronization support is lacking though, for example, if one thread is in a flush or write call and another thread calls close(). I'm not sure how helpful it is to have a common class like this versus everyone
doing it their own way, but for a helper class, I'd prefer something along the lines of:

// Stream has three states: open, error, close
class StreamState {
  // Change state to close
  // @return - true iff state transitions from open or error to close
  boolean setStateClosed();

  // Change state to error and stores first error so it can be re-thrown.
  // @return - null if state transitions from open to error
  // @return - non-null if state is error or close.
  IOException setStateError(IOException);

  // Validate state is open.  Throws if not in open state.
  void checkState() throws IOException;

  // acquire exclusive lock
  void acquireLock() throws IOException;

  // release lock
  void releaseLock();
}

Specific feedback:

FSImplementationUtils.java:
 L26 - import specific classes instead of using '*'
 L45-46 - Preconditions.checkNotNull(capability) or reverse the comparisons
 L102 - I recommend removing isOpen since it is redundant next to IsClosed.

filesystem.md
 L612 - Object stores can create (overwrite=false) an empty file as a
 marker. However, object stores which have overwrite=true sematics
 cannot implement this easily.

outputstream.md
 L23 - "sp" should be "so"
 L53-142 - Generally speaking, I think the message is better delivered
 thru clear descriptions.  The notation is somewhat ambiguous to me.
 L149 - If close() fails, subsequent calls should re-throw the IOException.

AbstractContractCreateTest.java
 L350-352 This could be reduced to out.canFlush() with some
 refactoring of the interfaces.  It is too much code as-is.

S3ABlockOutputStream.java
 L320 - Since S3ABlockOutputStream.close is not synchronized, it is
 possible for a thread to be in a call to flush or write at the same
time the stream is being closed by another thread.

> Add OutputStream + Syncable to the Filesystem Specification
> -----------------------------------------------------------
>
>                 Key: HADOOP-13327
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13327
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13327-002.patch, HADOOP-13327-branch-2-001.patch
>
>
> Write down what a Filesystem output stream should do. While core the API is defined in Java, that doesn't say what's expected about visibility, durability, etc —and Hadoop Syncable interface is entirely ours to define.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org