Posted to common-issues@hadoop.apache.org by "Andrew Wang (JIRA)" <ji...@apache.org> on 2014/06/05 00:47:02 UTC

[jira] [Updated] (HADOOP-9361) Strictly define the expected behavior of filesystem APIs and write tests to verify compliance

     [ https://issues.apache.org/jira/browse/HADOOP-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HADOOP-9361:
--------------------------------

    Attachment: HADOOP-9361.awang-addendum.patch

Cool, thanks Steve. Here's a patch with a bunch of docs changes. I applied your base patch to git hash b7da01dea546746e34195882df0dee789bc2e3c5 (the trunk HDFS-6268 commit), then made this one on top of that. There's a lot of content here, so I won't claim to have caught everything, but hopefully it'll help you refine the text.

I also had some other higher level comments about the text (haven't looked at the code yet):

High-level/misc:
- [Wikipedia](https://en.wikipedia.org/wiki/Dash#En_dash_versus_em_dash) is enlightening on the subject of em and en dashes. It seems like an unspaced em-dash ({{&mdash;}}) may be slightly more canonical than a spaced en-dash ({{&ndash;}}). I used the unspaced em-dash.
- I assume we wanted to say "FileSystem" with camel casing in a lot of places. Please check the rest.
- Also tried to use the Oxford comma everywhere; its use varied in your writing.
- A table of contents would be useful for the bigger pages.

Introduction
- Maybe we should define what a "blobstore" is? Opinions vary, I've heard people call HDFS a "blobstore" before because it's append-only and not POSIX.
- What is "immediate consistency"?
- In Atomicity, note that we don't actually implement mkdir in FileSystem, but mkdirs.
- Link for "one-copy-update-semantics"? Also, while HDFS does support this, it requires some hoops with reopening the DFSInputStream to get the new file length of files being written. Bit of a caveat.
- Concurrency and consistency are separate sections? I'm not sure what the difference is.
- "HDFS: 8000", is this bytes or characters?
- The "undefined limits" and "undefined timeouts" come off as commentary, should we be sprinkling SHOULD around, or giving more advice about "typical" expectations?

Notation
- In Exceptions, it says that you can provide a set of exceptions, but you used list syntax. Wasn't sure here, so I switched it to set syntax (curly braces).

Model:
- I think "path component" is a more standard term than "path element"?
- Paths are URIs in Hadoop, is that worth mentioning here? Path URI normalization is also complicated, things like extra slashes and ".." are (I believe) sometimes normalized out.
- Where is it specified how to turn a path into path components? There's also a need to strip out parts of the URI like the scheme and authority.
- ancestors doesn't have preconditions
- I'm not sure we'll ever get symlinks in FileSystem to be honest, I'd consider just removing these references.
- isDescendant definition refers to itself, is this right?
- Dunno how to parse the "File references statement" about data dictionary
- dangling sentence in "User home"
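
To make the path-to-components question above concrete, here is a rough sketch of one possible rule, using only java.net.URI. The class name PathComponents and the splitting rule are hypothetical; this is not how Hadoop's Path class does it, and the spec may well want different normalization.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: one possible way to turn a
// path string into path components.
final class PathComponents {
    private PathComponents() {}

    /** Splits the path part of a URI string into its non-empty components. */
    static List<String> split(String pathString) {
        // normalize() removes "." segments and resolves ".." where it can;
        // a leading ".." in a relative path is left in place.
        URI uri = URI.create(pathString).normalize();
        // getPath() leaves out the scheme and authority.
        String path = uri.getPath();
        List<String> components = new ArrayList<>();
        if (path == null) {
            // Opaque URIs (for example "mailto:...") have no path at all.
            return components;
        }
        for (String part : path.split("/")) {
            // Skip the empty strings produced by leading or repeated slashes.
            if (!part.isEmpty()) {
                components.add(part);
            }
        }
        return components;
    }
}
```

Under this rule, PathComponents.split("hdfs://nn:8020/a//b/../c") yields the components "a" and "c", and the root path "/" yields an empty list. Whether the spec should treat ".." and repeated slashes this way is exactly the open question.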

Filesystem:
- Looking at the code, getBlockLocations takes a Path, and throws a FileNotFoundException if it doesn't exist.
- The cluster topology stuff seems like a non sequitur, maybe more explanation?
- append Postconditions, just "FS"?
- "the file and its data is still", dunno how to parse this
- Worth mentioning permissions as related to recursive delete? I know permissions are assumed, but IIRC the atomicity still holds.
- I'm wondering about the use of MUST with regard to atomicity of recursive delete. In other places, you mention that behavior is undefined or implementation-dependent, but here you say it's a MUST but these other FileSystems don't support it.

FSDIS:
- Close in a distributed filesystem is a thorny problem. Have you seen HDFS-4504? I've also heard of Flume and HBase errors related to close continually throwing IOException, not sure of the current status.
- Mention and formatting of exceptions is not uniform. InputStream.read is one example, it doesn't use the list syntax, and NullPointerException is mentioned in the preconditions box but not the exceptions box.
- Invariants are not uniformly formatted
- Formatting of true and false is not uniform (e.g. {{True}} and true in text).
- seekToNewSource, irregular use of terms "files" and "blocks", not sure if you wanted to avoid talking about blocks entirely, or instead wanted to define both terms
- I think we could use some more rigor in the Consistency section, some of it seems underspecified

Testing:
- Note that "LocalFS" is a FileContext class, LocalFileSystem is probably what you want. Haven't looked at the code yet, but might need to rename some things.
- Rather than saying "Windows and OS/X filesystem", can we go further and say, e.g. HPFS, NTFS, FAT, etc?
- I can't parse the paragraph beginning with "A recommended strategy"
- Definition of concurrency vs consistency again?

> Strictly define the expected behavior of filesystem APIs and write tests to verify compliance
> ---------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9361
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9361
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, test
>    Affects Versions: 3.0.0, 2.4.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-9361-001.patch, HADOOP-9361-002.patch, HADOOP-9361-003.patch, HADOOP-9361-004.patch, HADOOP-9361-005.patch, HADOOP-9361-006.patch, HADOOP-9361-007.patch, HADOOP-9361-008.patch, HADOOP-9361-009.patch, HADOOP-9361-011.patch, HADOOP-9361-012.patch, HADOOP-9361-013.patch, HADOOP-9361-014.patch, HADOOP-9361.awang-addendum.patch
>
>
> {{FileSystem}} and {{FileContract}} aren't tested rigorously enough -while HDFS gets tested downstream, other filesystems, such as blobstore bindings, don't.
> The only tests that are common are those of {{FileSystemContractTestBase}}, which HADOOP-9258 shows is incomplete.
> I propose 
> # writing more tests which clarify expected behavior
> # testing operations in the interface being in their own JUnit4 test classes, instead of one big test suite. 
> # Having each FS declare via a properties file what behaviors they offer, such as atomic-rename, atomic-delete, umask, immediate-consistency -test methods can downgrade to skipped test cases if a feature is missing.
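
As an illustration of the third item in the proposal above, here is a minimal sketch of how a per-filesystem properties file could gate test cases. The class ContractOptions, its methods, and the Outcome enum are hypothetical names made up for this sketch; the actual patch may be structured quite differently. Only the feature keys (atomic-rename, atomic-delete) come from the issue description.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: feature flags a filesystem declares in a properties file.
final class ContractOptions {
    /** What happened to a test body guarded by a feature flag. */
    enum Outcome { RAN, SKIPPED }

    private final Properties props = new Properties();

    private ContractOptions() {}

    /** Parses properties-file text such as "atomic-rename=true". */
    static ContractOptions parse(String propertiesText) {
        ContractOptions options = new ContractOptions();
        try {
            options.props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Not expected when reading from a String; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return options;
    }

    /** A feature that is not declared is treated as unsupported. */
    boolean supports(String feature) {
        return Boolean.parseBoolean(props.getProperty(feature, "false"));
    }

    /** Runs the test body only if the feature is declared as supported. */
    Outcome runIfSupported(String feature, Runnable testBody) {
        if (!supports(feature)) {
            return Outcome.SKIPPED;
        }
        testBody.run();
        return Outcome.RAN;
    }
}
```

In a real JUnit 4 test class the skip would more likely be expressed with org.junit.Assume.assumeTrue(options.supports("atomic-rename")), which reports the test as skipped rather than failed. The explicit Outcome value here is only to keep the sketch free of test-framework dependencies.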


