Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2020/03/12 18:37:00 UTC
[jira] [Commented] (HADOOP-13007) cherry pick s3 enhancements from
PrestoS3FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-13007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058166#comment-17058166 ]
Steve Loughran commented on HADOOP-13007:
-----------------------------------------
h2. Presto features
I just had a proper look through the code to see what its directory logic was.
* docs: https://prestosql.io/docs/current/connector/hive.html#amazon-s3-configuration
* source https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/s3/PrestoS3FileSystem.java
h3. Interesting points
* glacier objects can be skipped
* listing always adds the trailing / for a non-empty key and then does the scan
* it doesn't create directories, so no need to recreate them
(I was about to say "demand-create AWS client", but I misread it. Interesting thought, though.)
h2. listLocatedStatus
does a list under path p + /
{code}
String key = keyFromPath(path);
if (!key.isEmpty()) {
    key += PATH_SEPARATOR;
}
ListObjectsRequest request = new ListObjectsRequest()
        .withBucketName(getBucketName(uri))
        .withPrefix(key)
        .withDelimiter(PATH_SEPARATOR);
{code}
mapping to statuses can skip objects where "GLACIER".equals(getStorageClass()); they aren't visible at all.
{code}
private Iterator<LocatedFileStatus> statusFromObjects(List<S3ObjectSummary> objects)
{
    // NOTE: for encrypted objects, S3ObjectSummary.size() used below is NOT correct,
    // however, to get the correct size we'd need to make an additional request to get
    // user metadata, and in this case it doesn't matter.
    return objects.stream()
            .filter(object -> !object.getKey().endsWith(PATH_SEPARATOR))
            .filter(object -> !skipGlacierObjects || !isGlacierObject(object))
            .map(object -> new FileStatus(
                    object.getSize(),
                    false,
                    1,
                    BLOCK_SIZE.toBytes(),
                    object.getLastModified().getTime(),
                    qualifiedPath(new Path(PATH_SEPARATOR + object.getKey()))))
            .map(this::createLocatedFileStatus)
            .iterator();
}
{code}
Note: this filtering feeds into delete(), intentionally or not.
{code}
public boolean mkdirs(Path f, FsPermission permission)
{
    // no need to do anything for S3
    return true;
}
{code}
* but getFileStatus does probe: HEAD, then HEAD + /, then LIST + /
{code}
return new FileStatus(
        getObjectSize(path, metadata),
        S3_DIRECTORY_OBJECT_CONTENT_TYPE.equals(metadata.getContentType()),
        1,
        BLOCK_SIZE.toBytes(),
        lastModifiedTime(metadata),
        qualifiedPath(path));
{code}
* file length inferred from metadata: "x-amz-unencrypted-content-length" is parsed. The docs acknowledge that the sizes returned in a list are not consistent with this.
* dir marker content type used for "is this a dir"
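A rough sketch of that probe sequence (the `Probe` names and function arguments are mine, not Presto's; the stubs stand in for the actual S3 calls):

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of the HEAD, HEAD + "/", LIST + "/" fallback described
// above; `head` and `hasChildren` stand in for the S3 client calls.
class ProbeOrder {
    enum Kind { FILE, DIR, MISSING }

    static Kind probe(String key,
                      Function<String, Optional<Long>> head,
                      Function<String, Boolean> hasChildren) {
        // 1. HEAD on the key itself: a plain object is a file.
        if (head.apply(key).isPresent()) {
            return Kind.FILE;
        }
        // 2. HEAD on key + "/": a directory marker object is a directory.
        if (head.apply(key + "/").isPresent()) {
            return Kind.DIR;
        }
        // 3. LIST under key + "/": any children imply an implicit directory.
        if (hasChildren.apply(key + "/")) {
            return Kind.DIR;
        }
        return Kind.MISSING;
    }
}
```

That ordering is why a marker-free directory tree still resolves, at the cost of three round trips for a missing path.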
h3. input stream
optimised for forward reads:
* doesn't do a HEAD at open
* lazy seek
* skips range (default 1MB) before (non-draining) abort() and reopen
* simply opens with initial range in GET, but no limit
* maps 416 -> EOFException
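The seek policy boils down to something like this (my names, a simplification; the real stream tracks more state):

```java
// Sketch of the forward-seek policy described above: a short forward seek
// within MAX_SKIP_SIZE is served by reading and discarding bytes on the open
// stream; anything else aborts the connection (non-draining) and reopens a
// new GET at the target offset. Names here are illustrative, not Presto's.
class SeekPolicy {
    static final long MAX_SKIP_SIZE = 1 << 20; // default skip range: 1 MB

    enum Action { NONE, SKIP_FORWARD, ABORT_AND_REOPEN }

    static Action onSeek(long position, long target) {
        if (target == position) {
            return Action.NONE;
        }
        if (target > position && target - position <= MAX_SKIP_SIZE) {
            return Action.SKIP_FORWARD;      // cheap: keep the connection
        }
        return Action.ABORT_AND_REOPEN;      // backwards or far ahead: new GET
    }
}
```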
h3. delete()
* doesn't do bulk deletes; it just lists children and recursively calls delete() on them (very, very inefficient)
* for a recursive delete, if glacier files are skipped, they don't get deleted. Interesting idea
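To make the cost concrete, here's the list-then-recurse shape against an in-memory map standing in for the bucket (my sketch, not Presto's code); note one DELETE per object, no bulk DeleteObjects call:

```java
import java.util.*;

// Sketch of a list-then-recurse delete: list immediate children under the
// prefix, recurse into sub-prefixes, and issue one DELETE per object. The
// TreeMap stands in for the bucket; names are illustrative.
class RecursiveDelete {
    static int delete(NavigableMap<String, Long> bucket, String prefix) {
        int deletes = 0;
        Set<String> subdirs = new HashSet<>();
        List<String> objects = new ArrayList<>();
        for (String key : bucket.tailMap(prefix).keySet()) {
            if (!key.startsWith(prefix)) break;
            String rest = key.substring(prefix.length());
            int slash = rest.indexOf('/');
            if (slash < 0) {
                objects.add(key);                                // direct child object
            } else {
                subdirs.add(prefix + rest.substring(0, slash + 1)); // sub-prefix
            }
        }
        for (String dir : subdirs) {
            deletes += delete(bucket, dir);                      // recurse first
        }
        for (String key : objects) {
            bucket.remove(key);                                  // one DELETE each
            deletes++;
        }
        return deletes;
    }
}
```

So a tree of N objects costs N delete calls plus a listing per "directory", versus N/1000 bulk-delete POSTs in S3A.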
h3. Metrics
implements AWS SDK stats collection in com.facebook.presto.hive.s3.PrestoS3FileSystemMetricCollector; these feed back to the Presto metrics
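The SDK invokes the collector per request; the counter side of it reduces to something like this stand-in (not the actual Presto class, just the shape we'd copy):

```java
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for the counter side of a metric collector: the SDK calls back
// per request and the collector bumps shared counters, which the filesystem
// then exposes as its metrics.
class S3RequestStats {
    private final AtomicLong requests = new AtomicLong();
    private final AtomicLong retries = new AtomicLong();
    private final AtomicLong throttles = new AtomicLong();

    void onRequestCompleted(int retryCount, boolean throttled) {
        requests.incrementAndGet();
        retries.addAndGet(retryCount);
        if (throttled) {
            throttles.incrementAndGet();
        }
    }

    long getRequests()  { return requests.get(); }
    long getRetries()   { return retries.get(); }
    long getThrottles() { return throttles.get(); }
}
```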
h3. create()
if overwrite=true, it skips all checks (even for the dest being a dir).
{code}
if ((!overwrite) && exists(path)) {
    throw new IOException("File already exists:" + path);
}
{code}
Should we do that? it's, well, aggressive, but for apps which know what they are doing...
# A new transfer manager is created for each output stream
# The entire file is written to the staging dir
# The transfer manager is given the file to upload; it will partition and upload how it chooses
so: no incremental writes, and the disk is always used. But there's good simplicity and no risk of partial uploads hanging around.
Omitting all existence checks does make for a faster write and avoids any 404s sneaking in.
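The three steps above can be sketched as below; the `uploader` callback stands in for the per-stream TransferManager handoff (names and structure are mine):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

// Sketch of the write path: buffer the whole object in a staging directory,
// and only on close hand the finished file off for upload; the uploader may
// multipart it however it chooses. The staging file is always cleaned up.
class StagedWrite {
    static long write(byte[] data, Path stagingDir,
                      Consumer<Path> uploader) throws IOException {
        // 1. Stage the entire object on local disk first.
        Path staged = Files.createTempFile(stagingDir, "presto-s3-", ".tmp");
        Files.write(staged, data);
        try {
            // 2. Hand the complete file to the uploader (per-stream transfer
            //    manager in Presto's case).
            uploader.accept(staged);
            return Files.size(staged);
        } finally {
            // 3. Remove the staging file whether or not the upload worked,
            //    so no partial uploads or stray temp files remain.
            Files.deleteIfExists(staged);
        }
    }
}
```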
h2. Overall analysis
It's nicely minimal; optimised for directory trees with no markers.
* input stream seems a lot less optimised than our code, which works better for clients that seek backwards.
* output stream avoids 404s by omitting all probes. Something to consider
* Metric wireup from AWS SDK looks simple enough for us to copy
* it does handle CSE, where the actual length < the listed length, but the caller has to do a HEAD to see this; if something ran off the result of listLocatedStatus and then did a seek to EOF-16, things would fail. Similarly: splitting.
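That length fix-up amounts to (a sketch; the method and class names are mine):

```java
import java.util.Map;

// Sketch of the client-side-encryption length handling mentioned above: if
// the object carries the "x-amz-unencrypted-content-length" user-metadata
// entry, that value is the real plaintext length; otherwise fall back to the
// stored (ciphertext) size. Only a HEAD returns the user metadata; a LIST
// only ever reports the stored size.
class CseLength {
    static final String UNENCRYPTED_CONTENT_LENGTH = "x-amz-unencrypted-content-length";

    static long objectSize(Map<String, String> userMetadata, long storedLength) {
        String v = userMetadata.get(UNENCRYPTED_CONTENT_LENGTH);
        return v != null ? Long.parseLong(v) : storedLength;
    }
}
```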
What can we adopt?
# metrics
# maybe: dest is dir for overwrite=true
# dir content-type get/check, i.e.
{code}
isDir == (key.endsWith("/") && length == 0) || "application/x-directory".equals(contentType)
{code}
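As an executable predicate (names illustrative):

```java
// An object is treated as a directory when it's an empty key ending in "/"
// or when it carries the "application/x-directory" content-type marker.
class DirCheck {
    static final String S3_DIRECTORY_OBJECT_CONTENT_TYPE = "application/x-directory";

    static boolean isDirectory(String key, long length, String contentType) {
        return (key.endsWith("/") && length == 0)
                || S3_DIRECTORY_OBJECT_CONTENT_TYPE.equals(contentType);
    }
}
```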
> cherry pick s3 ehancements from PrestoS3FileSystem
> --------------------------------------------------
>
> Key: HADOOP-13007
> URL: https://issues.apache.org/jira/browse/HADOOP-13007
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.9.0
> Reporter: Steve Loughran
> Priority: Minor
>
> Looking at [https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/s3/PrestoS3FileSystem.java], they've done some interesting things: configurable connection timeouts and retry options, statistics to count exceptions caught/re-opened, and more
> review them, if there is good stuff there, add it to S3a
--
This message was sent by Atlassian Jira
(v8.3.4#803005)