Posted to common-issues@hadoop.apache.org by "Sameer Choudhary (JIRA)" <ji...@apache.org> on 2019/01/04 06:00:00 UTC

[jira] [Comment Edited] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()

    [ https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733840#comment-16733840 ] 

Sameer Choudhary edited comment on HADOOP-15229 at 1/4/19 5:59 AM:
-------------------------------------------------------------------

[~stevel@apache.org]
{quote}you can't do a seek to an offset in the file, because the results are coming in dynamically from a POST; there's no GET for a content length. Which brings it down to: do you skip() or read-and-discard. The trouble with skip is I'm not sure about all its failure modes here. {{skip(count)}} can return a value < {{count}} and you are left wondering what to do? Keep retrying until total == count? Now I know of a way to check for end of stream/errors in the select, that may be possible.
{quote}
What is the difference here between skip and read-and-discard? I believe by skip you mean not even deserializing the payload of the Record Message. S3 Select's streaming protocol encodes the query results in Record Messages, and each message states the number of bytes in its payload ([https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectSELECTContent.html]).
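
For reference, if skip does end up meaning read-and-discard underneath, the retry loop from the quote above would look roughly like this. A minimal sketch over a plain {{java.io.InputStream}}, nothing S3 Select specific, and {{skipFully}} is just a name I made up here:

{code:java}
import java.io.IOException;
import java.io.InputStream;

public final class SkipSupport {

  /**
   * Advance {@code in} by up to {@code count} bytes. Short skips are
   * retried; a zero-byte skip falls back to read-and-discard, because
   * skip() alone cannot distinguish "try again" from end of stream.
   *
   * @return bytes actually advanced; less than count only at EOF.
   */
  static long skipFully(InputStream in, long count) throws IOException {
    long remaining = count;
    byte[] scratch = new byte[8192];
    while (remaining > 0) {
      long skipped = in.skip(remaining);
      if (skipped > 0) {
        remaining -= skipped;
        continue;
      }
      // skip() returned 0: probe with read() to detect end of stream.
      int read = in.read(scratch, 0, (int) Math.min(scratch.length, remaining));
      if (read < 0) {
        break; // stream ended before count bytes were consumed
      }
      remaining -= read;
    }
    return count - remaining;
  }
}
{code}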

Maybe for skip we can just look at the Record Message header section and skip the message entirely if we have not yet reached the desired length. The termination conditions are:
 * If we get a RecordLevelError then we can throw.
 * If we do not receive the End Message within a defined time then we can time out.
 * If we get the End Message before the desired number of bytes has been skipped then we can report that the result is smaller than desired.
 * Last is the happy case, where more bytes are available than we need to skip. We would look at each header and keep a running count of the byte range the current Record Message covers, and start deserializing a payload once its byte range contains bytes that are not being skipped (a sketch follows after this list).
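
Roughly, that message-level skip could look like the following. This is a hedged sketch against the v1 AWS SDK: I believe {{SelectObjectContentEventStream#getEventsIterator()}} and {{RecordsEvent#getPayload()}} exist, but treat the SDK calls as assumptions to verify; the timeout condition is omitted.

{code:java}
import java.nio.ByteBuffer;
import java.util.Iterator;

import com.amazonaws.services.s3.model.SelectObjectContentEvent;
import com.amazonaws.services.s3.model.SelectObjectContentEventStream;

public final class RecordMessageSkipper {

  /**
   * Skip whole Record Messages until {@code toSkip} payload bytes are
   * consumed, without deserializing the skipped payloads. Record-level
   * errors surface as runtime exceptions from the iterator and are left
   * to propagate; the timeout case is not shown.
   *
   * @return bytes actually skipped; less than toSkip if the End Message
   *         arrived first (i.e. the result is smaller than desired).
   */
  static long skipRecords(SelectObjectContentEventStream payload, long toSkip) {
    long skipped = 0;
    Iterator<SelectObjectContentEvent> events = payload.getEventsIterator();
    while (skipped < toSkip && events.hasNext()) {
      SelectObjectContentEvent event = events.next();
      if (event instanceof SelectObjectContentEvent.EndEvent) {
        break; // End Message before the desired length
      }
      if (event instanceof SelectObjectContentEvent.RecordsEvent) {
        ByteBuffer data =
            ((SelectObjectContentEvent.RecordsEvent) event).getPayload();
        if (skipped + data.remaining() <= toSkip) {
          skipped += data.remaining(); // whole message is in the skipped range
        } else {
          // Message straddles the boundary: drop the skipped prefix and
          // hand the rest to the deserializer.
          data.position(data.position() + (int) (toSkip - skipped));
          skipped = toSkip;
          // ... deserialize `data` from here on ...
        }
      }
      // Stats / Progress / Continuation events carry no result bytes.
    }
    return skipped;
  }
}
{code}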

I believe the above sums up all of the edge cases that could occur in the response.

The AWS SDK abstracts away the handling of these messages, so it might be tricky to implement this in a simple manner.
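
For context, the v1 SDK pushes you towards a record-level {{InputStream}} plus a visitor for the control events rather than raw message access; something like the sketch below (method names from memory, worth double-checking):

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicBoolean;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.SelectObjectContentEvent;
import com.amazonaws.services.s3.model.SelectObjectContentEventStream;
import com.amazonaws.services.s3.model.SelectObjectContentEventVisitor;
import com.amazonaws.services.s3.model.SelectObjectContentRequest;
import com.amazonaws.services.s3.model.SelectObjectContentResult;

public final class SelectReader {

  /**
   * Drain a select's record stream, noting whether the End Message was
   * seen; no End Message means the stream failed or was truncated.
   */
  static long readSelect(AmazonS3 s3, SelectObjectContentRequest request)
      throws IOException {
    SelectObjectContentResult result = s3.selectObjectContent(request);
    AtomicBoolean endSeen = new AtomicBoolean(false);
    try (SelectObjectContentEventStream payload = result.getPayload()) {
      InputStream records = payload.getRecordsInputStream(
          new SelectObjectContentEventVisitor() {
            @Override
            public void visit(SelectObjectContentEvent.EndEvent event) {
              endSeen.set(true); // only delivered on a complete stream
            }
          });
      long total = 0;
      byte[] buffer = new byte[8192];
      int read;
      while ((read = records.read(buffer)) >= 0) {
        total += read;
      }
      if (!endSeen.get()) {
        throw new IOException("S3 Select stream ended without an End Message");
      }
      return total;
    }
  }
}
{code}
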
{quote}I worry that some test may only show up if the file is above a certain size, as it will come in on a later page of responses, won't it?
{quote}
 

That is true. S3 Select sends data in Record Messages, where each message can contain more than one logical CSV output row. There are two cases here:
 # If the error occurs before the first message is sent then we receive an HTTP status code != 200, along with the error code and error message.
 # If the error occurs after the first message is sent then the first message contains some number of output rows with HTTP status code == 200, followed by an EventStreamException later (a sketch of this case follows after this list).
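
Case 2 is the awkward one for callers, since the failure only shows up mid-read. Here is a sketch of what the consuming side has to be ready for; I believe the v1 Java SDK surfaces this as the unchecked {{SelectObjectContentEventException}}, but that class name is worth verifying:

{code:java}
import java.io.IOException;
import java.io.InputStream;

import com.amazonaws.services.s3.model.SelectObjectContentEventException;

public final class MidStreamErrors {

  /**
   * Case 2: the HTTP response was a 200 and some rows were delivered;
   * the error only surfaces while reading the rest of the stream.
   */
  static long drain(InputStream records) throws IOException {
    byte[] buffer = new byte[8192];
    long total = 0;
    try {
      int read;
      while ((read = records.read(buffer)) >= 0) {
        total += read; // rows delivered before the failure
      }
    } catch (SelectObjectContentEventException e) {
      // The error code and message arrive here, after `total` bytes of
      // partial output were already handed to the caller.
      throw new IOException(
          "S3 Select failed after " + total + " bytes: " + e.getMessage(), e);
    }
    return total;
  }
}
{code}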

For tests, how about:
 * Test 1: A CSV file with just a couple of rows, where the contents of the second row trigger a type cast error. This should return an HTTP status code 400 with an appropriate error message and error code (a rough sketch follows after this list).
 * Test 2: A CSV file just large enough that we get HTTP status code 200 followed by an EventStreamException with an appropriate error message and error code.
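
A rough shape for Test 1 under the openFile() builder proposed in this JIRA. Assumptions to flag: the {{fs.s3a.select.sql}} option name, the sample CSV, and whether the error surfaces at open time or on the first read are all illustrative here, not settled API:

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Assert;
import org.junit.Test;

public class TestSelectCastError {

  private FileSystem fs;   // S3A filesystem, initialized in test setup
  private Path path;       // CSV object written beforehand: "1,100\n2,not-a-number\n"

  @Test
  public void testCastErrorSurfaces() throws Exception {
    // Column 2 of row 2 cannot be cast to INT, so the select should fail.
    FSDataInputStream in = fs.openFile(path)
        .must("fs.s3a.select.sql",
            "SELECT CAST(s._2 AS INT) FROM S3Object s")
        .build()
        .get();
    try {
      in.read();
      Assert.fail("expected the cast error on row 2 to surface");
    } catch (IOException expected) {
      // the 400 error code and message from S3 Select should be attached here
    } finally {
      in.close();
    }
  }
}
{code}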

 



> Add FileSystem builder-based openFile() API to match createFile()
> -----------------------------------------------------------------
>
>                 Key: HADOOP-15229
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15229
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, fs/s3
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>         Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, HADOOP-15229-005.patch, HADOOP-15229-006.patch, HADOOP-15229-007.patch, HADOOP-15229-009.patch, HADOOP-15229-010.patch, HADOOP-15229-011.patch, HADOOP-15229-012.patch, HADOOP-15229-013.patch, HADOOP-15229-014.patch, HADOOP-15229-015.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> A key requirement of this is not HDFS, it's to put in the fadvise policy for working with object stores, where getting the decision to do a full GET and TCP abort on seek vs smaller GETs is fundamentally different: the wrong option can cost you minutes. S3A and Azure both have adaptive policies now (first backward seek), but they still don't do it that well.
> Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" "random" as an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method. Ideally with as much code reuse as possible



