Posted to issues@flink.apache.org by sjwiesman <gi...@git.apache.org> on 2017/08/27 22:55:14 UTC

[GitHub] flink pull request #4607: [FLINK-6306][connectors] Sink for eventually consi...

GitHub user sjwiesman opened a pull request:

    https://github.com/apache/flink/pull/4607

    [FLINK-6306][connectors] Sink for eventually consistent file systems

    ## What is the purpose of the change
    
    This pull request implements a sink for writing to an eventually consistent filesystem, such as Amazon S3, with exactly-once semantics.
    
    
    ## Brief change log
      - The sink stages files on a consistent filesystem (local, HDFS, etc.).
      - Once per checkpoint, files are copied to the eventually consistent filesystem. 
      - When a checkpoint completion notification is sent, the files are marked consistent. Otherwise, they are left in place unmarked, because delete is not a consistent operation.
      - It is up to consumers to choose their semantics; at least once by reading all files, or exactly once by only reading files marked consistent. 
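The change log above can be illustrated with a minimal sketch. It is not the code from this pull request: the class `StagedSinkSketch`, the methods `onCheckpoint` and `onCheckpointComplete`, the `in-progress` and `part-<id>` file names, and the `.consistent` marker suffix are all hypothetical, and a second local directory stands in for the eventually consistent store. Recovery and concurrent checkpoints are not modeled.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of the staging flow; not the classes from this pull request. */
public class StagedSinkSketch {
    // Assumed marker convention: "<data file>.consistent" next to each data file.
    static final String MARKER_SUFFIX = ".consistent";

    private final Path stagingDir; // consistent filesystem (a local directory here)
    private final Path targetDir;  // stand-in for the eventually consistent store
    // Checkpoint id -> file copied for that checkpoint, awaiting the completion notification.
    private final Map<Long, Path> pending = new HashMap<>();

    public StagedSinkSketch(Path stagingDir, Path targetDir) {
        this.stagingDir = stagingDir;
        this.targetDir = targetDir;
    }

    static Path marker(Path dataFile) {
        return dataFile.resolveSibling(dataFile.getFileName() + MARKER_SUFFIX);
    }

    /** Appends a record to the in-progress file on the staging filesystem. */
    public void write(String record) {
        try {
            Files.write(stagingDir.resolve("in-progress"),
                    (record + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Once per checkpoint: roll the staged file and copy it to the target store. */
    public void onCheckpoint(long checkpointId) {
        Path inProgress = stagingDir.resolve("in-progress");
        if (!Files.exists(inProgress)) {
            return; // nothing was written since the last checkpoint
        }
        try {
            Path staged = Files.move(inProgress, stagingDir.resolve("part-" + checkpointId));
            Path copied = Files.copy(staged, targetDir.resolve(staged.getFileName()));
            pending.put(checkpointId, copied);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** On the completion notification: mark the copied file consistent. Nothing is ever deleted. */
    public void onCheckpointComplete(long checkpointId) {
        Path copied = pending.remove(checkpointId);
        if (copied == null) {
            return;
        }
        try {
            Files.createFile(marker(copied));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Consumer side: all data files (at least once) or only marked ones (exactly once). */
    public static List<String> readableFiles(Path targetDir, boolean exactlyOnce) {
        try (Stream<Path> files = Files.list(targetDir)) {
            return files
                    .filter(p -> !p.getFileName().toString().endsWith(MARKER_SUFFIX))
                    .filter(p -> !exactlyOnce || Files.exists(marker(p)))
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Checkpoint 1 completes; checkpoint 2 is copied but never confirmed. */
    public static List<List<String>> demo() {
        try {
            Path staging = Files.createTempDirectory("staging");
            Path target = Files.createTempDirectory("target");
            StagedSinkSketch sink = new StagedSinkSketch(staging, target);
            sink.write("a");
            sink.onCheckpoint(1L);
            sink.onCheckpointComplete(1L);
            sink.write("b");
            sink.onCheckpoint(2L);
            return Arrays.asList(readableFiles(target, false), readableFiles(target, true));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch a consumer reading every data file sees both `part-1` and `part-2`, while a consumer that honors the marker sees only `part-1`, because the second checkpoint was never confirmed.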
    
    
    ## Verifying this change
    This change added tests and can be verified as follows:
    
      - Added tests based on the existing BucketingSink test suite. 
      - Added tests that verify semantics based on different checkpointing combinations (successful, concurrent, timed out, and failed). 
      - Added integration test that verifies exactly once holds during failure. 
      - Manually verified by having run in production for several months. 
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): no
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
      - The serializers: no
      - The runtime per-record code paths (performance sensitive): no
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
    
    ## Documentation
    
      - Does this pull request introduce a new feature? yes
      - If yes, how is the feature documented? JavaDocs


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sjwiesman/flink FLINK-6306

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4607.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4607
    
----
commit 347ea767195d74efc39964c02ace1bbe10d8aa0a
Author: Seth Wiesman <sw...@mediamath.com>
Date:   2017-08-27T21:36:04Z

    [FLINK-6306][connectors] Sink for eventually consistent file systems

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4607: [FLINK-6306][connectors] Sink for eventually consistent f...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/4607
  
    We cannot restart Travis ourselves. Only the contributor can schedule another run by adding another commit (even an empty one). However, please don't do that for the sake of getting a picture-perfect build; we are aware of some unstable tests and account for that in the review.


---

[GitHub] flink issue #4607: [FLINK-6306][connectors] Sink for eventually consistent f...

Posted by sjwiesman <gi...@git.apache.org>.
Github user sjwiesman commented on the issue:

    https://github.com/apache/flink/pull/4607
  
    I'm going to go ahead and close this PR and issue to avoid confusion.


---

[GitHub] flink issue #4607: [FLINK-6306][connectors] Sink for eventually consistent f...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the issue:

    https://github.com/apache/flink/pull/4607
  
    @stevenzwu I'm afraid there currently isn't. But one should be started once work starts on the new sink for Flink 1.6.


---

[GitHub] flink issue #4607: [FLINK-6306][connectors] Sink for eventually consistent f...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the issue:

    https://github.com/apache/flink/pull/4607
  
    I think we won't merge this as is. There are some thoughts flying around for reworking the BucketingSink to use our own `FileSystem` implementation and also making it work well with S3. I'm quite sure that this work will land in 1.6 but we have to release 1.5 now because it's already late.


---

[GitHub] flink issue #4607: [FLINK-6306][connectors] Sink for eventually consistent f...

Posted by sjwiesman <gi...@git.apache.org>.
Github user sjwiesman commented on the issue:

    https://github.com/apache/flink/pull/4607
  
    Would you be able to rerun Travis? The build failed on a single configuration during the Kafka09ITTest due to a task manager failure. I do not believe my code changes touched any of the code paths in that test.


---

[GitHub] flink issue #4607: [FLINK-6306][connectors] Sink for eventually consistent f...

Posted by stevenzwu <gi...@git.apache.org>.
Github user stevenzwu commented on the issue:

    https://github.com/apache/flink/pull/4607
  
    @aljoscha is there any doc/write-up about the reworking of BucketingSink?


---

[GitHub] flink issue #4607: [FLINK-6306][connectors] Sink for eventually consistent f...

Posted by sjwiesman <gi...@git.apache.org>.
Github user sjwiesman commented on the issue:

    https://github.com/apache/flink/pull/4607
  
    * clicked the wrong button 


---

[GitHub] flink issue #4607: [FLINK-6306][connectors] Sink for eventually consistent f...

Posted by sjwiesman <gi...@git.apache.org>.
Github user sjwiesman commented on the issue:

    https://github.com/apache/flink/pull/4607
  
    CC: @aljoscha 
    
    I screwed up the rebase, so I cherry-picked the updates into a new branch and re-opened the PR.


---

[GitHub] flink issue #4607: [FLINK-6306][connectors] Sink for eventually consistent f...

Posted by narayaruna <gi...@git.apache.org>.
Github user narayaruna commented on the issue:

    https://github.com/apache/flink/pull/4607
  
    @aljoscha can we merge this PR? It would be very helpful for ingesting data into S3 more reliably. BucketingSink errors out when writing to S3, which makes data ingestion less reliable.


---

[GitHub] flink pull request #4607: [FLINK-6306][connectors] Sink for eventually consi...

Posted by sjwiesman <gi...@git.apache.org>.
Github user sjwiesman closed the pull request at:

    https://github.com/apache/flink/pull/4607


---