Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/01/15 08:59:45 UTC

[GitHub] [flink] kl0u opened a new pull request #10859: [FLINK-15278] Update the StreamingFileSink docs

kl0u opened a new pull request #10859: [FLINK-15278] Update the StreamingFileSink docs
URL: https://github.com/apache/flink/pull/10859
 
 
   ## What is the purpose of the change
   
   Updates the `StreamingFileSink` documentation to reflect the current state.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
     - The serializers: (yes / **no** / don't know)
     - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
     - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (yes / **no**)
     - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] aljoscha commented on a change in pull request #10859: [FLINK-15278] Update the StreamingFileSink docs

Posted by GitBox <gi...@apache.org>.
aljoscha commented on a change in pull request #10859: [FLINK-15278] Update the StreamingFileSink docs
URL: https://github.com/apache/flink/pull/10859#discussion_r366760858
 
 

 ##########
 File path: docs/dev/connectors/streamfile_sink.md
 ##########
 @@ -361,7 +266,174 @@ input.addSink(sink)
 
 The SequenceFileWriterFactory supports additional constructor parameters to specify compression settings.
 
-### Important Considerations for S3
+## Bucket Assignment
+
+The bucketing logic defines how the data will be structured into subdirectories inside the base output directory.
+
+Both row and bulk formats (see [File Formats](#file-formats)) use the [DateTimeBucketAssigner]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/bucketassigners/DateTimeBucketAssigner.html) as the default assigner.
+By default, the `DateTimeBucketAssigner` creates hourly buckets based on the system default timezone,
+using the following format: `yyyy-MM-dd--HH`. Both the date format (*i.e.* the bucket granularity) and the timezone can be
+configured manually.
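+
+For example, a daily, UTC-based assigner could be created as in the following minimal sketch (it assumes the two-argument constructor taking the date format string and the timezone):
+
+{% highlight java %}
+import java.time.ZoneId;
+
+import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;
+
+// daily buckets (e.g. "2019-08-25") resolved against UTC instead of the system default timezone
+DateTimeBucketAssigner<String> assigner = new DateTimeBucketAssigner<>("yyyy-MM-dd", ZoneId.of("UTC"));
+{% endhighlight %}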
+
+We can specify a custom [BucketAssigner]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/BucketAssigner.html) by calling `.withBucketAssigner(assigner)` on the format builders.
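+
+As an illustration, a sketch of a custom assigner that buckets records by a key field might look as follows (the `Event` type and its `getUserId()` accessor are hypothetical):
+
+{% highlight java %}
+import org.apache.flink.core.io.SimpleVersionedSerializer;
+import org.apache.flink.streaming.api.functions.sink.filesystem.BucketAssigner;
+import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.SimpleVersionedStringSerializer;
+
+public class UserIdBucketAssigner implements BucketAssigner<Event, String> {
+
+    @Override
+    public String getBucketId(Event element, Context context) {
+        // the returned string becomes the subdirectory name under the base output path
+        return element.getUserId();
+    }
+
+    @Override
+    public SimpleVersionedSerializer<String> getSerializer() {
+        // serializer used to persist the bucket id in checkpoint state
+        return SimpleVersionedStringSerializer.INSTANCE;
+    }
+}
+{% endhighlight %}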
+
+Flink comes with two built-in BucketAssigners:
+
+ - [DateTimeBucketAssigner]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/bucketassigners/DateTimeBucketAssigner.html) : the default, time-based assigner
+ - [BasePathBucketAssigner]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/bucketassigners/BasePathBucketAssigner.html) : an assigner that stores all part files in the base path (a single global bucket)
+
+## Rolling Policy
+
+The [RollingPolicy]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/RollingPolicy.html) defines when a given in-progress part file is closed and moved to the pending state, and later to the finished state.
+Part files in the "finished" state are ready for viewing and are guaranteed to contain valid data that will not be reverted in case of failure.
+The rolling policy, in combination with the checkpointing interval (pending files become finished on the next checkpoint), controls how quickly
+part files become available for downstream readers, as well as the size and number of these parts.
+
+Flink comes with two built-in RollingPolicies:
+
+ - [DefaultRollingPolicy]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.html)
+ - [OnCheckpointRollingPolicy]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/OnCheckpointRollingPolicy.html)
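+
+For instance, the `DefaultRollingPolicy` can be configured through its builder, as in the following sketch (the thresholds are illustrative, not the defaults):
+
+{% highlight java %}
+import java.util.concurrent.TimeUnit;
+
+import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;
+
+DefaultRollingPolicy<String, String> rollingPolicy = DefaultRollingPolicy
+    .builder()
+    .withRolloverInterval(TimeUnit.MINUTES.toMillis(15))   // roll at the latest after 15 minutes
+    .withInactivityInterval(TimeUnit.MINUTES.toMillis(5))  // roll after 5 minutes without new records
+    .withMaxPartSize(1024 * 1024 * 1024)                   // roll once a part file reaches 1 GB
+    .build();
+{% endhighlight %}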
+
+## Part file lifecycle
+
+In order to use the output of the `StreamingFileSink` in downstream systems, we need to understand the naming and lifecycle of the output files produced.
+
+Part files can be in one of three states:
+ 1. **In-progress** : The part file that is currently being written to
+ 2. **Pending** : A closed (due to the specified rolling policy) in-progress file that is waiting to be committed
+ 3. **Finished** : A pending file that has been committed on a successful checkpoint
+
+Only finished files are safe for downstream systems to read, as they are guaranteed not to be modified later.
+
+<div class="alert alert-info">
+     <b>IMPORTANT:</b> Part file indexes are strictly increasing for any given subtask (in the order they were created). However, these indexes are not always sequential. When the job restarts, the next part index for all subtasks will be `max part index + 1`,
+where `max` is computed across all subtasks.
+</div>
+
+Each writer subtask will have a single in-progress part file at any given time for every active bucket, but there can be several pending and finished files.
+
+**Part file example**
+
+To better understand the lifecycle of these files let's look at a simple example with 2 sink subtasks:
+
+```
+└── 2019-08-25--12
+    ├── part-0-0.inprogress.bd053eb0-5ecf-4c85-8433-9eff486ac334
+    └── part-1-0.inprogress.ea65a428-a1d0-4a0b-bbc5-7a436a75e575
+```
+
+When the part file `part-1-0` is rolled (let's say it becomes too large), it becomes pending but it is not renamed. The sink then opens a new part file: `part-1-1`:
+
+```
+└── 2019-08-25--12
+    ├── part-0-0.inprogress.bd053eb0-5ecf-4c85-8433-9eff486ac334
+    ├── part-1-0.inprogress.ea65a428-a1d0-4a0b-bbc5-7a436a75e575
+    └── part-1-1.inprogress.bc279efe-b16f-47d8-b828-00ef6e2fbd11
+```
+
+Since `part-1-0` is now pending, it is finalized after the next successful checkpoint:
+
+```
+└── 2019-08-25--12
+    ├── part-0-0.inprogress.bd053eb0-5ecf-4c85-8433-9eff486ac334
+    ├── part-1-0
+    └── part-1-1.inprogress.bc279efe-b16f-47d8-b828-00ef6e2fbd11
+```
+
+New buckets are created as dictated by the bucketing policy, and this doesn't affect currently in-progress files:
+
+```
+└── 2019-08-25--12
+    ├── part-0-0.inprogress.bd053eb0-5ecf-4c85-8433-9eff486ac334
+    ├── part-1-0
+    └── part-1-1.inprogress.bc279efe-b16f-47d8-b828-00ef6e2fbd11
+└── 2019-08-25--13
+    └── part-0-2.inprogress.2b475fec-1482-4dea-9946-eb4353b475f1
+```
+
+Old buckets can still receive new records as the bucketing policy is evaluated on a per-record basis.
+
+### Part file configuration
+
+Finished files can be distinguished from the in-progress ones only by their naming scheme.
+
+By default, the file naming strategy is as follows:
+ - **In-progress / Pending**: `part-<subtaskIndex>-<partFileIndex>.inprogress.uid`
+ - **Finished:** `part-<subtaskIndex>-<partFileIndex>`
+
+Flink allows the user to specify a prefix and/or a suffix for their part files.
+This can be done using an `OutputFileConfig`.
+For example, for a prefix "prefix" and a suffix ".ext", the sink will create the following files:
+
+```
+└── 2019-08-25--12
+    ├── prefix-0-0.ext
+    ├── prefix-0-1.ext.inprogress.bd053eb0-5ecf-4c85-8433-9eff486ac334
+    ├── prefix-1-0.ext
+    └── prefix-1-1.ext.inprogress.bc279efe-b16f-47d8-b828-00ef6e2fbd11
+```
+
+The user can specify an `OutputFileConfig` in the following way:
+
+<div class="codetabs" markdown="1">
+<div data-lang="java" markdown="1">
+{% highlight java %}
+
+OutputFileConfig config = OutputFileConfig
+    .builder()
+    .withPartPrefix("prefix")
+    .withPartSuffix(".ext")
+    .build();
+
+StreamingFileSink<Tuple2<Integer, Integer>> sink = StreamingFileSink
+    .forRowFormat(new Path(outputPath), new SimpleStringEncoder<>("UTF-8"))
+    .withBucketAssigner(new KeyBucketAssigner())
+    .withRollingPolicy(OnCheckpointRollingPolicy.build())
+    .withOutputFileConfig(config)
+    .build();
+
+{% endhighlight %}
+</div>
+<div data-lang="scala" markdown="1">
+{% highlight scala %}
+
+val config = OutputFileConfig
+    .builder()
+    .withPartPrefix("prefix")
+    .withPartSuffix(".ext")
+    .build()
+
+val sink = StreamingFileSink
+    .forRowFormat(new Path(outputPath), new SimpleStringEncoder[String]("UTF-8"))
+    .withBucketAssigner(new KeyBucketAssigner())
+    .withRollingPolicy(OnCheckpointRollingPolicy.build())
+    .withOutputFileConfig(config)
+    .build()
+
+{% endhighlight %}
+</div>
+</div>
+
+## Important Considerations
+
+### General
+
+<span class="label label-danger">Important Note 1</span>: When using Hadoop < 2.7, please use
+the `OnCheckpointRollingPolicy` which rolls part files on every checkpoint. The reason is that if part files "traverse"
+checkpoint interval, then, upon recovery from a failure the `StreamingFileSink` may use the `truncate()` method of the 
+filesystem to discard uncommitted data from the in-progress file. This method is not supported by pre-2.7 Hadoop versions 
+Flink will throw an exception.
 
 Review comment:
   Maybe "This method is not supported by pre-2.7 Hadoop versions and Flink will throw an exception."

----------------------------------------------------------------

[GitHub] [flink] flinkbot commented on issue #10859: [FLINK-15278] Update the StreamingFileSink docs

Posted by GitBox <gi...@apache.org>.
flinkbot commented on issue #10859: [FLINK-15278] Update the StreamingFileSink docs
URL: https://github.com/apache/flink/pull/10859#issuecomment-574562777
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 2deddd56f82d4fae825e3a5413779b3e4d71a151 (Wed Jan 15 09:03:51 UTC 2020)
   
   **Warnings:**
    * Documentation files were touched, but no `.zh.md` files: Update Chinese documentation or file Jira ticket.
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
    * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
    Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.
    The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
    <details>
    <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>

----------------------------------------------------------------

[GitHub] [flink] kl0u commented on issue #10859: [FLINK-15278] Update the StreamingFileSink docs

Posted by GitBox <gi...@apache.org>.
kl0u commented on issue #10859: [FLINK-15278] Update the StreamingFileSink docs
URL: https://github.com/apache/flink/pull/10859#issuecomment-574576922
 
 
   Merged.

----------------------------------------------------------------

[GitHub] [flink] flinkbot commented on issue #10859: [FLINK-15278] Update the StreamingFileSink docs

Posted by GitBox <gi...@apache.org>.
flinkbot commented on issue #10859: [FLINK-15278] Update the StreamingFileSink docs
URL: https://github.com/apache/flink/pull/10859#issuecomment-574575929
 
 
   <!--
   Meta data
   Hash:8da4414253fa8283acdd509d6be5c26903537ee4 Status:UNKNOWN URL:TBD TriggerType:PUSH TriggerID:8da4414253fa8283acdd509d6be5c26903537ee4
   -->
   ## CI report:
   
   * 8da4414253fa8283acdd509d6be5c26903537ee4 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

----------------------------------------------------------------

[GitHub] [flink] kl0u commented on issue #10859: [FLINK-15278] Update the StreamingFileSink docs

Posted by GitBox <gi...@apache.org>.
kl0u commented on issue #10859: [FLINK-15278] Update the StreamingFileSink docs
URL: https://github.com/apache/flink/pull/10859#issuecomment-574561566
 
 
   @gyfora could you have a look at this PR?

----------------------------------------------------------------

[GitHub] [flink] aljoscha commented on a change in pull request #10859: [FLINK-15278] Update the StreamingFileSink docs

Posted by GitBox <gi...@apache.org>.
aljoscha commented on a change in pull request #10859: [FLINK-15278] Update the StreamingFileSink docs
URL: https://github.com/apache/flink/pull/10859#discussion_r366760554
 
 

 ##########
 File path: docs/dev/connectors/streamfile_sink.md
 ##########
 @@ -361,7 +266,174 @@ input.addSink(sink)
 
 The SequenceFileWriterFactory supports additional constructor parameters to specify compression settings.
 
+## Important Considerations
+
+### General
+
+<span class="label label-danger">Important Note 1</span>: When using Hadoop < 2.7, please use
+the `OnCheckpointRollingPolicy` which rolls part files on every checkpoint. The reason is that if part files "traverse"
+checkpoint interval, then, upon recovery from a failure the `StreamingFileSink` may use the `truncate()` method of the 
 
 Review comment:
   ```suggestion
   the checkpoint interval, then, upon recovery from a failure the `StreamingFileSink` may use the `truncate()` method of the 
   ```

----------------------------------------------------------------

[GitHub] [flink] kl0u commented on issue #10859: [FLINK-15278] Update the StreamingFileSink docs

Posted by GitBox <gi...@apache.org>.
kl0u commented on issue #10859: [FLINK-15278] Update the StreamingFileSink docs
URL: https://github.com/apache/flink/pull/10859#issuecomment-574568852
 
 
    I integrated your comments, @aljoscha. Let me know what you think.

----------------------------------------------------------------

[GitHub] [flink] kl0u closed pull request #10859: [FLINK-15278] Update the StreamingFileSink docs

Posted by GitBox <gi...@apache.org>.
kl0u closed pull request #10859: [FLINK-15278] Update the StreamingFileSink docs
URL: https://github.com/apache/flink/pull/10859
 
 
   

----------------------------------------------------------------

[GitHub] [flink] kl0u edited a comment on issue #10859: [FLINK-15278] Update the StreamingFileSink docs

Posted by GitBox <gi...@apache.org>.
kl0u edited a comment on issue #10859: [FLINK-15278] Update the StreamingFileSink docs
URL: https://github.com/apache/flink/pull/10859#issuecomment-574568852
 
 
    I integrated your comments, @aljoscha. I will merge.

----------------------------------------------------------------

[GitHub] [flink] aljoscha commented on a change in pull request #10859: [FLINK-15278] Update the StreamingFileSink docs

Posted by GitBox <gi...@apache.org>.
aljoscha commented on a change in pull request #10859: [FLINK-15278] Update the StreamingFileSink docs
URL: https://github.com/apache/flink/pull/10859#discussion_r366761154
 
 

 ##########
 File path: docs/dev/connectors/streamfile_sink.md
 ##########
 @@ -381,4 +453,6 @@ aggressively and take a savepoint with some part-files being not fully uploaded,
 before the job is restarted. This will result in your job not being able to restore from that savepoint as the
 pending part-files are no longer there and Flink will fail with an exception as it tries to fetch them and fails.
 
+<!-- SAVEPOINTS -->
 
 Review comment:
   ?

----------------------------------------------------------------

[GitHub] [flink] kl0u commented on a change in pull request #10859: [FLINK-15278] Update the StreamingFileSink docs

Posted by GitBox <gi...@apache.org>.
kl0u commented on a change in pull request #10859: [FLINK-15278] Update the StreamingFileSink docs
URL: https://github.com/apache/flink/pull/10859#discussion_r366765926
 
 

 ##########
 File path: docs/dev/connectors/streamfile_sink.md
 ##########
 @@ -381,4 +453,6 @@ aggressively and take a savepoint with some part-files being not fully uploaded,
 before the job is restarted. This will result in your job not being able to restore from that savepoint as the
 pending part-files are no longer there and Flink will fail with an exception as it tries to fetch them and fails.
 
+<!-- SAVEPOINTS -->
 
 Review comment:
   A forgotten "note to self" about what to include. Removed.

----------------------------------------------------------------