Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/04/15 07:32:01 UTC

[GitHub] [flink] JTaky opened a new pull request #11747: [FLINK-10203][Connectors/FileSystem] Support truncate method for old Hadoop versions in HadoopRecoverableFsDataOutputStream

JTaky opened a new pull request #11747: [FLINK-10203][Connectors/FileSystem] Support truncate method for old Hadoop versions in HadoopRecoverableFsDataOutputStream
URL: https://github.com/apache/flink/pull/11747
 
 
   ##  What is the purpose of the change
   
   The new StreamingFileSink (introduced in Flink 1.6) uses the HadoopRecoverableFsDataOutputStream wrapper to write data to HDFS.
   
   HadoopRecoverableFsDataOutputStream wraps FSDataOutputStream so that, after a failure, writing can resume from a known position in the file. To implement this recovery, HadoopRecoverableFsDataOutputStream uses the "truncate" method, which was introduced only in Hadoop 2.7.
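
For illustration only (this is not the code in this PR): one way to find out at runtime whether the Hadoop version on the classpath offers a native `FileSystem.truncate(Path, long)` is a reflective lookup. The names `TruncateSupport`, `hasPublicMethod` and `hadoopHasNativeTruncate` are hypothetical. The sketch loads the Hadoop classes by name, so it also compiles and runs when Hadoop is not on the classpath.

```java
final class TruncateSupport {

    /** Returns true if {@code type} exposes a public method with the given name and parameters. */
    static boolean hasPublicMethod(Class<?> type, String name, Class<?>... parameterTypes) {
        try {
            type.getMethod(name, parameterTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Checks for FileSystem.truncate(Path, long), which exists from Hadoop 2.7 on. */
    static boolean hadoopHasNativeTruncate() {
        try {
            Class<?> fileSystem = Class.forName("org.apache.hadoop.fs.FileSystem");
            Class<?> path = Class.forName("org.apache.hadoop.fs.Path");
            return hasPublicMethod(fileSystem, "truncate", path, long.class);
        } catch (ClassNotFoundException e) {
            // Hadoop is not on the classpath at all.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("native truncate available: " + hadoopHasNativeTruncate());
    }
}
```

On Hadoop 2.6 and older the lookup finds no such method, which is the case where an emulated truncate would be needed.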
   
   Unfortunately, the latest versions of a few official Hadoop distributions (Cloudera, Pivotal HD) are still based on Hadoop 2.6. As a result, Flink's Hadoop connector can't work with these distributions.
   
   Flink declares that it supports Hadoop from version 2.4.0 upwards (https://ci.apache.org/projects/flink/flink-docs-release-1.6/start/building.html#hadoop-versions).
   
   I guess we should emulate the functionality of the "truncate" method for older Hadoop versions.
   The fix for this issue is vital for us as Hadoop 2.6 users.
   
   ## Brief change log
   
   1. Create a new file with the '.truncated' extension in the same folder and copy the content of the original file into it, up to the required length.
   2. Remove the original file.
   3. Rename the truncated file to the name of the original one.
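
The three steps above can be sketched as follows. This is a simplified illustration against the local file system using `java.nio.file`, not the Hadoop `FileSystem` API that the PR targets; `CopyTruncate` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class CopyTruncate {

    /** Emulates truncate(file, length) by copy, delete and rename. */
    static void truncate(Path file, long length) throws IOException {
        Path truncated = file.resolveSibling(file.getFileName() + ".truncated");

        // Step 1: copy the first `length` bytes into the sibling '.truncated' file.
        try (InputStream in = Files.newInputStream(file);
             OutputStream out = Files.newOutputStream(truncated)) {
            byte[] buffer = new byte[8192];
            long remaining = length;
            while (remaining > 0) {
                int read = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (read < 0) {
                    throw new IOException("File is shorter than the requested length " + length);
                }
                out.write(buffer, 0, read);
                remaining -= read;
            }
        }

        // Step 2: remove the original file.
        Files.delete(file);

        // Step 3: give the truncated copy the original name.
        Files.move(truncated, file);
    }
}
```

Unlike a native truncate, this costs a full copy of the retained prefix, so the time taken grows with the target length.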
   
   **In case of failure:**
   As its first step, the "truncate" method checks whether the original file exists:
   
   If the original file exists, start the process from the beginning (point 1).
   
   If the original file does not exist but a file with the *.truncated extension does:
   
   The absence of the original file tells us that the truncated file was written fully and that the process crashed at the stage of renaming the truncated file. (I want to believe in the guarantee of atomicity of the HDFS rename operation.) We can use it as the resulting file and finish the truncation process.
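
A sketch of the recovery decision described above, again on the local file system with `java.nio.file` as a stand-in for HDFS. `RecoverableTruncate` and `copyPrefix` are hypothetical names, and a local rename does not necessarily give the atomicity assumed for HDFS here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

final class RecoverableTruncate {

    static void truncate(Path file, long length) throws IOException {
        Path truncated = file.resolveSibling(file.getFileName() + ".truncated");

        if (Files.exists(file)) {
            // Original still present: a leftover '.truncated' file may be partial,
            // so overwrite it and restart from step 1.
            copyPrefix(file, truncated, length);
            Files.delete(file);
        } else if (!Files.exists(truncated)) {
            throw new NoSuchFileException(file.toString());
        }
        // Either the original was just deleted, or an earlier attempt crashed
        // after the delete and before the rename. Finish with the rename.
        Files.move(truncated, file);
    }

    /** Copies the first `length` bytes of `source` into `target`, replacing its content. */
    private static void copyPrefix(Path source, Path target, long length) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            byte[] buffer = new byte[8192];
            long remaining = length;
            while (remaining > 0) {
                int read = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (read < 0) {
                    throw new IOException("File is shorter than the requested length " + length);
                }
                out.write(buffer, 0, read);
                remaining -= read;
            }
        }
    }
}
```

Calling `truncate` again after any crash point converges on the same result, because the original file is only deleted once the copy has been written completely.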
   
   ## Brief change log
   
   - Add new abstraction Truncater
   - Add implementation for old Hadoop versions (LegacyTruncater)
   - Add implementation for Hadoop 2.7 and upwards
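
To show how such an abstraction might be shaped, here is a minimal sketch. `Truncater` and `LegacyTruncater` are the names from the list above; the method signature, `NativeTruncater`, `TruncaterSketch` and `forFileSystem` are assumptions made for this example and may differ from the PR's code. The local file system stands in for HDFS.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

final class TruncaterSketch {

    interface Truncater {
        void truncate(Path file, long length) throws IOException;
    }

    /** Stand-in for the Hadoop 2.7+ case: delegate to the file system's own truncate. */
    static final class NativeTruncater implements Truncater {
        @Override
        public void truncate(Path file, long length) throws IOException {
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
                channel.truncate(length);
            }
        }
    }

    /** Stand-in for the legacy case: copy the prefix, delete the original, rename. */
    static final class LegacyTruncater implements Truncater {
        @Override
        public void truncate(Path file, long length) throws IOException {
            // Reads the whole file into memory for brevity; a real implementation would stream.
            byte[] content = Files.readAllBytes(file);
            if (length > content.length) {
                throw new IOException("File is shorter than the requested length " + length);
            }
            Path truncated = file.resolveSibling(file.getFileName() + ".truncated");
            Files.write(truncated, Arrays.copyOf(content, (int) length));
            Files.delete(file);
            Files.move(truncated, file);
        }
    }

    /** Picks the implementation once, based on whether a native truncate is available. */
    static Truncater forFileSystem(boolean nativeTruncateAvailable) {
        return nativeTruncateAvailable ? new NativeTruncater() : new LegacyTruncater();
    }
}
```

With this shape the output stream only sees the `Truncater` interface, and the Hadoop version check happens in one place.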
   
   ## Verifying this change
   
   This change contains a test for LegacyTruncater.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: yes
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? JavaDocs
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot edited a comment on issue #11747: [FLINK-10203][Connectors/FileSystem] Support truncate method for old Hadoop versions in HadoopRecoverableFsDataOutputStream

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #11747: [FLINK-10203][Connectors/FileSystem] Support truncate method for old Hadoop versions in HadoopRecoverableFsDataOutputStream
URL: https://github.com/apache/flink/pull/11747#issuecomment-613877909
 
 
   ## CI report:
   
   * 8e4cf4d69b57bd50e638dbec8c1596a63a837811 Travis: [SUCCESS](https://travis-ci.com/github/flink-ci/flink/builds/160338415) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=7496) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] flinkbot commented on issue #11747: [FLINK-10203][Connectors/FileSystem] Support truncate method for old Hadoop versions in HadoopRecoverableFsDataOutputStream

Posted by GitBox <gi...@apache.org>.
flinkbot commented on issue #11747: [FLINK-10203][Connectors/FileSystem] Support truncate method for old Hadoop versions in HadoopRecoverableFsDataOutputStream
URL: https://github.com/apache/flink/pull/11747#issuecomment-613877909
 
 
   ## CI report:
   
   * 8e4cf4d69b57bd50e638dbec8c1596a63a837811 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] flinkbot commented on issue #11747: [FLINK-10203][Connectors/FileSystem] Support truncate method for old Hadoop versions in HadoopRecoverableFsDataOutputStream

Posted by GitBox <gi...@apache.org>.
flinkbot commented on issue #11747: [FLINK-10203][Connectors/FileSystem] Support truncate method for old Hadoop versions in HadoopRecoverableFsDataOutputStream
URL: https://github.com/apache/flink/pull/11747#issuecomment-613869804
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 8e4cf4d69b57bd50e638dbec8c1596a63a837811 (Wed Apr 15 07:34:42 UTC 2020)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.
   
   The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>


[GitHub] [flink] flinkbot edited a comment on issue #11747: [FLINK-10203][Connectors/FileSystem] Support truncate method for old Hadoop versions in HadoopRecoverableFsDataOutputStream

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #11747: [FLINK-10203][Connectors/FileSystem] Support truncate method for old Hadoop versions in HadoopRecoverableFsDataOutputStream
URL: https://github.com/apache/flink/pull/11747#issuecomment-613877909
 
 
   ## CI report:
   
   * 8e4cf4d69b57bd50e638dbec8c1596a63a837811 Travis: [PENDING](https://travis-ci.com/github/flink-ci/flink/builds/160338415) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=7496) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] flinkbot edited a comment on issue #11747: [FLINK-10203][Connectors/FileSystem] Support truncate method for old Hadoop versions in HadoopRecoverableFsDataOutputStream

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #11747: [FLINK-10203][Connectors/FileSystem] Support truncate method for old Hadoop versions in HadoopRecoverableFsDataOutputStream
URL: https://github.com/apache/flink/pull/11747#issuecomment-613877909
 
 
   ## CI report:
   
   * 8e4cf4d69b57bd50e638dbec8c1596a63a837811 Travis: [SUCCESS](https://travis-ci.com/github/flink-ci/flink/builds/160338415) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=7496) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>
