Posted to user@flink.apache.org by Vishal Santoshi <vi...@gmail.com> on 2019/02/14 18:42:11 UTC

StandAlone job on k8s fails with "Unknown method truncate" on restore

The job uses a RollingFileSink to push data to HDFS. Run an HA standalone
cluster on k8s,

* get the job running
* kill the pod.

The k8s deployment relaunches the pod but fails with

java.io.IOException: Missing data in tmp file:
hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-2-16.inprogress.449e8668-e886-4f89-b5f6-45ac68e25987


Unknown method truncate called on
org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.


The file does exist. We work with Hadoop 2.6, which does not have truncate.
The previous version would see that "truncate" was not supported, write a
length file for the .inprogress file, and rename it to a valid part file.
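
For context, here is a minimal sketch of the kind of sink setup that exercises this restore path, assuming the sink in question is Flink 1.7's StreamingFileSink (whose internals, Buckets and HadoopRecoverableWriter, show up in the stack traces later in this thread). The source, output path, and bucket format below are placeholders rather than the actual job:

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;

public class StreamingFileSinkJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // part files are finalized on checkpoint completion

        // Stand-in source; the real job consumes from Kafka.
        DataStream<String> events = env.fromElements("event-1", "event-2");

        // On restore, this sink reopens .inprogress part files through
        // HadoopRecoverableWriter, which relies on HDFS truncate (Hadoop 2.7+).
        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs"),
                              new SimpleStringEncoder<String>("UTF-8"))
                .withBucketAssigner(new DateTimeBucketAssigner<>("'dt='yyyy-MM-dd"))
                .build();

        events.addSink(sink);
        env.execute("streaming-file-sink-job");
    }
}

When the pod is killed and the job restores, recovering the open .inprogress file is where the "Unknown method truncate" failure above originates.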



Is this a known issue ?


Regards.

Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

Posted by Vishal Santoshi <vi...@gmail.com>.
We reverted back to BucketingSink and that works as expected. In conclusion,
RollingFileSink needs Hadoop 2.7+ for the Hadoop file system.
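
For reference, a minimal sketch of the BucketingSink fallback, assuming the flink-connector-filesystem dependency is on the classpath; the output path, bucketer format, and batch size are placeholders, not our exact configuration:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.StringWriter;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

public class BucketingSinkFallback {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000);

        // Stand-in source; the real job consumes from Kafka.
        DataStream<String> events = env.fromElements("event-1", "event-2");

        // BucketingSink recovers via truncate when available and otherwise writes
        // a '.valid-length' file, so it does not hard-require Hadoop 2.7+ the way
        // the RecoverableWriter behind StreamingFileSink does.
        BucketingSink<String> sink = new BucketingSink<>("hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs");
        sink.setBucketer(new DateTimeBucketer<>("'dt='yyyy-MM-dd"));
        sink.setWriter(new StringWriter<>());
        sink.setBatchSize(128L * 1024 * 1024); // roll part files at roughly 128 MB

        events.addSink(sink);
        env.execute("bucketing-sink-fallback");
    }
}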


Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

Posted by Vishal Santoshi <vi...@gmail.com>.
Anyone? I am sure there are Hadoop 2.6 integrations with 1.7.1, or I am
overlooking something...


Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

Posted by Vishal Santoshi <vi...@gmail.com>.
Not sure, but it seems
https://issues.apache.org/jira/browse/FLINK-10203 may be a related issue.


Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

Posted by Vishal Santoshi <vi...@gmail.com>.
That log does not appear. It looks like we have a chicken-and-egg issue.

2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.DFSClient
                - Connecting to datanode 10.246.221.10:50010

2019-02-15 16:49:15,045 DEBUG
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient  -
SASL client skipping handshake in unsecured configuration for

addr = /10.246.221.10, datanodeId = DatanodeInfoWithStorage[
10.246.221.10:50010,DS-c57a7667-f697-4f03-9fb1-532c5b82a9e8,DISK]

2019-02-15 16:49:15,072 DEBUG
org.apache.flink.runtime.fs.hdfs.HadoopFsFactory              -
Instantiating for file system scheme hdfs Hadoop File System
org.apache.hadoop.hdfs.DistributedFileSystem

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal
                - dfs.client.use.legacy.blockreader.local = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal
                - dfs.client.read.shortcircuit = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal
                - dfs.client.domain.socket.data.traffic = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal
                - dfs.domain.socket.path =

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.io.retry.RetryUtils
                - multipleLinearRandomRetry = null

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.ipc.Client
                - getting client out of cache:
org.apache.hadoop.ipc.Client@31920ade

2019-02-15 16:49:15,076 DEBUG
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil  -
DataTransferProtocol not using SaslPropertiesResolver, no QOP found in
configuration for dfs.data.transfer.protection

2019-02-15 16:49:15,080 INFO
org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  - Subtask
3 initializing its state (max part counter=58).

2019-02-15 16:49:15,081 DEBUG
org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  - Subtask
3 restoring: BucketState for
bucketId=ls_kraken_events/dt=2019-02-14/evt=ad_fill and
bucketPath=hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill,
has open part file created @ 1550247946437

2019-02-15 16:49:15,085 DEBUG org.apache.hadoop.ipc.Client
                - IPC Client (1270836494) connection to
nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root sending #56

2019-02-15 16:49:15,188 DEBUG org.apache.hadoop.ipc.Client
                - IPC Client (1270836494) connection to
nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root got value #56

2019-02-15 16:49:15,196 INFO  org.apache.flink.runtime.taskmanager.Task
                - Source: Custom Source -> (Sink: Unnamed, Process ->
Timestamps/Watermarks) (4/4) (f73403ac4763c99e6a244cba3797f7e9) switched
from RUNNING to FAILED.

java.io.IOException: Missing data in tmp file:
hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/
.part-3-32.inprogress.da2a75d1-0c83-47bc-9c83-950360c55c86

        at
org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.<init>(HadoopRecoverableFsDataOutputStream.java:93)






I do see


2019-02-15 16:47:33,582 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner
      -  Current Hadoop/Kerberos user: root

2019-02-15 16:47:33,582 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner
      -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13

2019-02-15 16:47:33,582 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner
      -  Maximum heap size: 1204 MiBytes

2019-02-15 16:47:33,582 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner
      -  JAVA_HOME: /docker-java-home

2019-02-15 16:47:33,585 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner
      -  Hadoop version: 2.7.5



which is to be expected given that we are running the Hadoop 2.7 build of
Flink 1.7.1.
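
As a small aside, a hedged way to confirm which Hadoop client version is actually on the task manager classpath is Hadoop's own VersionInfo utility; everything beyond that call in this sketch is illustrative, and it assumes a plain "major.minor.patch" version string:

import org.apache.hadoop.util.VersionInfo;

public final class HadoopVersionCheck {
    /** True if the Hadoop client jars on the classpath are at least major.minor. */
    public static boolean isAtLeast(int major, int minor) {
        String[] parts = VersionInfo.getVersion().split("\\.");
        int maj = Integer.parseInt(parts[0]);
        int min = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return maj > major || (maj == major && min >= minor);
    }

    public static void main(String[] args) {
        // Prints "2.7.5" on this image, matching the TaskManagerRunner log line above.
        System.out.println("Hadoop client version: " + VersionInfo.getVersion());
        System.out.println("RecoverableWriter supported: " + isAtLeast(2, 7));
    }
}

The caveat, and exactly the trap here, is that this reports the client jars bundled with Flink (2.7.5), not the version of the HDFS cluster being written to (2.6).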



Does it make sense to go with a Hadoop-free Flink build and inject the
required jar files? Has that been done by anyone?







Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

Posted by Yun Tang <my...@live.com>.
Hi

When 'RollingSink' tries to initialize state, it first checks whether the current file system supports the truncate method. If the file system does not support it, it uses a work-around instead, which means you should not hit this problem. Otherwise, 'RollingSink' looked up 'truncate' via reflection and found it, even though the file system does not actually support it. You could try enabling DEBUG logging to see whether the log below gets printed:
Truncate not found. Will write a file with suffix '.valid-length' and prefix '_' to specify how many bytes in a bucket are valid.
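
To make the check described above concrete, here is a rough sketch of the reflection probe; this is not the exact Flink source, just the shape of it. FileSystem.truncate(Path, long) only exists from Hadoop 2.7 on, so the sink looks it up reflectively and falls back to '.valid-length' files when it is missing:

import java.lang.reflect.Method;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class TruncateProbe {
    /** Returns the truncate method if the client-side FileSystem class exposes it, else null. */
    public static Method findTruncate(FileSystem fs) {
        try {
            return fs.getClass().getMethod("truncate", Path.class, long.class);
        } catch (NoSuchMethodException e) {
            // Hadoop < 2.7 client: no truncate; the sink instead writes a
            // '.valid-length' companion file and renames the .inprogress part file.
            return null;
        }
    }
}

Note that the probe only sees the client jars: with a Hadoop 2.7 client talking to a 2.6 NameNode the method is found, the fallback is skipped, and the server then rejects the call with "Unknown method truncate", which matches the chicken-and-egg situation in this thread.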

However, from your second email, the more serious problem is using 'Buckets' with Hadoop 2.6. From what I know, the `RecoverableWriter` within 'Buckets' only supports Hadoop 2.7+; I'm not sure whether a work-around exists.

Best
Yun Tang

Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

Posted by Vishal Santoshi <vi...@gmail.com>.
And yes, this cannot work with RollingFileSink for the Hadoop 2.6 release of 1.7.1
because of this.

java.lang.UnsupportedOperationException: Recoverable writers on Hadoop
are only supported for HDFS and for Hadoop version 2.7 or newer
	at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:57)
	at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
	at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java:112)


Any work around ?

