Posted to user@flink.apache.org by Vishal Santoshi <vi...@gmail.com> on 2019/03/09 12:31:10 UTC

Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

We reverted back to BucketingSink and that works as expected. In conclusion,
the RollingFileSink (i.e. the StreamingFileSink) needs Hadoop 2.7+ for its
Hadoop file system RecoverableWriter.
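
For anyone hitting the same issue on Hadoop 2.6, the fallback amounts to roughly
the snippet below (a minimal sketch only; the path, bucketer format, and rollover
settings are illustrative, not our production configuration):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

// BucketingSink does not rely on Hadoop's truncate(); on restore it writes
// '.valid-length' marker files instead, so it also works against Hadoop 2.6.
static void addHdfsSink(DataStream<String> events) {
    BucketingSink<String> sink =
            new BucketingSink<>("hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs");
    sink.setBucketer(new DateTimeBucketer<>("'dt='yyyy-MM-dd"));
    sink.setBatchSize(1024L * 1024L * 128L);          // roll part files at ~128 MB
    sink.setBatchRolloverInterval(20L * 60L * 1000L); // or after 20 minutes
    events.addSink(sink);
}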

On Sat, Feb 23, 2019 at 2:30 PM Vishal Santoshi <vi...@gmail.com>
wrote:

> Anyone? I am sure there are Hadoop 2.6 integrations with Flink 1.7.1, or I am
> overlooking something...
>
> On Fri, Feb 15, 2019 at 2:44 PM Vishal Santoshi <vi...@gmail.com>
> wrote:
>
>> Not sure,  but it seems this
>> https://issues.apache.org/jira/browse/FLINK-10203 may be a connected
>> issue.
>>
>> On Fri, Feb 15, 2019 at 11:57 AM Vishal Santoshi <
>> vishal.santoshi@gmail.com> wrote:
>>
>>> That log line does not appear. It looks like we have a chicken-and-egg issue.
>>>
>>> 2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.DFSClient
>>>                     - Connecting to datanode 10.246.221.10:50010
>>>
>>> 2019-02-15 16:49:15,045 DEBUG
>>> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient
>>> - SASL client skipping handshake in unsecured configuration for
>>>
>>> addr = /10.246.221.10, datanodeId = DatanodeInfoWithStorage[
>>> 10.246.221.10:50010,DS-c57a7667-f697-4f03-9fb1-532c5b82a9e8,DISK]
>>>
>>> 2019-02-15 16:49:15,072 DEBUG
>>> org.apache.flink.runtime.fs.hdfs.HadoopFsFactory              -
>>> Instantiating for file system scheme hdfs Hadoop File System
>>> org.apache.hadoop.hdfs.DistributedFileSystem
>>>
>>> 2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal
>>>                     - dfs.client.use.legacy.blockreader.local = false
>>>
>>> 2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal
>>>                     - dfs.client.read.shortcircuit = false
>>>
>>> 2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal
>>>                     - dfs.client.domain.socket.data.traffic = false
>>>
>>> 2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal
>>>                     - dfs.domain.socket.path =
>>>
>>> 2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.io.retry.RetryUtils
>>>                     - multipleLinearRandomRetry = null
>>>
>>> 2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.ipc.Client
>>>                     - getting client out of cache:
>>> org.apache.hadoop.ipc.Client@31920ade
>>>
>>> 2019-02-15 16:49:15,076 DEBUG
>>> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil  -
>>> DataTransferProtocol not using SaslPropertiesResolver, no QOP found in
>>> configuration for dfs.data.transfer.protection
>>>
>>> 2019-02-15 16:49:15,080 INFO
>>> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  -
>>> Subtask 3 initializing its state (max part counter=58).
>>>
>>> 2019-02-15 16:49:15,081 DEBUG
>>> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  -
>>> Subtask 3 restoring: BucketState for
>>> bucketId=ls_kraken_events/dt=2019-02-14/evt=ad_fill and
>>> bucketPath=hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill,
>>> has open part file created @ 1550247946437
>>>
>>> 2019-02-15 16:49:15,085 DEBUG org.apache.hadoop.ipc.Client
>>>                     - IPC Client (1270836494) connection to
>>> nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root sending #56
>>>
>>> 2019-02-15 16:49:15,188 DEBUG org.apache.hadoop.ipc.Client
>>>                     - IPC Client (1270836494) connection to
>>> nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root got value #56
>>>
>>> 2019-02-15 16:49:15,196 INFO  org.apache.flink.runtime.taskmanager.Task
>>>                   - Source: Custom Source -> (Sink: Unnamed, Process ->
>>> Timestamps/Watermarks) (4/4) (f73403ac4763c99e6a244cba3797f7e9) switched
>>> from RUNNING to FAILED.
>>>
>>> java.io.IOException: Missing data in tmp file:
>>> hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/
>>> .part-3-32.inprogress.da2a75d1-0c83-47bc-9c83-950360c55c86
>>>
>>>         at
>>> org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.<init>(HadoopRecoverableFsDataOutputStream.java:93)
>>>
>>>
>>>
>>>
>>>
>>>
>>> I do see
>>>
>>>
>>> 2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner
>>>       -  Current Hadoop/Kerberos user: root
>>>
>>> 2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner
>>>       -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation -
>>> 1.8/25.181-b13
>>>
>>> 2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner
>>>       -  Maximum heap size: 1204 MiBytes
>>>
>>> 2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner
>>>       -  JAVA_HOME: /docker-java-home
>>>
>>> 2019-02-15 16:47:33,585 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner
>>>       -  Hadoop version: 2.7.5
>>>
>>>
>>>
>>> which is to be expected, given that we are running the Hadoop 2.7 build of
>>> Flink 1.7.1.
>>>
>>>
>>>
>>> Does it make sense to go with a Hadoop-free Flink distribution and inject the
>>> required jars? Has anyone done that?
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Feb 15, 2019 at 2:33 AM Yun Tang <my...@live.com> wrote:
>>>
>>>> Hi
>>>>
>>>> When the 'RollingSink' tries to initialize its state, it first checks whether
>>>> the current file system supports the truncate method. If the file system does
>>>> not support it, the sink uses a work-around, which means you should not hit
>>>> this problem. Otherwise, the 'RollingSink' found the 'truncate' method via
>>>> reflection even though the file system does not actually support it. You could
>>>> enable DEBUG logging and check whether the line below gets printed:
>>>> Truncate not found. Will write a file with suffix '.valid-length' and
>>>> prefix '_' to specify how many bytes in a bucket are valid.
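>>>>
>>>> Roughly, that probe amounts to the following sketch (written from memory, not
>>>> the actual Flink source; the class and method names here are only illustrative):
>>>>
>>>> import java.lang.reflect.Method;
>>>> import org.apache.hadoop.fs.FileSystem;
>>>> import org.apache.hadoop.fs.Path;
>>>>
>>>> public final class TruncateProbe {
>>>>     /** Returns FileSystem#truncate(Path, long) if this Hadoop version has it (2.7+), otherwise null. */
>>>>     public static Method findTruncate() {
>>>>         try {
>>>>             return FileSystem.class.getMethod("truncate", Path.class, long.class);
>>>>         } catch (NoSuchMethodException e) {
>>>>             // Hadoop 2.6 and older: no truncate(), so the sink should take the
>>>>             // work-around path and write '.valid-length' marker files on restore.
>>>>             return null;
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> If findTruncate() returns null on your classpath, the work-around path should
>>>> be taken and the DEBUG line above should show up.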
>>>>
>>>> However, from your second email, the more serious problem is that you are
>>>> using 'Buckets' with Hadoop 2.6. As far as I know, the `RecoverableWriter`
>>>> used by 'Buckets' only supports Hadoop 2.7+, and I am not sure whether a
>>>> work-around exists.
>>>>
>>>> Best
>>>> Yun Tang
>>>> ------------------------------
>>>> *From:* Vishal Santoshi <vi...@gmail.com>
>>>> *Sent:* Friday, February 15, 2019 3:43
>>>> *To:* user
>>>> *Subject:* Re: StandAlone job on k8s fails with "Unknown method
>>>> truncate" on restore
>>>>
>>>> And yes, we cannot work with the RollingFileSink on the Hadoop 2.6 release of
>>>> 1.7.1 because of this.
>>>>
>>>> java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer
>>>> 	at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:57)
>>>> 	at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
>>>> 	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
>>>> 	at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java:112)
>>>>
>>>>
>>>> Any work-around?
>>>>
>>>>
>>>> On Thu, Feb 14, 2019 at 1:42 PM Vishal Santoshi <
>>>> vishal.santoshi@gmail.com> wrote:
>>>>
>>>> The job uses a RollingFileSink to push data to HDFS. Run an HA
>>>> standalone cluster on k8s,
>>>>
>>>> * get the job running
>>>> * kill the pod.
>>>>
>>>> The k8s deployment relaunches the pod but fails with
>>>>
>>>> java.io.IOException: Missing data in tmp file:
>>>> hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-2-16.inprogress.449e8668-e886-4f89-b5f6-45ac68e25987
>>>>
>>>>
>>>> Unknown method truncate called on
>>>> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.
>>>>
>>>>
>>>> The file does exist. We work with Hadoop 2.6, which does not have
>>>> truncate. The previous version would see that "truncate" was not supported,
>>>> drop a valid-length file for the .inprogress file, and rename it to a valid
>>>> part file.
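>>>>
>>>> (For reference, that older recovery behaviour amounts to roughly the sketch
>>>> below; this is my paraphrase, not the actual Flink code, and the method and
>>>> marker-file names are only illustrative.)
>>>>
>>>> import org.apache.hadoop.fs.FSDataOutputStream;
>>>> import org.apache.hadoop.fs.FileSystem;
>>>> import org.apache.hadoop.fs.Path;
>>>>
>>>> public final class TruncateFreeRecovery {
>>>>     // On restore without truncate(): promote the in-progress file to a final
>>>>     // part file and record how many of its bytes are valid in a marker file.
>>>>     public static void recover(FileSystem fs, Path inProgress, Path partFile,
>>>>                                long validLength) throws java.io.IOException {
>>>>         fs.rename(inProgress, partFile);
>>>>         Path marker = new Path(partFile.getParent(),
>>>>                 "_" + partFile.getName() + ".valid-length");
>>>>         try (FSDataOutputStream out = fs.create(marker, true)) {
>>>>             // Downstream readers must ignore any bytes past this length.
>>>>             out.writeBytes(String.valueOf(validLength));
>>>>         }
>>>>     }
>>>> }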
>>>>
>>>>
>>>>
>>>> Is this a known issue?
>>>>
>>>>
>>>> Regards.