Posted to user@flink.apache.org by Rinat <r....@cleverdata.ru> on 2019/06/17 09:29:42 UTC

StreamingFileSink with hdfs less than 2.7

Hi mates, I decided to enable state persistence for our Flink jobs that write data into HDFS, but ran into some trouble with it.

I’m trying to use StreamingFileSink with Cloudera Hadoop, version 2.6.5, which doesn’t provide the truncate method.

As a result, the job fails immediately on startup, while initializing HadoopRecoverableWriter, because the writer only works with Hadoop file systems of version 2.7 or later.
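To confirm which case a given cluster falls into, one option is to probe the classpath for the method by reflection. The snippet below is only a minimal sketch: the class `TruncateProbe` and the helper `hasPublicMethodNamed` are hypothetical names made up for illustration, and the class name `org.apache.hadoop.fs.FileSystem` is passed as a string so the code compiles without Hadoop on the classpath (in which case the probe simply reports false).

```java
import java.lang.reflect.Method;

class TruncateProbe {

    /**
     * Returns true if the named class can be loaded and exposes at least one
     * public method with the given name; returns false if the class is
     * missing or has no such method.
     */
    static boolean hasPublicMethodNamed(String className, String methodName) {
        try {
            for (Method m : Class.forName(className).getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException e) {
            // Class is not on the classpath at all.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: run with the cluster's Hadoop jars on the classpath.
        boolean supported =
                hasPublicMethodNamed("org.apache.hadoop.fs.FileSystem", "truncate");
        System.out.println("truncate available: " + supported);
    }
}
```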

Do you have any plans to support recovery on Hadoop file systems that don’t have the truncate method, or is there a way I can work around this limitation?

If no workaround exists, then the following behaviour would be good enough:

1. Get the path of the file that should be restored.
2. Get the valid-length from the state.
3. Create a temporary directory and copy the stream of the restoring file into it until valid-length is reached.
4. Replace the restoring file with the copy from the temporary directory.
5. Move the file to its final state.
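The steps above could be sketched roughly as follows. This is an illustration only, not Flink code: it uses java.nio.file on a local file system so that it runs standalone, and the names `TruncateByCopy`, `truncateByCopy`, `tmpDir` and the `.truncated` suffix are made up for the example. On HDFS the same steps would presumably go through the Hadoop FileSystem open, create and rename calls instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class TruncateByCopy {

    /**
     * Emulates truncate(file, validLength): copies the first validLength bytes
     * of the file into tmpDir, then moves the copy over the original file.
     */
    static void truncateByCopy(Path file, long validLength, Path tmpDir) throws IOException {
        if (validLength < 0) {
            throw new IllegalArgumentException("validLength must be >= 0");
        }
        Files.createDirectories(tmpDir);
        Path tmp = tmpDir.resolve(file.getFileName().toString() + ".truncated");

        try (InputStream in = Files.newInputStream(file);
             OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buffer = new byte[8192];
            long remaining = validLength;
            while (remaining > 0) {
                int read = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (read < 0) {
                    throw new IOException("File is shorter than valid-length: " + file);
                }
                out.write(buffer, 0, read);
                remaining -= read;
            }
        }
        // Replace the restoring file with the truncated copy.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("truncate-demo");
        Path part = dir.resolve("part-0-0.inprogress");
        Files.write(part, "committed|uncommitted".getBytes(StandardCharsets.UTF_8));

        // Keep only the first 9 bytes, as if valid-length = 9 came from the state.
        truncateByCopy(part, 9, dir.resolve("tmp"));
        System.out.println(new String(Files.readAllBytes(part), StandardCharsets.UTF_8));
        // prints "committed"
    }
}
```

The cost is a full copy of the valid part of each restored file, so recovery time grows with file size, but it only needs read, create and rename operations.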

What do you think about it?

Sincerely yours,
Rinat Sharipov
Software Engineer at 1DMP CORE Team

email: r.sharipov@cleverdata.ru
mobile: +7 (925) 416-37-26

CleverDATA
make your data clever