Posted to common-user@hadoop.apache.org by Xiao Li <xe...@aim.com> on 2014/02/13 02:26:07 UTC
OPENFORWRITE Files issue
Say I have a text file on HDFS in "OPENFORWRITE, HEALTHY" status; some process is appending to it.
It has 4 lines in it.
hadoop fs -cat /file | wc -l
4
However, when I run a wordcount job on this file, only the first line is visible to MapReduce. Hive behaves similarly: "select count(*) from filetable" returns 1.
If I do "hadoop fs -cp /file /file2", then the copy works as expected (file2 is closed, file is still open).
wordcount then sees 5 lines in the input directory (1 from the open file, 4 from the copied file), and Hive returns 5.
I am wondering whether this is related to TextInputFormat.
I am using CDH 4.4.0
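One possible explanation, offered as an assumption rather than something confirmed on CDH 4.4.0: input splits are computed from the file length that the file system reports, and for a file still open for write that recorded length may lag behind the bytes a direct read can see. The toy model below is not Hadoop code. It assumes a recorded length of 0 and a simplified line reader that keeps reading while its position is at or before the split end; the function name and the sample data are made up for illustration.

```python
def count_lines_in_split(data: bytes, start: int, end: int) -> int:
    """Count the lines a simplified line reader emits for one split.

    Assumption: the reader keeps going while its position is <= end, so a
    line that begins at or before `end` is always read in full. Only
    start == 0 is modelled here (no skipping of a partial first line).
    """
    pos = start
    count = 0
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        # Consume through the newline, or to the end of the data if none is left.
        pos = len(data) if newline == -1 else newline + 1
        count += 1
    return count


# Bytes actually written so far (4 lines, as in the example above).
data = b"line1\nline2\nline3\nline4\n"

# Hypothetical: length recorded in the metadata while the file is still open.
recorded_length = 0
# Length recorded once the file is closed (for example, the copied file).
closed_length = len(data)

open_count = count_lines_in_split(data, 0, recorded_length)
closed_count = count_lines_in_split(data, 0, closed_length)
print(open_count, closed_count)
```

Under these assumptions the open file contributes a single line and the closed copy contributes all four, which would be consistent with the counts of 1 and 5 above. Comparing the size shown by "hadoop fs -ls /file" with the byte count from "hadoop fs -cat /file | wc -c" may help check whether the recorded length really lags.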
Thanks.
Xiao Li