Posted to user@spark.apache.org by Bahubali Jain <ba...@gmail.com> on 2014/12/03 09:31:31 UTC
textFileStream() issue?
Hi,
I am trying to use textFileStream("some_hdfs_location") to pick up new files
from an HDFS location, but I am seeing some strange behavior.
textFileStream() does not detect new files when I "move" them from another
location within HDFS into the directory that textFileStream() is monitoring.
But when I copy files from the local Linux filesystem into HDFS,
textFileStream() does detect the new files.
Is this a known issue?
Thanks,
Baahu
Re: textFileStream() issue?
Posted by Tobias Pfeiffer <tg...@preferred.jp>.
Hi,
On Wed, Dec 3, 2014 at 5:31 PM, Bahubali Jain <ba...@gmail.com> wrote:
>
> I am trying to use textFileStream("some_hdfs_location") to pick up new files
> from an HDFS location, but I am seeing some strange behavior.
> textFileStream() does not detect new files when I "move" them from another
> location within HDFS into the directory that textFileStream() is monitoring.
> But when I copy files from the local Linux filesystem into HDFS,
> textFileStream() does detect the new files.
>
Is it possible that the modification timestamp of the moved files is
actually older than that of previously processed files? I think only "new"
files are picked up. Try moving the file and then setting its timestamp to
the current time to see if it makes a difference.
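
To illustrate the timestamp theory, here is a minimal sketch in Python. It
runs against a local temporary directory, not HDFS, and it assumes the local
filesystem behaves the way I suspect HDFS does here: a move (rename) keeps
the old modification time, while a copy creates a file with a fresh one. The
function name demo_move_vs_copy and the file names are made up for the
example.

```python
import os
import shutil
import tempfile
import time


def demo_move_vs_copy():
    """Return (old, mtime_moved, mtime_copied, mtime_touched) as epoch seconds."""
    with tempfile.TemporaryDirectory() as root:
        staging = os.path.join(root, "staging")   # hypothetical source directory
        watched = os.path.join(root, "watched")   # hypothetical monitored directory
        os.makedirs(staging)
        os.makedirs(watched)

        src = os.path.join(staging, "data.txt")
        with open(src, "w") as f:
            f.write("hello\n")

        # Pretend the file was written an hour ago.
        old = time.time() - 3600
        os.utime(src, (old, old))

        # Copy: a new file is written, so it gets a current modification time.
        copied = os.path.join(watched, "copied.txt")
        shutil.copyfile(src, copied)
        mtime_copied = os.path.getmtime(copied)

        # Move: a rename only changes the path, the old modification time stays.
        moved = os.path.join(watched, "moved.txt")
        os.rename(src, moved)
        mtime_moved = os.path.getmtime(moved)

        # "Touch" the moved file so its modification time becomes now.
        os.utime(moved, None)
        mtime_touched = os.path.getmtime(moved)

        return old, mtime_moved, mtime_copied, mtime_touched


if __name__ == "__main__":
    old, moved, copied, touched = demo_move_vs_copy()
    print("moved file kept old mtime:", abs(moved - old) < 2)
    print("copied file has new mtime:", copied > old + 3000)
    print("touched file has new mtime:", touched > old + 3000)
```

On HDFS itself, the Hadoop FileSystem API has a setTimes(path, mtime, atime)
call that could play the role of os.utime above. I have not tested whether
that resolves the behavior described in this thread.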
Tobias