Posted to user@pig.apache.org by Vincent Barat <vi...@ubikod.com> on 2009/10/19 17:03:21 UTC
Synchronization issue while storing a file and accessing it using hadoop API
Hello to all of you,
I have some Pig code that I run from Java and that stores a file on Hadoop:
Analytics.pigServer.store("session_count_and_length",
"session_count_and_length");
And then, immediately afterwards, I try to read from this file using the Hadoop API:
FSDataInputStream is;
Path filePath = new Path("session_count_and_length");
Path partPath = new Path(path + "/part-00000");
is = Analytics.hadoopFs.open(partPath);
I RANDOMLY got the following exception:
java.io.FileNotFoundException: File app1_stats/session_count_and_length/part-00000 does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:351)
And when I check my Hadoop FS, the file actually exists.
It seems that there is a race condition between Pig creating this file
and returning, and Hadoop considering the file as existing.
Any suggestions?
Thanks a lot.
Re: Synchronization issue while storing a file and accessing it using hadoop API
Posted by Vincent Barat <vi...@ubikod.com>.
Forget about this... I'm ashamed to say that it was a Hadoop
configuration issue :-)
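The reply does not say which setting was wrong. One reading consistent with the stack trace is that the reading client resolved its paths against the local filesystem (RawLocalFileSystem is Hadoop's local-disk implementation), while the file was being checked on HDFS. The sketch below is a plain-Java model of that effect, not Hadoop code: the class QualifyDemo, the helper qualify, the host namenode:8020 and both working directories are hypothetical. It only illustrates how one relative path can end up in two different places depending on the default filesystem URI.

```java
import java.net.URI;
import java.net.URISyntaxException;

class QualifyDemo {
    /**
     * Hypothetical helper: qualifies a relative path against a default
     * filesystem URI and a working directory. This mimics the idea of
     * path qualification; it is not Hadoop's implementation.
     */
    static URI qualify(String defaultFs, String workingDir, String relativePath) {
        URI base = URI.create(defaultFs);
        // Make sure the working directory ends with a separator.
        String dir = workingDir.endsWith("/") ? workingDir : workingDir + "/";
        try {
            // Keep the scheme and authority of the default filesystem,
            // and append the relative path to the working directory.
            return new URI(base.getScheme(), base.getAuthority(),
                           dir + relativePath, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Relative path taken from the exception message in the thread.
        String part = "app1_stats/session_count_and_length/part-00000";

        // Assumed defaults: local disk versus an HDFS namenode (both hypothetical).
        URI local = qualify("file:///", "/home/user", part);
        URI remote = qualify("hdfs://namenode:8020", "/user/vincent", part);

        System.out.println("local default : " + local);
        System.out.println("hdfs default  : " + remote);
        System.out.println("same location : " + local.equals(remote));
    }
}
```

In a real client, printing FileSystem.getUri() for the handle used to read would show which filesystem is actually in play; in Hadoop releases of that period the default was set through the fs.default.name property.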