Posted to user@pig.apache.org by Vincent Barat <vi...@ubikod.com> on 2009/10/19 17:03:21 UTC

Synchronization issue while storing a file and accessing it using hadoop API

Hello to all of you,

I have some Pig code that I run from Java, which stores a file on Hadoop:

       Analytics.pigServer.store("session_count_and_length", "session_count_and_length");

And then, just after that, I try to read from this file using the Hadoop API:

       FSDataInputStream is;
       Path filePath = new Path("session_count_and_length");
       // Pig writes its output as part files inside the output directory
       Path partPath = new Path(filePath, "part-00000");
       is = Analytics.hadoopFs.open(partPath);

I RANDOMLY got the following exception:

java.io.FileNotFoundException: File app1_stats/session_count_and_length/part-00000 does not exist.
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
	at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:351)

And when I check my Hadoop FS, the file actually exists.

It seems that there is a race condition between Pig creating this 
file and returning, and Hadoop considering the file as existing.

Any suggestions?

Thanks a lot.

Re: Synchronization issue while storing a file and accessing it using hadoop API

Posted by Vincent Barat <vi...@ubikod.com>.
Forget about this... I'm ashamed to say that it was a Hadoop 
configuration issue :-)
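
The reply does not say which setting was wrong. What the stack trace does show is that the lookup went through RawLocalFileSystem, so the path was resolved on the local filesystem, where a relative path depends on the working directory. Below is a minimal sketch, using only java.nio rather than the Hadoop API, that resolves the output directory to an absolute path and lists the part-* files instead of assuming part-00000 exists. The class and method names are hypothetical, and it only applies when the output really is on the local filesystem.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Pig or Hadoop. It only covers output
// that was written to the local filesystem.
public class PartFileLister {

    // Returns the part-* files under outputDir, sorted by name.
    // If the directory is missing, the exception names the absolute path
    // that was checked, which makes a wrong working directory visible.
    public static List<Path> listPartFiles(Path outputDir) throws IOException {
        Path absolute = outputDir.toAbsolutePath().normalize();
        if (!Files.isDirectory(absolute)) {
            throw new IOException("Output directory not found: " + absolute);
        }
        List<Path> parts = new ArrayList<>();
        // The glob skips hidden checksum files such as .part-00000.crc
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(absolute, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        Collections.sort(parts);
        return parts;
    }

    public static void main(String[] args) throws IOException {
        // Directory name taken from the original message; pass another as args[0]
        Path dir = Paths.get(args.length > 0 ? args[0] : "session_count_and_length");
        for (Path part : listPartFiles(dir)) {
            System.out.println(part);
        }
    }
}
```

Running it right after the store call prints the part files it finds, or fails with the absolute directory it looked in, which helps tell a path or configuration problem apart from a genuine race.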
