Posted to user@pig.apache.org by Cheolsoo Park <pi...@gmail.com> on 2013/12/15 06:59:52 UTC

Re: Apache Pig UDF and Distributed cache

Did you look at the stack trace in the Pig log file and Hadoop task log?


On Wed, Dec 11, 2013 at 11:12 AM, Sameer Tilak <ss...@live.com> wrote:

> Hi All,
> I am trying to use Distributed cache in my UDF. I have the following file
> in HDFS that I want all my map functions to have available locally:
> hadoop dfs -ls /scratch/
> -rw-r--r--   1 userid supergroup    size date time /scratch/id_lookup
> In my Pig script I pass it as a parameter:
>
> ProcessedUI = FOREACH A GENERATE myparser.myUDF(param1, param2,
> '/scratch/id_lookup');
> In my UDF, inside the exec function, I do the following:
> lookup_file = (String)input.get(2);
> I have implemented getCacheFiles as follows:
> public List<String> getCacheFiles() {
>     List<String> list = new ArrayList<String>(1);
>     list.add(lookup_file + "#id_lookup");
>     return list;
> }
> Now I try to read that file using standard I/O methods:
> public void VectorizeData() {
>     FileReader fr = new FileReader("./id_lookup");
>     BufferedReader brd = new BufferedReader(fr);
> }
>
> I think I am not using it correctly (maybe the paths are messed up, etc.). I
> get the following exception:
> 2013-12-11 11:09:50,821 [JobControl] ERROR
> org.apache.hadoop.security.UserGroupInformation -
> PriviledgedActionException as:userid cause:java.io.FileNotFoundException:
> File does not exist: null
> 2013-12-11 11:09:51,291 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 0% complete
> 2013-12-11 11:09:51,301 [main] WARN
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to
> stop immediately on failure.
> Any help on this would be great!
>
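
[Editor's note] A likely cause of "File does not exist: null": Pig calls
getCacheFiles() once at job-setup time, before exec() has ever run, so a field
assigned from input.get(2) inside exec() is still null at that point. The usual
pattern is to pass the HDFS path to the UDF constructor (via a DEFINE statement)
so it is available at setup. Below is a stripped-down sketch of that pattern
with no Pig dependency; in a real UDF the class would be public, would extend
org.apache.pig.EvalFunc, and getCacheFiles()/exec() would carry @Override. The
class name, the tab-separated file format, and the loadLookup helper are
illustrative assumptions, not the poster's actual code.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a real Pig UDF must be a public class extending EvalFunc<T>.
class LookupUdfSketch {
    private final String lookupPath;     // HDFS path, fixed at construction time
    private Map<String, String> lookup;  // loaded lazily on first exec() call

    // Pig instantiates the UDF with the constructor arguments given in the
    // DEFINE statement, so the path is known *before* getCacheFiles() runs.
    public LookupUdfSketch(String lookupPath) {
        this.lookupPath = lookupPath;
    }

    // Called at job-setup time; the "#id_lookup" suffix asks the framework to
    // symlink the cached file as ./id_lookup in each task's working directory.
    public List<String> getCacheFiles() {
        return Collections.singletonList(lookupPath + "#id_lookup");
    }

    // Reads the local symlinked copy; assumes one tab-separated
    // key<TAB>value pair per line (an assumed format for this sketch).
    static Map<String, String> loadLookup(File localFile) throws IOException {
        Map<String, String> map = new HashMap<>();
        try (BufferedReader brd = new BufferedReader(new FileReader(localFile))) {
            String line;
            while ((line = brd.readLine()) != null) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    map.put(parts[0], parts[1]);
                }
            }
        }
        return map;
    }
}
```

With this shape, the Pig script registers the path once via DEFINE instead of
passing it as a tuple argument on every call, e.g.:
DEFINE myUDF myparser.MyUDF('/scratch/id_lookup');
ProcessedUI = FOREACH A GENERATE myUDF(param1, param2);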