Posted to user@pig.apache.org by Kenji <ke...@gmail.com> on 2015/09/28 05:48:29 UTC

LoadFunction UDF

Hi guys,

I need to create a UDF that defines a custom load location.

Before attempting a UDF, I tried to do parameter substitution inside the Pig
script, which does not work:
--myscript.pig
time = LOAD 'hdfs:/home/raw/report/last_process_time/part-r-00000' AS DATE;
start_ts = foreach time generate startTS(DATE);
raw = LOAD '/home/raw/report/$END'
      USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
      AS (json:map[]);

run -param PATH='/home/raw/reports/$END/*' hdfs:/home/ridwan/pig-script/update_test.pig

I was expecting PATH to take the value of start_ts.

So here is the solution I have in mind:
- create a customLoad() UDF that accepts a tuple as input:
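     - constructor:
     public customLoad(Tuple input) throws ExecException {
         // first field of the tuple holds the timestamp in epoch seconds
         String str = input.get(0).toString();
         // convert seconds to milliseconds and add one hour
         Date date = new Date(Long.parseLong(str) * 1000 + 60 * 60 * 1000);
         // "yyyy" is the calendar year; "YYYY" would be the week-based year
         SimpleDateFormat sdf = new SimpleDateFormat("yyyy/MM/dd/HH");
         newpath = sdf.format(date);
     }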
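The date-to-directory step can be checked on its own, outside Pig. A minimal sketch, assuming the report directories are laid out in UTC (the original code uses the JVM's default time zone) and using a hypothetical helper name hourPath. It also shows why the pattern letter matters: SimpleDateFormat treats "YYYY" as the week-based year and "yyyy" as the calendar year, and the two disagree in the last days of December.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class HourPathDemo {
    // Hypothetical helper: epoch seconds plus one hour, formatted with the given pattern.
    static String hourPath(long epochSeconds, String pattern) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.US);
        // Assumption: directories are named in UTC, not the JVM default zone.
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        return sdf.format(new Date(epochSeconds * 1000L + 60L * 60L * 1000L));
    }

    public static void main(String[] args) {
        long ts = 1419984000L; // 2014-12-31T00:00:00Z
        System.out.println(hourPath(ts, "yyyy/MM/dd/HH")); // calendar year: 2014/12/31/01
        System.out.println(hourPath(ts, "YYYY/MM/dd/HH")); // week-based year: 2015/12/31/01
    }
}
```

With "YYYY", a timestamp on 31 December 2014 would point the loader at a 2015 directory.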

- and update the path's location, assuming the default location is /home/raw/report:

     @Override
     public void setLocation(String location, Job job) throws IOException {
         // base location + formatted date directory + glob for every file in it
         FileInputFormat.setInputPaths(job, location + newpath + "/*");
     }

raw = LOAD '/home/raw/report/' USING customLoad(start_ts);

But this gives me an error:
ERROR 1200: <line 7, column 51>  mismatched input 'start_ts' expecting RIGHT_PAREN
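The parser error is consistent with how Pig handles function arguments: in USING func(...), Pig expects quoted string constants, and it hands those strings to the UDF's constructor, so a relation alias such as start_ts is rejected. Parameter substitution is likewise done before the script runs, so a $ parameter cannot pick up a value computed inside the same script. A minimal sketch of the constructor shape that fits this, assuming the timestamp is read before launching Pig and passed in as a string. The class name HourlyPathLoaderSketch is hypothetical, it has no Pig or Hadoop dependencies, and inputGlob only returns the string that setLocation would pass to FileInputFormat.setInputPaths. UTC is again an assumption.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical stand-in for customLoad: shows only the String constructor
// and the input path it would build, without extending Pig's LoadFunc.
public class HourlyPathLoaderSketch {
    private final String newpath;

    // Pig passes constructor arguments as Strings, e.g. customLoad('1420070400').
    public HourlyPathLoaderSketch(String epochSeconds) {
        // seconds -> milliseconds, plus one hour as in the original constructor
        long millis = Long.parseLong(epochSeconds.trim()) * 1000L + 60L * 60L * 1000L;
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy/MM/dd/HH", Locale.US);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: UTC directory layout
        this.newpath = sdf.format(new Date(millis));
    }

    // The glob setLocation would hand to FileInputFormat.setInputPaths.
    public String inputGlob(String location) {
        String base = location.endsWith("/") ? location : location + "/";
        return base + newpath + "/*";
    }

    public static void main(String[] args) {
        HourlyPathLoaderSketch loader = new HourlyPathLoaderSketch("1420070400");
        System.out.println(loader.inputGlob("/home/raw/report/")); // /home/raw/report/2015/01/01/01/*
    }
}
```

On the Pig side, the value would then arrive as a literal, for example via a hypothetical START_TS parameter supplied on the command line with -param START_TS=...:

    raw = LOAD '/home/raw/report/' USING customLoad('$START_TS');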

What have I done wrong?

Thanks a lot,
Kenji