Posted to user@pig.apache.org by Kenji <ke...@gmail.com> on 2015/09/28 05:48:29 UTC
LoadFunction UDF
Hi guys,
I need to create a UDF that defines a custom load location. Before
attempting the UDF, I tried parameter substitution inside the Pig
script, which does not work:
--myscript.pig
time = LOAD 'hdfs:/home/raw/report/last_process_time/part-r-00000' AS DATE;
start_ts = foreach time generate startTS(DATE);
raw = LOAD '/home/raw/report/$END' USING
com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS
(json:map[]);
run -param PATH='/home/raw/reports/$END/*' hdfs:/home/ridwan/pig-script/update_test.pig
I expected PATH to take the value of start_ts.
Here is the solution I have in mind:
- create a customLoad() UDF whose constructor accepts a tuple as input:
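public customLoad(Tuple input) throws ExecException {
    // The input is expected to hold a timestamp in epoch seconds.
    String str = input.get(0).toString();
    // Convert to milliseconds and add one hour.
    Date date = new Date((Long.parseLong(str) * 1000) + (60 * 60 * 1000));
    // yyyy is the calendar year; YYYY would be the week-based year.
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy/MM/dd/HH");
    newpath = sdf.format(date);
}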
- update the input path, assuming the default location is /home/raw/report:
@Override
public void setLocation(String location, Job job) throws IOException {
    FileInputFormat.setInputPaths(job, location + newpath + "/*");
}
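The path computation described above can be sketched on its own, outside of Pig. This is only a minimal sketch under stated assumptions, not a working loader: the class ReportPath and its methods hourlyPath and inputGlob are hypothetical names, the input is assumed to be a string of epoch seconds, and UTC is assumed as the time zone. It takes a String rather than a Tuple on the assumption that arguments given after USING reach the loader's constructor as quoted string literals.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper for illustration only; it is not part of Pig.
final class ReportPath {
    private ReportPath() {}

    // Turns epoch seconds (as a string) into a yyyy/MM/dd/HH path
    // for the hour that follows the given timestamp.
    static String hourlyPath(String epochSeconds) {
        long millis = Long.parseLong(epochSeconds.trim()) * 1000L + 60L * 60L * 1000L;
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy/MM/dd/HH");
        // Assumption: directories are named in UTC.
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        return sdf.format(new Date(millis));
    }

    // Joins the base location, the hourly path and a trailing glob.
    static String inputGlob(String baseLocation, String epochSeconds) {
        String base = baseLocation.endsWith("/") ? baseLocation : baseLocation + "/";
        return base + hourlyPath(epochSeconds) + "/*";
    }
}
```

With these assumptions, ReportPath.inputGlob("/home/raw/report/", "0") yields /home/raw/report/1970/01/01/01/*.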
I then call it from the script like this:
raw = LOAD '/home/raw/report/' USING customLoad(start_ts);
But this gives me an error:
ERROR 1200: <line 7, column 51> mismatched input 'start_ts' expecting RIGHT_PAREN
What have I done wrong?
Thanks a lot,
Kenji