Posted to user@pig.apache.org by Charles Gonçalves <ch...@gmail.com> on 2011/02/01 15:12:30 UTC

UDF with parameterized constructor in DEFINE statement

Hi Guys,

I have a UDF to which I want to pass a long holding a timestamp and get back
a date formatted with the SimpleDateFormat class.
I pass the format string for the SimpleDateFormat object to the UDF
constructor, and eventually the time zone if needed.

So I made a class to do that, but when I use it in my script I get the error:

ERROR 1000: Error during parsing. Invalid alias: day in {ex_time:
chararray,scBytes: long,fSize: long}
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid
alias: day in {ex_time: chararray,scBytes: long,fSize: long}..

What is the best way to parameterize a Java UDF?
What am I doing wrong?

Thanks!

The script:

REGISTER MscPigUtils.jar
DEFINE EdgeLoader msc.pig.EdgeLoader();
DEFINE day msc.pig.ExtractTime('dd');
raw = LOAD
    '/home/charles/workspace-j2ee/ReportService/src/test/resources/logsSamples/wpc_justAbril.log.gz'
    using EdgeLoader;
B = FOREACH raw GENERATE day(ts), scBytes, fSize;
C = GROUP B BY day;
clients_stats = FOREACH C {
    complete_views = FILTER B BY scBytes >= fSize;
    GENERATE FLATTEN(group), COUNT(B), COUNT(complete_views), SUM(B.scBytes);
}
STORE clients_stats into 'dateTransferday';

The Class:

package msc.pig;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.TimeZone;

import msc.misc.TimeUtils;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.log4j.Logger;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.logicalLayer.schema.Schema.FieldSchema;

public class ExtractTime extends EvalFunc<String> {
    private static final Logger logger = Logger.getLogger(ExtractTime.class);

    // Instance fields rather than static ones, so that two instances built
    // with different formats do not overwrite each other's state.
    private final DateFormat utc_df;
    private final Calendar utc_cal;

    public ExtractTime(String format) {
        this(format, "UTC");
    }

    public ExtractTime(String format, String tz) {
        utc_df = new SimpleDateFormat(format);
        utc_df.setTimeZone(TimeZone.getTimeZone(tz));
        utc_cal = Calendar.getInstance();
        utc_cal.setTimeZone(TimeZone.getTimeZone(tz));
    }

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return null;
        }
        try {
            Object object = input.get(0);
            if (object == null) {
                return null;
            }
            // The input is in seconds since the epoch; Calendar wants milliseconds.
            Long ts = ((Long) object);
            utc_cal.setTimeInMillis(ts * 1000);
            return utc_df.format(utc_cal.getTime());
        } catch (Exception e) {
            logger.error("Error Parsing date !!", e);
            return null;
        }
    }

    @Override
    public Schema outputSchema(Schema input) {
        // The output field is named "ex_time", not "day".
        return new Schema(new Schema.FieldSchema("ex_time", DataType.CHARARRAY));
    }
}
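
On the question of parameterizing the UDF through its constructor, here is a minimal sketch of the formatting logic alone, with no Pig dependency. The class name EpochFormatter and the main method are hypothetical and exist only for illustration; this is not the class from the post. It assumes the input is in seconds since the epoch, as the ts * 1000 in exec() suggests, and it keeps the formatter in an instance field so that two differently parameterized instances can coexist.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical, Pig-free stand-in for the formatting part of ExtractTime.
public class EpochFormatter {
    // Instance field: each instance keeps its own pattern and time zone.
    private final SimpleDateFormat df;

    // Defaults to UTC, like the one-argument constructor in the post.
    public EpochFormatter(String format) {
        this(format, "UTC");
    }

    public EpochFormatter(String format, String tz) {
        // Locale.ROOT keeps the digits independent of the machine's locale.
        df = new SimpleDateFormat(format, Locale.ROOT);
        df.setTimeZone(TimeZone.getTimeZone(tz));
    }

    // Takes seconds since the epoch and converts to milliseconds for Date.
    public String format(long epochSeconds) {
        return df.format(new Date(epochSeconds * 1000L));
    }

    public static void main(String[] args) {
        EpochFormatter day = new EpochFormatter("dd");
        EpochFormatter month = new EpochFormatter("MM");
        long ts = 1296573150L; // 2011-02-01 15:12:30 UTC
        System.out.println(day.format(ts));   // prints 01
        System.out.println(month.format(ts)); // prints 02
    }
}
```

With static fields, the second constructor call would replace the first instance's pattern, and both calls would print 02.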




-- 
*Charles Ferreira Gonçalves *
http://homepages.dcc.ufmg.br/~charles/
UFMG - ICEx - Dcc
Cel.: 55 31 87741485
Tel.:  55 31 34741485
Lab.: 55 31 34095840

Re: UDF with parameterized constructor in DEFINE statement

Posted by Charles Gonçalves <ch...@gmail.com>.
Thank you guys,
In fact, it was my bad!
Sorry!

On Tue, Feb 1, 2011 at 4:05 PM, Santhosh Srinivasan <sm...@yahoo-inc.com> wrote:

> The error message is misleading. The user expected 'day' to be the alias
> used for the UDF and not an alias in the schema.



RE: UDF with parameterized constructor in DEFINE statement

Posted by Santhosh Srinivasan <sm...@yahoo-inc.com>.
The error message is misleading. The user expected 'day' to be the alias used for the UDF and not an alias in the schema.

-----Original Message-----
From: Jonathan Coveney [mailto:jcoveney@gmail.com] 
Sent: Tuesday, February 01, 2011 6:22 AM
To: user@pig.apache.org
Subject: Re: UDF with parameterized constructor in DEFINE statement

The error, at least following what you posted, is different from what you think. The problem is simply that the column "day" doesn't exist. You can see in the output that the columns are {ex_time: chararray,scBytes: long,fSize: long}. If you want it to be called day, you can name it as such with an "as day", or you can change the schema, or you can just group by ex_time. In general, a parsing error like this comes before any errors with the UDF itself, as Pig will try to parse the whole thing and THEN make the job.


Re: UDF with parameterized constructor in DEFINE statement

Posted by Jonathan Coveney <jc...@gmail.com>.
The error, at least following what you posted, is different from what you think. The problem is simply that the column "day" doesn't exist. You can see in the output that the columns are {ex_time: chararray,scBytes: long,fSize: long}. If you want it to be called day, you can name it as such with an "as day", or you can change the schema, or you can just group by ex_time. In general, a parsing error like this comes before any errors with the UDF itself, as Pig will try to parse the whole thing and THEN make the job.
