Posted to user@pig.apache.org by Cornelio Iñigo <co...@gmail.com> on 2010/12/01 00:46:56 UTC
decide to use Pig
Hi
I'm getting started with Hadoop and Pig. I need to port a Hadoop MapReduce
program I wrote to Pig. In the Hadoop program I have only a Map function,
and in it I do all the processing, which consists of analyzing some text...
for this, 9 functions (operators) are called; these functions run
sequentially (when the first finishes, the second starts, and so on). Here
is how the map looks:
static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

    // declaration of operators or functions
    Operator1 op1 = new Operator1();
    Operator2 op2 = new Operator2();
    Operator3 op3 = new Operator3();
    ...
    ...

    /* map function */
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // get a row from the csv
        String line = value.toString();

        // some code to parse the line
        ...
        ...

        // initialize all the operators if they are not initialized
        if( !op1.isInitialized() )
            op1.initialize();
        if( !op2.isInitialized() )
            op2.initialize();
        ... // and so on with all operators

        // process each operator in sequence
        op1.process(line);
        String[] resultOP1 = op1.getResults();
        op2.process(resultOP1);
        String[][] resultOP2 = op2.getResults();
        ... // and so on with all the operators

        // finally collect results
        String put = "";
        for( int k = 0; k < resultOP9.length; k++ ){
            for( int j = 0; j < resultOP9[k].length; j++ ){
                context.write...
            }
        }
    }
}
My question is whether this is a good idea, and whether there is a way to
port this type of program to Pig?
Thanks
--
*Cornelio*
Re: decide to use Pig
Posted by Daniel Dai <ji...@yahoo-inc.com>.
Since I am a Pig developer, I will say "do everything in Pig" :).
To be frank, if these 9 functions are all you need, you can easily
convert them to Pig, but you will not gain much if none of the 9
functions can use existing UDFs. Here is one way you can do it:
* Write a UDF LineProcess:

public class LineProcess extends EvalFunc<DataBag> {
    @Override
    public DataBag exec(Tuple in) throws IOException {
        String line = (String) in.get(0);

        // initialize all the operators if they are not initialized
        if( !op1.isInitialized() )
            op1.initialize();
        if( !op2.isInitialized() )
            op2.initialize();
        // and so on with all operators

        // process each operator in sequence
        op1.process(line);
        String[] resultOP1 = op1.getResults();
        op2.process(resultOP1);
        String[][] resultOP2 = op2.getResults();
        // and so on with all the operators

        // pack the final results into a bag of tuples
        DataBag db = new DefaultDataBag();
        for (int i = 0; i < resultOP9.length; i++) {
            Tuple t = TupleFactory.getInstance().newTuple();
            t.append(resultOP9[i]);
            db.add(t);
        }
        return db;
    }

    @Override
    public Schema outputSchema(Schema input) {
        return new Schema(new Schema.FieldSchema(
                getSchemaName(this.getClass().getName().toLowerCase(), input),
                DataType.BAG));
    }
}
* Drive it using a Pig script:
a = load '1.txt' as (a0:chararray);
b = foreach a generate flatten(LineProcess(a0));
store b into 'out';
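One practical detail: before the script can resolve LineProcess, the jar
containing the UDF has to be registered, and if the class lives in a
package you need its fully qualified name. A minimal sketch, assuming a
hypothetical jar name myudfs.jar and package com.example:

```pig
-- register the jar that contains the compiled UDF (jar name is hypothetical)
REGISTER myudfs.jar;
-- alias the fully qualified class name (package is hypothetical)
DEFINE LineProcess com.example.LineProcess();

a = load '1.txt' as (a0:chararray);
b = foreach a generate flatten(LineProcess(a0));
store b into 'out';
```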
Going forward, if you want to use Filter, Join, and other native Pig
functionality, or if you want to break these 9 functions apart and
combine them in a different way, Pig will definitely help.
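To illustrate that last point: breaking the monolithic sequence into one
small function per operator is what makes the steps recombinable, which is
exactly what chaining UDFs inside a foreach gives you. A minimal plain-Java
sketch of that refactor (the operator names and logic here are made up for
illustration and have no Pig dependency):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class OperatorChain {
    // Each "operator" becomes a small, independently testable function,
    // analogous to one Pig UDF per step instead of one monolithic UDF.
    static Function<String, List<String>> tokenize =
            line -> Arrays.asList(line.split(","));
    static Function<List<String>, List<String>> lowercase =
            tokens -> tokens.stream()
                            .map(String::toLowerCase)
                            .collect(Collectors.toList());

    public static void main(String[] args) {
        // Composing functions mirrors nesting UDFs in a foreach,
        // e.g. b = foreach a generate Lowercase(Tokenize(a0));
        List<String> out = tokenize.andThen(lowercase).apply("Foo,BAR");
        System.out.println(out); // prints [foo, bar]
    }
}
```

Once the steps are separate, reordering or replacing one of them is a
one-line change in the Pig script instead of a rewrite of the Map class.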
Daniel