Posted to user@pig.apache.org by Cornelio Iñigo <co...@gmail.com> on 2010/12/01 00:46:56 UTC

decide to use Pig

Hi

I'm getting started with Hadoop and Pig. I need to port a Hadoop MapReduce
program I wrote to Pig. The Hadoop program has only a Map function, and in it I
do all the processing, which consists of analyzing some text. For this, 9
functions (operators) are called; these functions run sequentially (when the
first one finishes, the second starts, and so on). Here is what the map looks
like:


        static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

                // declaration of operators or functions
                Operator1 op1 = new Operator1();
                Operator2 op2 = new Operator2();
                Operator3 op3 = new Operator3();
                ...
                ...

                /* map function */
                public void map(LongWritable key, Text value, Context context)
                                throws IOException, InterruptedException {

                        // get a row from the csv
                        String line = value.toString();

                        // some code to parse the line
                        ...
                        ...

                        // initialize all the operators if they are not initialized
                        if (!op1.isInitialized())
                                op1.initialize();

                        if (!op2.isInitialized())
                                op2.initialize();

                        ... // and so on with all operators

                        // process each operator
                        op1.process(line);
                        String[] resultOP1 = op1.getResults();

                        op2.process(resultOP1);
                        String[][] resultOP2 = op2.getResults();
                        ... // and so on with all the operators

                        // finally collect results
                        for (int k = 0; k < resultOP9.length; k++) {
                                for (int j = 0; j < resultOP9[k].length; j++) {
                                        context.write...
                                }
                        }
                }
        }



My question is: is it a good idea, and is there a way, to port this type of
program to Pig?

Thanks

-- 
*Cornelio*

Re: decide to use Pig

Posted by Daniel Dai <ji...@yahoo-inc.com>.
Since I am a Pig developer, I will say "do everything Pig" :).

To be frank, if these 9 functions are all you need, you can easily
convert the program to Pig, but you will not gain much if none of the 9
functions can make use of existing UDFs. Here is one way you can do it:

* Write a UDF LineProcess:
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.DataType;
import org.apache.pig.data.DefaultDataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.impl.logicalLayer.schema.Schema;

public class LineProcess extends EvalFunc<DataBag> {
    @Override
    public DataBag exec(Tuple in) throws IOException {
        String line = (String) in.get(0);

        // initialize all the operators if they are not initialized
        if (!op1.isInitialized())
            op1.initialize();

        if (!op2.isInitialized())
            op2.initialize();

        // and so on with all operators

        // process each operator
        op1.process(line);
        String[] resultOP1 = op1.getResults();

        op2.process(resultOP1);
        String[][] resultOP2 = op2.getResults();
        // and so on with all the operators

        DataBag db = new DefaultDataBag();

        for (int i = 0; i < resultOP9.length; i++) {
            Tuple t = TupleFactory.getInstance().newTuple();
            t.append(resultOP9[i]);
            db.add(t);
        }
        return db;
    }

    @Override
    public Schema outputSchema(Schema input) {
        return new Schema(new Schema.FieldSchema(
                getSchemaName(this.getClass().getName().toLowerCase(), input),
                DataType.BAG));
    }
}

* Drive it using a Pig script:
a = load '1.txt' as (a0:chararray);
b = foreach a generate flatten(LineProcess(a0));
store b into 'out';
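One detail the script above glosses over: before Pig can resolve the UDF, the
jar containing it has to be registered. Assuming the class is packaged into a
jar named udfs.jar (the jar name is a placeholder; use whatever you build), the
full script would look something like:

```pig
register udfs.jar;
a = load '1.txt' as (a0:chararray);
b = foreach a generate flatten(LineProcess(a0));
store b into 'out';
```

If LineProcess lives in a Java package, you would either call it by its fully
qualified name or add a DEFINE statement for a short alias.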

Going forward, if you want to use FILTER/JOIN and other native Pig
functionality, or if you want to break these 9 functions apart and combine
them in a different way, Pig will definitely help.
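To make that last point concrete, here is a purely hypothetical sketch of what
breaking the pipeline apart might look like, assuming two of the nine operators
were rewritten as standalone UDFs (Tokenize and Classify are invented names, as
are the file names) so that Pig's native FILTER and JOIN can run between them:

```pig
-- hypothetical: Tokenize and Classify are invented UDF names
register udfs.jar;
a = load '1.txt' as (a0:chararray);
b = foreach a generate flatten(Tokenize(a0)) as word:chararray;
c = filter b by word is not null;   -- native Pig replaces one hand-written step
lookup = load 'labels.txt' as (label:chararray, desc:chararray);
d = foreach c generate word, Classify(word) as label;
e = join d by label, lookup by label;
store e into 'out';
```

The point is that once each operator is its own UDF, the glue between them is
Pig Latin rather than hand-written Java, so reordering or recombining steps is
a one-line change.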

Daniel
