Posted to user@hadoop.apache.org by Pradeep Kanchgar <Pr...@lntinfotech.com> on 2014/03/07 13:30:13 UTC

Bloom filter in reduce-side join

Hi,

I'm currently exploring Bloom filters. I've gone through most of the blogs on Bloom filters and understand what they are, but I still haven't been able to find a worked example of using one in a join.
I've just started out with MapReduce programming.

Can anyone help me implement a Bloom filter in the example below (a reduce-side join)?

I'm joining two datasets, "Employee (users)" and "Departments", with a reduce-side join.

Two mappers read the "user (Employee)" records and the "Department" records, and a reducer joins them.

user (Employee) records              Department records
id, name                             id, dept name

3738, Richie Gore                    3738,Sales
12946,Rony Sam                       12946,Marketing
17556,David Gart                     3738,Sales
3443,Rachel Smith                    3443,Sales
5799,Paul Rosta
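
From what I've read, the usual pattern seems to be to first build a Bloom filter over the join keys of the smaller dataset (the department ids here) and write it to HDFS, so the user mapper can later drop records that cannot possibly join. Below is a rough, untested sketch of that training step using Hadoop's org.apache.hadoop.util.bloom.BloomFilter; the class name, the sizing numbers and the output path are placeholders I made up.

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.bloom.BloomFilter;
import org.apache.hadoop.util.bloom.Key;
import org.apache.hadoop.util.hash.Hash;

// Hypothetical one-off step: train a Bloom filter on the department ids
// and serialise it to HDFS (e.g. /cache/dept.bloom) for the mappers to load
public class DeptBloomTrainer {

       public static void main(String[] args) throws Exception {
              // args[0] = departments file on HDFS, args[1] = output path for the filter
              Configuration conf = new Configuration();
              FileSystem fs = FileSystem.get(conf);

              // Sizing is a guess: vector size and number of hash functions should
              // be tuned to the expected key count and acceptable false-positive rate
              BloomFilter filter = new BloomFilter(100000, 5, Hash.MURMUR_HASH);

              BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(new Path(args[0]))));
              String line;
              while ((line = reader.readLine()) != null) {
                     String depId = line.split(",")[0].trim();
                     filter.add(new Key(depId.getBytes()));   // add each department id
              }
              reader.close();

              // Serialise the trained filter so the join job can ship it to the mappers
              FSDataOutputStream out = fs.create(new Path(args[1]));
              filter.write(out);
              out.close();
       }
}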


My code
Mapper 1 to read the user (employee) records

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class UserMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {

       private final Text outKey = new Text();
       private final Text outVal = new Text();

       public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
                       throws IOException {

              // Each input line is "id,name"
              String line = value.toString();
              String[] arrUsers = line.split(",");
              String id = arrUsers[0].trim();
              String name = arrUsers[1].trim();

              // Key on the employee id and tag the value with "A" so the
              // reducer can tell user records from department records
              outKey.set(id);
              outVal.set("A" + name);
              output.collect(outKey, outVal);
       }
}
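
This is where I think the filter would be used: a variant of the user mapper (UserMapperWithBloom is just a name I made up) loads the trained filter from the DistributedCache in configure() and only emits users whose id passes membershipTest(). Since Bloom filters can give false positives, the reducer still performs the real join; the filter only keeps definite non-matches out of the shuffle. This is an untested guess at how it could look:

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.bloom.BloomFilter;
import org.apache.hadoop.util.bloom.Key;

// Hypothetical variant of UserMapper that pre-filters with a trained Bloom filter
public class UserMapperWithBloom extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {

       private final Text outKey = new Text();
       private final Text outVal = new Text();
       private final BloomFilter filter = new BloomFilter();

       @Override
       public void configure(JobConf job) {
              try {
                     // Assumes the trained filter is the only file added to the
                     // DistributedCache in the driver
                     Path[] cacheFiles = DistributedCache.getLocalCacheFiles(job);
                     DataInputStream in = new DataInputStream(new FileInputStream(cacheFiles[0].toString()));
                     filter.readFields(in);
                     in.close();
              } catch (IOException e) {
                     throw new RuntimeException("Could not load Bloom filter", e);
              }
       }

       public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
                       throws IOException {

              String[] arrUsers = value.toString().split(",");
              String id = arrUsers[0].trim();
              String name = arrUsers[1].trim();

              // Only emit users whose id is (probably) present in the department
              // data; false positives are possible, so the reducer still does the
              // real join, but true non-matches never reach the shuffle
              if (filter.membershipTest(new Key(id.getBytes()))) {
                     outKey.set(id);
                     outVal.set("A" + name);
                     output.collect(outKey, outVal);
              }
       }
}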

Mapper 2 to read the department records

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class DepartMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {

       private final Text outKey = new Text();
       private final Text outVal = new Text();

       public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
                       throws IOException {

              // Each input line is "id,dept name"
              String line = value.toString();
              String[] arrDept = line.split(",");
              String depId = arrDept[0].trim();
              String dep = arrDept[1].trim();

              // Key on the id and tag the value with "B" to mark department records
              outKey.set(depId);
              outVal.set("B" + dep);
              output.collect(outKey, outVal);
       }
}

Reducer to join

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class JoinReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> {

       private final ArrayList<Text> listA = new ArrayList<Text>();
       private final ArrayList<Text> listB = new ArrayList<Text>();

       public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter)
                       throws IOException {

              listA.clear();
              listB.clear();

              // Split the values for this id into user records ("A" tag)
              // and department records ("B" tag)
              while (values.hasNext()) {
                     Text tmp = values.next();
                     if (tmp.charAt(0) == 'A') {
                            listA.add(new Text(tmp.toString().substring(1)));
                     } else if (tmp.charAt(0) == 'B') {
                            listB.add(new Text(tmp.toString().substring(1)));
                     }
              }

              executeJoinLogic(output);
       }

       // Inner join: emit the cross product of the two lists for this key
       private void executeJoinLogic(OutputCollector<Text, Text> output) throws IOException {
              if (!listA.isEmpty() && !listB.isEmpty()) {
                     for (Text a : listA) {
                            for (Text b : listB) {
                                   output.collect(a, b);
                            }
                     }
              }
       }
}
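
For completeness, this is how I imagine the driver would wire everything together: the two mappers via MultipleInputs, as in the plain reduce-side join, plus DistributedCache.addCacheFile() to ship the pre-trained filter (the /cache/dept.bloom path is just the placeholder from the sketch above). Also untested:

import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;

// Hypothetical driver: the same reduce-side join, plus the cached Bloom filter
public class BloomJoinDriver {

       public static void main(String[] args) throws Exception {
              // args[0] = users path, args[1] = departments path, args[2] = output path
              JobConf conf = new JobConf(BloomJoinDriver.class);
              conf.setJobName("bloom reduce-side join");

              conf.setOutputKeyClass(Text.class);
              conf.setOutputValueClass(Text.class);
              conf.setReducerClass(JoinReducer.class);

              // One mapper per dataset, exactly as in the plain reduce-side join
              MultipleInputs.addInputPath(conf, new Path(args[0]), TextInputFormat.class, UserMapperWithBloom.class);
              MultipleInputs.addInputPath(conf, new Path(args[1]), TextInputFormat.class, DepartMapper.class);
              FileOutputFormat.setOutputPath(conf, new Path(args[2]));

              // Ship the pre-trained filter to every mapper (placeholder path)
              DistributedCache.addCacheFile(new URI("/cache/dept.bloom"), conf);

              JobClient.runJob(conf);
       }
}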



I'm using the Eclipse IDE for development and connecting to the Apache Hadoop 1.1.1 release.

Thanks & Regards,
Pradeep C Kanchgar


