Posted to user@hadoop.apache.org by Pradeep Kanchgar <Pr...@lntinfotech.com> on 2014/03/07 13:30:13 UTC
Bloom filter in reduce-side join
Hi,
I'm currently exploring Bloom filters. I've gone through most of the blog posts on them and understand what they are, but I still can't figure out a worked example of using one in a join.
I've just started out with MapReduce programming.
Can anyone help me add a Bloom filter to the example below (a reduce-side join)?
I'm joining two datasets, "Employee (users)" and "Departments", with a reduce-side join:
two mappers read the "user (Employee)" records and the "Department" records, and a reducer performs the join.

user (Employee) records        Department records
id,    name                    id,    dept name
3738,  Richie Gore             3738,  Sales
12946, Rony Sam                12946, Marketing
17556, David Gart              3738,  Sales
3443,  Rachel Smith            3443,  Sales
5799,  Paul Rosta
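Before any Hadoop wiring, here is a minimal stand-alone sketch of the idea: a Bloom filter trained on the department ids answers "might this user id have a department record?" with possible false positives but no false negatives, so it can safely discard users that cannot join. The class name BloomDemo, the bit-vector size, and the two hash functions below are my own illustrative choices, not Hadoop's (Hadoop ships a real implementation in org.apache.hadoop.util.bloom.BloomFilter):

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy Bloom filter to illustrate the membership test on the sample data above.
public class BloomDemo {
    static final int BITS = 1024;
    static final BitSet bits = new BitSet(BITS);

    // Two simple hash functions derived from String.hashCode();
    // a real filter would use stronger, independent hashes.
    static int h1(String s) { return (s.hashCode() & 0x7fffffff) % BITS; }
    static int h2(String s) { return ((s + "#").hashCode() & 0x7fffffff) % BITS; }

    static void add(String s) { bits.set(h1(s)); bits.set(h2(s)); }

    static boolean mightContain(String s) { return bits.get(h1(s)) && bits.get(h2(s)); }

    public static void main(String[] args) {
        // Train the filter on the ids that appear in the Department records.
        for (String deptId : new String[] {"3738", "12946", "3443"}) {
            add(deptId);
        }
        // Pass every user id through the filter: ids that cannot be in
        // Departments are dropped; a few false positives may slip through.
        String[] userIds = {"3738", "12946", "17556", "3443", "5799"};
        List<String> kept = new ArrayList<String>();
        for (String id : userIds) {
            if (mightContain(id)) {
                kept.add(id);
            }
        }
        System.out.println(kept); // always includes 3738, 12946, 3443
    }
}
```

Because a Bloom filter never produces false negatives, every id that really has a department record survives the filter; only non-joining ids (like 5799) can be dropped early.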
My code
Mapper 1, to read user (employee) records:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class UserMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
    private Text outkey = new Text();
    private Text outval = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        String[] arryUsers = value.toString().split(",");
        // Tag user records with "A" so the reducer can tell the two sources apart.
        outkey.set(arryUsers[0].trim());
        outval.set("A" + arryUsers[1].trim());
        output.collect(outkey, outval);
    }
}
Mapper 2, to read department records:

public class DepartMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
    private Text outkey = new Text();
    private Text outval = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        String[] arryDept = value.toString().split(",");
        // Tag department records with "B".
        outkey.set(arryDept[0].trim());
        outval.set("B" + arryDept[1].trim());
        output.collect(outkey, outval);
    }
}
Reducer to join:

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class JoinReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
    private ArrayList<Text> listA = new ArrayList<Text>();
    private ArrayList<Text> listB = new ArrayList<Text>();

    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        listA.clear();
        listB.clear();
        // Separate the tagged values back into user ("A") and department ("B") lists.
        // Hadoop reuses the Text object returned by the iterator, so copy each value.
        while (values.hasNext()) {
            Text tmp = values.next();
            if (tmp.charAt(0) == 'A') {
                listA.add(new Text(tmp.toString().substring(1)));
            } else if (tmp.charAt(0) == 'B') {
                listB.add(new Text(tmp.toString().substring(1)));
            }
        }
        executeJoinLogic(output);
    }

    // Inner join: emit the cross product of the two lists for this key.
    private void executeJoinLogic(OutputCollector<Text, Text> output) throws IOException {
        if (!listA.isEmpty() && !listB.isEmpty()) {
            for (Text a : listA) {
                for (Text b : listB) {
                    output.collect(a, b);
                }
            }
        }
    }
}
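From what I've read so far, the usual way to bring a Bloom filter into this job would be: a preliminary step trains org.apache.hadoop.util.bloom.BloomFilter (which ships with Hadoop 1.1.1) on all department ids, serializes it through its Writable interface, and registers the file with DistributedCache; UserMapper then deserializes it in configure() and drops non-matching ids before the shuffle. This is only a hedged, untested sketch of my understanding, and the cached-file setup (first cache file holds the filter) is an assumption:

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.bloom.BloomFilter;
import org.apache.hadoop.util.bloom.Key;

// Sketch: UserMapper with a pre-built Bloom filter of department ids.
// Assumes the serialized filter was added with DistributedCache.addCacheFile(...)
// in the driver and is the first (only) cached file.
public class FilteredUserMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private final BloomFilter deptFilter = new BloomFilter();
    private final Text outkey = new Text();
    private final Text outval = new Text();

    @Override
    public void configure(JobConf job) {
        try {
            // Load the pre-built filter from its local cache copy.
            Path[] cached = DistributedCache.getLocalCacheFiles(job);
            DataInputStream in = new DataInputStream(
                    new FileInputStream(cached[0].toString()));
            deptFilter.readFields(in);
            in.close();
        } catch (IOException e) {
            throw new RuntimeException("Could not load Bloom filter", e);
        }
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        String[] arryUsers = value.toString().split(",");
        String id = arryUsers[0].trim();
        // Emit only ids that might have a matching department record;
        // everything else is dropped before the shuffle, shrinking the
        // data sent to the JoinReducer. False positives are still removed
        // by the reducer's inner-join check.
        if (deptFilter.membershipTest(new Key(id.getBytes()))) {
            outkey.set(id);
            outval.set("A" + arryUsers[1].trim());
            output.collect(outkey, outval);
        }
    }
}
```

The reducer stays unchanged: because the filter can produce false positives, an occasional non-joining user id still reaches the reducer, where the empty department list filters it out exactly as before.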
I'm using the Eclipse IDE for development, running against the Apache Hadoop 1.1.1 release.
Thanks & Regards,
Pradeep C Kanchgar