Posted to common-user@hadoop.apache.org by Nir Zohar <ni...@hotmail.com> on 2009/03/18 12:12:56 UTC

merging files

Hi,

I would like your help with the following question.

I have two files: file1 (key, value) and file2 (keys only). I need to exclude
from file1 every record whose key appears in file2, so that only the records
whose keys are not in file2 remain.

1. The output format is key-value, not only keys.

2. The key is not a primary key (it can repeat in file1), so a simple join at
the end is not possible.

Can you assist?

Thanks,

Nir.

 

 

Example:

file1:
2,1
2,3
2,5
3,1
3,2
4,7
4,9
6,3

file2:
4
2

Output:
3,1
3,2
6,3

 

 

 


Re: merging files

Posted by Rasit OZDAS <ra...@gmail.com>.
I would use DistributedCache.
Put file2 in the distributed cache; note that every map task will then have to
read it.
If you find a better solution, please let me know, because I have a similar
issue.

Rasit



-- 
M. Raşit ÖZDAŞ
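
The DistributedCache suggestion above amounts to a map-side filter: each map
task loads the keys of file2 into memory once and then drops every file1
record whose key is in that set. The following is only a minimal sketch of
that logic in plain Java. It uses no Hadoop classes, and the class and method
names (MapSideFilterSketch, loadExcludedKeys, filter) are made up for
illustration. It also assumes that file2 is small enough to fit in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of a map-side filter; no Hadoop classes are used.
class MapSideFilterSketch {

    // Models what each map task would do once at startup:
    // read every key of file2 into an in-memory set.
    static Set<String> loadExcludedKeys(List<String> file2Lines) {
        Set<String> keys = new HashSet<>();
        for (String line : file2Lines) {
            String key = line.trim();
            if (!key.isEmpty()) {
                keys.add(key);
            }
        }
        return keys;
    }

    // Models the map function applied to every "key,value" line of file1:
    // keep the record only if its key is absent from file2.
    static List<String> filter(List<String> file1Lines, Set<String> excluded) {
        List<String> kept = new ArrayList<>();
        for (String line : file1Lines) {
            String key = line.split(",", 2)[0].trim();
            if (!excluded.contains(key)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList(
                "2,1", "2,3", "2,5", "3,1", "3,2", "4,7", "4,9", "6,3");
        List<String> file2 = Arrays.asList("4", "2");
        // Prints 3,1 then 3,2 then 6,3, one record per line.
        for (String record : filter(file1, loadExcludedKeys(file2))) {
            System.out.println(record);
        }
    }
}
```

No reduce step is needed with this approach, but the cost is that every map
task reads all of file2.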

RE: merging files

Posted by Nir Zohar <ni...@hotmail.com>.
I have Hadoop version 0.18, which does not support MultipleInputs, so I used
the job configuration property "map.input.file" to distinguish between the
different inputs.
The rest of the solution worked great for me and solved the problem.

Thanks very much.
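
For readers on a similar setup, the "map.input.file" workaround can be
pictured as a single mapper that branches on the name of the file it is
reading. This is only a rough sketch in plain Java with no Hadoop classes:
the inputFile parameter stands in for the value of "map.input.file", and the
class name, the paths, the "file2" name check and the "-1" marker are
assumptions for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Plain-Java model of one mapper serving both inputs; no Hadoop classes.
class InputTaggingSketch {

    // Marker value emitted for keys that come from file2 (assumed never
    // to occur as a real value in file1).
    static final String MARKER = "-1";

    // inputFile stands in for the value of the "map.input.file" property,
    // i.e. the path of the file the current map task is reading.
    static Map.Entry<String, String> map(String inputFile, String line) {
        if (inputFile.endsWith("file2")) {
            // file2 holds keys only: emit (key, marker).
            return new SimpleEntry<>(line.trim(), MARKER);
        }
        // file1 holds "key,value" lines: pass them through unchanged.
        String[] parts = line.split(",", 2);
        return new SimpleEntry<>(parts[0].trim(), parts[1].trim());
    }

    public static void main(String[] args) {
        // The paths below are hypothetical.
        Map.Entry<String, String> fromFile1 = map("/data/file1", "3,2");
        Map.Entry<String, String> fromFile2 = map("/data/file2", "4");
        System.out.println(fromFile1.getKey() + "\t" + fromFile1.getValue());
        System.out.println(fromFile2.getKey() + "\t" + fromFile2.getValue());
    }
}
```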


-----Original Message-----
From: Enis Soztutar [mailto:enis.soz@gmail.com] 
Sent: Wednesday, March 18, 2009 3:07 PM
To: core-user@hadoop.apache.org
Subject: Re: merging files

Use MultipleInputs with a different mapper for each input. Mapper 1 should be
IdentityMapper; mapper 2 should output (key, value) pairs whose value is a
pseudo marker value (the same for all keys), which marks that the value is
null/empty. In the reducer, output only the key/value pairs of keys whose
values do not include the marker value.

In your example, suppose that we use -1 as the marker value. Then the output
of mapper 2 will be:
4, -1
2, -1

and the reducer will get:

2, {1,3,5,-1}
3, {1,2}
4, {7,9,-1}
6, {3}

Then the reducer will output:

3, 1
3, 2
6, 3






Re: merging files

Posted by Enis Soztutar <en...@gmail.com>.
Use MultipleInputs with a different mapper for each input. Mapper 1 should be
IdentityMapper; mapper 2 should output (key, value) pairs whose value is a
pseudo marker value (the same for all keys), which marks that the value is
null/empty. In the reducer, output only the key/value pairs of keys whose
values do not include the marker value.

In your example, suppose that we use -1 as the marker value. Then the output
of mapper 2 will be:
4, -1
2, -1

and the reducer will get:

2, {1,3,5,-1}
3, {1,2}
4, {7,9,-1}
6, {3}

Then the reducer will output:

3, 1
3, 2
6, 3
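
The marker approach described above can be traced end to end with a small
plain-Java model. This is a sketch only: it uses no Hadoop classes, a sorted
map stands in for the shuffle, and the names MarkerJoinSketch, shuffle, reduce
and run are made up for illustration. It assumes that "-1" never occurs as a
real value in file1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the reduce-side marker approach; no Hadoop classes.
class MarkerJoinSketch {

    // Marker value for keys that appear in file2 (assumed not to be a
    // legitimate value in file1).
    static final String MARKER = "-1";

    // Models both mappers plus the shuffle: values are grouped by key.
    static Map<String, List<String>> shuffle(List<String> file1Lines,
                                             List<String> file2Lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : file1Lines) {
            // Mapper 1: pass (key, value) through unchanged.
            String[] kv = line.split(",", 2);
            grouped.computeIfAbsent(kv[0].trim(), k -> new ArrayList<>())
                   .add(kv[1].trim());
        }
        for (String line : file2Lines) {
            // Mapper 2: emit (key, marker).
            grouped.computeIfAbsent(line.trim(), k -> new ArrayList<>())
                   .add(MARKER);
        }
        return grouped;
    }

    // Models the reducer: emit nothing for a key whose values contain the
    // marker, otherwise emit every (key, value) pair.
    static List<String> reduce(String key, List<String> values) {
        List<String> out = new ArrayList<>();
        if (values.contains(MARKER)) {
            return out;
        }
        for (String value : values) {
            out.add(key + "," + value);
        }
        return out;
    }

    // Runs the whole model and returns the surviving records.
    static List<String> run(List<String> file1Lines, List<String> file2Lines) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e
                : shuffle(file1Lines, file2Lines).entrySet()) {
            result.addAll(reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList(
                "2,1", "2,3", "2,5", "3,1", "3,2", "4,7", "4,9", "6,3");
        List<String> file2 = Arrays.asList("4", "2");
        // Prints 3,1 then 3,2 then 6,3, one record per line.
        for (String record : run(file1, file2)) {
            System.out.println(record);
        }
    }
}
```

One thing to keep in mind when moving this to a real reducer: the values
arrive as an iterator that can only be walked once, so they would need to be
buffered (as the list does here) before checking for the marker.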


