Posted to hdfs-user@hadoop.apache.org by jamal sasha <ja...@gmail.com> on 2013/01/17 03:07:23 UTC

modifying existing wordcount example

Hi,
  In the wordcount example:
http://hadoop.apache.org/docs/r0.17.0/mapred_tutorial.html
Let's say I run the above example and save the output.
But now suppose I have a new input file. What I want to do is run the
word count again, updating the previous counts rather than starting
from scratch.
For example:
sample_input1.txt  //foo bar foo bar bar bar
After first run:
1) foo 2
2) bar 4

Save it in output1.txt

Now sample_input2.txt //bar hello world
Now the result I am looking for is:
1) foo 2
2) bar 5
3) hello 1
4) world 1

How do I achieve this in MapReduce?
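
In Hadoop-free terms, the result being asked for is just the previous counts merged with the counts from the new file. A minimal Python sketch of that arithmetic, using the sample data above (`Counter` addition sums counts per key):

```python
from collections import Counter

# First run: counts for sample_input1.txt, i.e. what output1.txt holds.
old_counts = Counter("foo bar foo bar bar bar".split())

# Second run: count the new file, then fold in the previous counts.
new_counts = Counter("bar hello world".split())
merged = old_counts + new_counts  # Counter '+' sums per-key

print(sorted(merged.items()))
# [('bar', 5), ('foo', 2), ('hello', 1), ('world', 1)]
```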

Re: modifying existing wordcount example

Posted by jamal sasha <ja...@gmail.com>.
Thanks for the help.
I implemented it as suggested: I process the new file and then join it
with the previous results. But can I modify the original output in
place, so that it holds the updated counts plus the new word counts?
So my inputs are step1_word_count_output.txt + new_raw_input,
and I want the output saved back into step1_word_count_output.txt.
Which is to say, I just want to have one output file?
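
As far as I know, a MapReduce job cannot use its input directory as its output directory, so "updating in place" is usually done by writing the merged result somewhere new and then renaming it over the old output (`hadoop fs -mv`, or `FileSystem.rename()` in Java). A local Python sketch of that swap pattern — the function name and single-file layout are illustrative, not Hadoop API:

```python
import os
import tempfile

def update_counts_file(path, new_text):
    """Merge word counts from new_text into the 'word<TAB>count' file at
    path, then atomically swap the merged result over the old file."""
    counts = {}
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                word, n = line.split("\t")
                counts[word] = int(n)
    for word in new_text.split():
        counts[word] = counts.get(word, 0) + 1
    # Write to a temp file in the same directory, then rename into place,
    # so the old output is only replaced once the new one is complete.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        for word, n in sorted(counts.items()):
            f.write(f"{word}\t{n}\n")
    os.replace(tmp, path)
```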




Re: modifying existing wordcount example

Posted by be...@gmail.com.
Hi Jamal

You can use the Distributed Cache only if the file to be distributed is small. MapReduce is meant for large datasets, so you should expect the output file to keep growing.

The simple, straightforward way: process the second data set, then merge the first output with the second output; you can use KeyValueTextInputFormat to load both outputs into a second MR job.

Alternatively, you can use MultipleInputs here: process the new input file into 'word 1' and the previous output file into 'word $count' in the mappers, and do the aggregation in the reducer.

Regards 
Bejoy KS

Sent from remote device, Please excuse typos
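
The MultipleInputs idea above can be sketched without Hadoop. The following plain-Python simulation (not Hadoop API) uses one map function per input: raw text emits (word, 1), previous-output lines of the form 'word<TAB>count' emit (word, count), and a reduce step sums per word exactly as the reducer would:

```python
from itertools import groupby
from operator import itemgetter

def map_raw_text(lines):
    """Mapper for the new raw input: emit (word, 1) per occurrence."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def map_previous_output(lines):
    """Mapper for the previous job's output: parse 'word<TAB>count'."""
    for line in lines:
        word, count = line.split("\t")
        yield (word, int(count))

def reduce_counts(pairs):
    """Sort (the 'shuffle'), group by word, and sum, as the reducer would."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(n for _, n in group))

pairs = list(map_previous_output(["foo\t2", "bar\t4"]))  # step-1 output
pairs += list(map_raw_text(["bar hello world"]))         # new input
result = dict(reduce_counts(pairs))
# result: {'bar': 5, 'foo': 2, 'hello': 1, 'world': 1}
```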



Re: modifying existing wordcount example

Posted by jamal sasha <ja...@gmail.com>.
Hi,
 Thanks for sharing your thoughts.
I was reading about some Hadoop libraries, and I feel like the
distributed cache might help me. But I picked up Hadoop only recently
(and Java along with it), so I'm not able to work out how to actually
code it :(
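
If it helps, the distributed-cache idea boils down to: load the (small) previous-count file into memory during setup, then count only the new input on top of it. A Hadoop-free Python sketch of that logic — note that in a real job with several mappers or reducers, each cached count must be added exactly once per word (e.g. on the reducer side), or it would be double-counted:

```python
def count_with_cache(cached_counts, new_lines):
    """Start from the counts loaded from the cached previous output
    (what a mapper's setup() would read), then add new occurrences."""
    counts = dict(cached_counts)
    for line in new_lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

# Previous output as a dict, plus the new file's lines:
updated = count_with_cache({"foo": 2, "bar": 4}, ["bar hello world"])
# updated == {'foo': 2, 'bar': 5, 'hello': 1, 'world': 1}
```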




Re: modifying existing wordcount example

Posted by Chris Embree <ce...@gmail.com>.
Can you instead copy input1 and input2 together and count them in one pass?

Or process both files on the second pass?

Otherwise, you'll have to read in the output file, load the values, and
start your map/reduce job.

Probably someone else will have a better answer. :)
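
The first suggestion works because word counting is a plain sum: recounting over the concatenated raw inputs gives exactly the same result as merging the per-file counts. A quick Python check of that equivalence:

```python
from collections import Counter

input1 = "foo bar foo bar bar bar"
input2 = "bar hello world"

# Option A: count each input separately, then merge the counts.
merged = Counter(input1.split()) + Counter(input2.split())

# Option B: concatenate the inputs and count once.
recounted = Counter((input1 + " " + input2).split())

assert merged == recounted  # both: foo 2, bar 5, hello 1, world 1
```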


