Posted to common-user@hadoop.apache.org by Khanh Nguyen <kn...@cs.umb.edu> on 2008/07/14 20:19:11 UTC

multiple Output Collectors ?

Hello,

Is it possible to have more than one output collector for one map?

My input consists of records of HTML pages. I am mapping each URL to its
HTML content and want to have two output collectors: one that maps
each <url, html-content> --> <url, outlinks>, and another that maps
<url, html-content> to something else (difficult to explain).

Please help. Thanks

-k

Re: multiple Output Collectors ?

Posted by Khanh Nguyen <kn...@cs.umb.edu>.
I was using Hadoop in stand-alone mode. And no: assuming my collector is
named 'text', my 'text-r-0000' is not empty, but 'text-r-0000.crc' is
empty.

-k

On Wed, Jul 30, 2008 at 4:10 AM, Alejandro Abdelnur <tu...@gmail.com> wrote:
> Are you seeing them in the local FS or in HDFS?
>
> In the local FS you'll see them. In HDFS you should not see any (via
> hadoop dfs -ls).
>
> As far as I understand, check-sum files are empty if the corresponding
> files are empty.
>
> A

Re: multiple Output Collectors ?

Posted by Alejandro Abdelnur <tu...@gmail.com>.
Are you seeing them in the local FS or in HDFS?

In the local FS you'll see them. In HDFS you should not see any (via
hadoop dfs -ls).

As far as I understand, check-sum files are empty if the corresponding
files are empty.

A
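
To illustrate why an empty data file would come with no check-sum data,
here is a minimal sketch in plain Java. It assumes a scheme that stores one
CRC32 value per fixed-size chunk of data. The class name ChunkChecksums, the
method checksums, and the 512-byte chunk size used in the usage line are
assumptions made for this example. It does not reproduce Hadoop's actual
.crc file layout, and it does not explain the case of a non-empty data file
with an empty .crc file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Illustration only: hypothetical helper, not Hadoop's checksum code.
final class ChunkChecksums {

    /** Returns one CRC32 value per chunk of at most chunkSize bytes. */
    static List<Long> checksums(byte[] data, int chunkSize) {
        List<Long> sums = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            sums.add(crc.getValue());
        }
        // For zero bytes of input the loop never runs, so the list is empty.
        return sums;
    }
}
```

Calling ChunkChecksums.checksums(new byte[0], 512) returns an empty list,
while any non-empty input yields at least one value.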

On Mon, Jul 21, 2008 at 3:11 AM, Khanh Nguyen <kn...@cs.umb.edu> wrote:
> Hello,
>
> Is there any reason that the check-sum file for a MultipleOutputs
> collector is empty?
>
> -k

Re: multiple Output Collectors ?

Posted by Khanh Nguyen <kn...@cs.umb.edu>.
Hello,

Is there any reason that the check-sum file for a MultipleOutputs
collector is empty?

-k

On Wed, Jul 16, 2008 at 4:26 AM, Alejandro Abdelnur <tu...@gmail.com> wrote:
> Multiple mappers mean multiple jobs, which means you'll have to run 2
> jobs on the same data. With MultipleOutputs and
> MultipleOutputFormat you can do that in one pass from a single Mapper.

Re: multiple Output Collectors ?

Posted by Alejandro Abdelnur <tu...@gmail.com>.
Multiple mappers mean multiple jobs, which means you'll have to run 2
jobs on the same data. With MultipleOutputs and
MultipleOutputFormat you can do that in one pass from a single Mapper.
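
The one-pass idea can be sketched in plain Java as follows. This is an
illustration only and does not use the Hadoop API: NamedCollectors, the
output names "outlinks" and "other", the href regular expression, and the
content-length placeholder for the second output are all assumptions made
for the example. In Hadoop itself, the old-API class
org.apache.hadoop.mapred.lib.MultipleOutputs plays the role of
NamedCollectors, with addNamedOutput at job setup and getCollector inside
the map call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration only: NamedCollectors is a hypothetical stand-in,
// not Hadoop's MultipleOutputs class.
final class OnePassTwoOutputs {

    /** Hypothetical holder of several named in-memory outputs. */
    static final class NamedCollectors {
        private final Map<String, List<String[]>> outputs = new LinkedHashMap<>();

        void collect(String name, String key, String value) {
            outputs.computeIfAbsent(name, n -> new ArrayList<>())
                   .add(new String[] {key, value});
        }

        List<String[]> get(String name) {
            return outputs.getOrDefault(name, new ArrayList<>());
        }
    }

    // Simplistic href matcher for the demo; not a real HTML parser.
    private static final Pattern HREF =
        Pattern.compile("href=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    /** One map call: reads a record once and writes to two named outputs. */
    static void map(String url, String html, NamedCollectors out) {
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            out.collect("outlinks", url, m.group(1));
        }
        // Placeholder for the second, unspecified mapping: content length.
        out.collect("other", url, Integer.toString(html.length()));
    }

    public static void main(String[] args) {
        NamedCollectors out = new NamedCollectors();
        map("http://a.example/", "<a href=\"http://b.example/\">b</a>", out);
        for (String[] kv : out.get("outlinks")) {
            System.out.println(kv[0] + " -> " + kv[1]);
        }
    }
}
```

Each input record is read once and feeds both outputs, which is what avoids
running a second job over the same data.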

On Wed, Jul 16, 2008 at 3:26 AM, Khanh Nguyen <kn...@cs.umb.edu> wrote:
> Thank you very much. Someone suggested I could just use multiple
> mappers. Would that work better (or be easier)?
>
> -k

Re: multiple Output Collectors ?

Posted by Khanh Nguyen <kn...@cs.umb.edu>.
Thank you very much. Someone suggested I could just use multiple
mappers. Would that work better (or be easier)?

-k

On Mon, Jul 14, 2008 at 11:59 PM, Alejandro Abdelnur <tu...@gmail.com> wrote:
> Check MultipleOutputFormat and MultipleOutputs (this was
> committed to the trunk last week).

Re: multiple Output Collectors ?

Posted by Alejandro Abdelnur <tu...@gmail.com>.
Check MultipleOutputFormat and MultipleOutputs (this was
committed to the trunk last week).


On Mon, Jul 14, 2008 at 11:49 PM, Khanh Nguyen <kn...@cs.umb.edu> wrote:
> Hello,
>
> Is it possible to have more than one output collector for one map?
>
> My input consists of records of HTML pages. I am mapping each URL to its
> HTML content and want to have two output collectors: one that maps
> each <url, html-content> --> <url, outlinks>, and another that maps
> <url, html-content> to something else (difficult to explain).
>
> Please help. Thanks
>
> -k
>

Re: multiple Output Collectors ?

Posted by Joman Chu <jo...@andrew.cmu.edu>.
One cheap hack that comes to mind is to extend the GenericWritable and
ArrayWritable classes, and write a second and third MapReduce job that
each parse your first job's output and select the key-value pairs they
want.

Joman Chu
AIM: ARcanUSNUMquam
IRC: irc.liquid-silver.net
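
A rough plain-Java sketch of this select-by-type idea follows. It does not
use Hadoop: the Tagged class, its string tag, and the select method are
hypothetical names introduced for illustration. A string tag stands in for
the wrapped type that a GenericWritable subclass would carry, and each call
to select stands in for one follow-up job.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical classes, not Hadoop's GenericWritable.
final class TaggedSelect {

    /** A record from the first job, tagged with the output it belongs to. */
    static final class Tagged {
        final String tag;
        final String key;
        final String value;

        Tagged(String tag, String key, String value) {
            this.tag = tag;
            this.key = key;
            this.value = value;
        }
    }

    /** One follow-up pass: keeps only the records carrying the wanted tag. */
    static List<Tagged> select(List<Tagged> firstJobOutput, String wantedTag) {
        List<Tagged> kept = new ArrayList<>();
        for (Tagged record : firstJobOutput) {
            if (record.tag.equals(wantedTag)) {
                kept.add(record);
            }
        }
        return kept;
    }
}
```

The cost of this approach is that the first job's full output is read once
per follow-up pass.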


On Mon, Jul 14, 2008 at 2:19 PM, Khanh Nguyen <kn...@cs.umb.edu> wrote:
> Hello,
>
> Is it possible to have more than one output collector for one map?
>
> My input consists of records of HTML pages. I am mapping each URL to its
> HTML content and want to have two output collectors: one that maps
> each <url, html-content> --> <url, outlinks>, and another that maps
> <url, html-content> to something else (difficult to explain).
>
> Please help. Thanks
>
> -k
>
>