Posted to common-user@hadoop.apache.org by 皮皮 <pi...@gmail.com> on 2009/05/21 12:09:38 UTC

Re: Multipleoutput file

Yes, but how can I get the commaSeperatedPaths? I can't write the paths out
by hand.

It's not practical to do this:

commaSeperatedPaths_1 = "MAPPINGOUTPUT-r-00001";
commaSeperatedPaths_2 = "MAPPINGOUTPUT-r-00002";

FileInputFormat.setInputPaths(job, commaSeperatedPaths_1);
FileInputFormat.setInputPaths(job, commaSeperatedPaths_2);
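
One way to avoid typing each part file by hand is to build the comma-separated
string in a loop and pass it to setInputPaths in a single call (as far as I
know, a second setInputPaths call replaces the paths set by the first, while
addInputPath appends). The sketch below is plain Java with no Hadoop
dependency. The class name, the helper forNamedOutput, the directory
/out/job1, and the zero-based five-digit part numbering are assumptions for
illustration, so check them against the real job output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CommaSeparatedPaths {

    // Builds "dir/NAME-r-00000,dir/NAME-r-00001,..." with one entry per reducer.
    // The NAME-r-NNNNN pattern follows the file names quoted in this thread;
    // the zero-based numbering is an assumption to verify on a real cluster.
    static String forNamedOutput(String outputDir, String namedOutput, int numReducers) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            paths.add(String.format(Locale.ROOT, "%s/%s-r-%05d", outputDir, namedOutput, i));
        }
        return String.join(",", paths);
    }

    public static void main(String[] args) {
        // Hypothetical output directory and reducer count.
        String commaSeperatedPaths = forNamedOutput("/out/job1", "MAPPINGOUTPUT", 2);
        System.out.println(commaSeperatedPaths);
        // With Hadoop on the classpath, the string would be passed in ONE call:
        // FileInputFormat.setInputPaths(job, commaSeperatedPaths);
    }
}
```

This only helps when the number of reducers of the first job is known in the
driver of the second one; otherwise listing the output directory is likely the
more robust route.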



2009/4/7 Brian MacKay <Br...@medecision.com>

>
> Not sure about your question:  seems like you'd like to do this...?
>
> After you run the job, your output may look like MAPPINGOUTPUT-r-00001,
> MAPPINGOUTPUT-r-00002, etc.
>
> You'd need to set them as multiple inputs.
>
> FileInputFormat.setInputPaths(job, commaSeperatedPaths);
>
>
> Brian
>
> -----Original Message-----
> From: 皮皮 [mailto:pi.bingfeng@gmail.com]
> Sent: Tuesday, April 07, 2009 3:30 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Multiple k,v pairs from a single map - possible?
>
> Could anybody tell me how to read one of the MultipleOutputs files in
> another job's configuration?
>
> 2009/4/3 皮皮 <pi...@gmail.com>
>
> > Thank you very much. This is what I was looking for.
> >
> > 2009/3/27 Brian MacKay <Br...@medecision.com>
> >
> >
> >> Amandeep,
> >>
> >> Add this to your driver.....
> >>
> >> MultipleOutputs.addNamedOutput(conf, "PHONE",
> >>                    TextOutputFormat.class, Text.class, Text.class);
> >>
> >> MultipleOutputs.addNamedOutput(conf, "NAME",
> >>                    TextOutputFormat.class, Text.class, Text.class);
> >>
> >>
> >>
> >> And in your reducer....
> >>
> >>    private MultipleOutputs mos;
> >>
> >>    public void reduce(Text key, Iterator<Text> values,
> >>            OutputCollector<Text, Text> output, Reporter reporter)
> >>            throws IOException {
> >>
> >>        // namedOutPut = either PHONE or NAME; namedOutPut and othervals
> >>        // are placeholders to be computed from the key and values
> >>
> >>        while (values.hasNext()) {
> >>            String value = values.next().toString();
> >>            mos.getCollector(namedOutPut, reporter).collect(
> >>                    new Text(value), new Text(othervals));
> >>        }
> >>    }
> >>
> >>    @Override
> >>    public void configure(JobConf conf) {
> >>        super.configure(conf);
> >>        mos = new MultipleOutputs(conf);
> >>    }
> >>
> >>    // Closes all named outputs so their files are flushed
> >>    public void close() throws IOException {
> >>        mos.close();
> >>    }
> >>
> >>
> >>
> >> By the way, have you had a chance to post your Oracle fix to
> >> DBInputFormat?
> >> If so, what is the Jira tag #?
> >>
> >> Brian
> >>
> >> -----Original Message-----
> >> From: Amandeep Khurana [mailto:amansk@gmail.com]
> >> Sent: Friday, March 27, 2009 5:46 AM
> >> To: core-user@hadoop.apache.org
> >> Subject: Multiple k,v pairs from a single map - possible?
> >>
> >> Is it possible to output multiple key-value pairs from a single map
> >> function run?
> >>
> >> For example, the mapper outputting <name, phone> and <name, address>
> >> simultaneously...
> >>
> >> Can I write multiple output.collect(...) commands?
> >>
> >> Amandeep
> >>
> >> Amandeep Khurana
> >> Computer Science Graduate Student
> >> University of California, Santa Cruz
> >>
> >> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> >>
> >> The information transmitted is intended only for the person or entity to
> >> which it is addressed and may contain confidential and/or privileged
> >> material. Any review, retransmission, dissemination or other use of, or
> >> taking of any action in reliance upon, this information by persons or
> >> entities other than the intended recipient is prohibited. If you received
> >> this message in error, please contact the sender and delete the material
> >> from any computer.
> >>

Re: Multipleoutput file

Posted by 皮皮 <pi...@gmail.com>.
I did it another way:

    MultipleOutputs.addNamedOutput(job, "delete",
            SequenceFileOutputFormat.class, Text.class, IntWritable.class);
    MultipleOutputs.addNamedOutput(job, "compare",
            SequenceFileOutputFormat.class, LongWritable.class, IndexDoc.class);


    Path toDelete = null,  toCompare = null;
    FileStatus[] fstats = fs.listStatus(outDir1);
    for( FileStatus file : fstats){
        if( file.getPath().getName().startsWith("delete"))
            toDelete = file.getPath();

        else if( file.getPath().getName().startsWith("compare"))
            toCompare = file.getPath();
    }

I don't know whether this is the usual way to do it, but it solves my problem
for now.
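
A note on the loop above: it keeps a single path per named output, so if the
first job runs with more than one reducer, only the last matching part file
would be remembered. A minimal sketch of collecting every matching file is
shown below. It is plain Java working on file names only; the class
NamedOutputGrouper, its group method, and the sample listing are hypothetical,
and in the real driver the names would come from fs.listStatus(outDir1).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NamedOutputGrouper {

    // Groups file names by the named output they start with.
    // Names that match no named output (e.g. part-00000, _logs) are skipped.
    static Map<String, List<String>> group(List<String> fileNames, List<String> namedOutputs) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String namedOutput : namedOutputs) {
            grouped.put(namedOutput, new ArrayList<>());
        }
        for (String fileName : fileNames) {
            for (String namedOutput : namedOutputs) {
                // Match "delete-" rather than "delete" so that another output
                // whose name merely begins with "delete" is not picked up.
                if (fileName.startsWith(namedOutput + "-")) {
                    grouped.get(namedOutput).add(fileName);
                    break;
                }
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Hypothetical listing of the first job's output directory.
        List<String> listing = Arrays.asList(
                "delete-r-00000", "compare-r-00000",
                "delete-r-00001", "compare-r-00001", "part-00000", "_logs");
        System.out.println(group(listing, Arrays.asList("delete", "compare")));
    }
}
```

Each resulting list could then be handed to the second job as its set of input
paths, one list per named output.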


Re: Multipleoutput file

Posted by 皮皮 <pi...@gmail.com>.
Thank you for your reply, Jason.

Well, what should I do if I want only certain files in the directory, not all
of them?


Re: Multipleoutput file

Posted by jason hadoop <ja...@gmail.com>.
setInputPaths will take an array or variable arguments.
Or you can simply provide the directory that the individual files reside in,
and the individual files will be added.

If there are other files in the directory, you may need to specify a custom
input path filter via FileInputFormat.setInputPathFilter.
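
To illustrate the filter idea, here is a minimal sketch of the accept logic
such a filter might contain. It is plain Java so it runs without Hadoop:
NameFilter is a hypothetical stand-in for Hadoop's PathFilter (which takes a
Path rather than a String), and the "compare" named output and the
NAME-[mr]-NNNNN file pattern are assumptions based on the names quoted in this
thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class InputPathFilterSketch {

    // Hypothetical stand-in for Hadoop's PathFilter interface; a plain file
    // name replaces the Path argument so the sketch runs without Hadoop.
    interface NameFilter {
        boolean accept(String fileName);
    }

    // Accepts only part files of one named output, such as "compare-r-00000".
    static NameFilter forNamedOutput(String namedOutput) {
        Pattern partFile = Pattern.compile(Pattern.quote(namedOutput) + "-[mr]-\\d+");
        return fileName -> partFile.matcher(fileName).matches();
    }

    // Applies the filter to a directory listing, keeping the original order.
    static List<String> select(List<String> listing, NameFilter filter) {
        List<String> accepted = new ArrayList<>();
        for (String fileName : listing) {
            if (filter.accept(fileName)) {
                accepted.add(fileName);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Hypothetical listing of a job output directory.
        List<String> listing = Arrays.asList(
                "compare-r-00000", "delete-r-00000", "part-00000", "_logs");
        System.out.println(select(listing, forNamedOutput("compare")));
    }
}
```

In a real job the same test would sit in the accept(Path) method of a class
implementing PathFilter. Note that setInputPathFilter takes the filter class
rather than an instance, so the named output would have to be fixed in the
class itself, and it is worth checking whether your Hadoop version also
applies the filter to the input directory path.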





-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals