Posted to user@hadoop.apache.org by jeremy p <at...@gmail.com> on 2014/05/06 00:49:12 UTC

Are mapper classes re-instantiated for each record?

Let's say I have a TaskTracker that receives 5 records to process for a
single job.  When the TaskTracker processes the first record, it will
instantiate my Mapper class and execute my setup() function.  It will then
run the map() method on that record.  My question is this: what happens
when the map() method has finished processing the first record?  I'm
guessing it will do one of two things:

1) My cleanup() function will execute.  After the cleanup() method has
finished, this instance of the Mapper object will be destroyed.  When it is
time to process the next record, a new Mapper object will be instantiated.
Then my setup() method will execute, the map() method will execute, the
cleanup() method will execute, and then the Mapper instance will be
destroyed.  When it is time to process the next record, a new Mapper object
will be instantiated.  This process will repeat itself until all 5 records
have been processed.  In other words, my setup() and cleanup() methods will
have been executed 5 times each.

or

2) When the map() method has finished processing my first record, the
Mapper instance will NOT be destroyed.  It will be reused for all 5
records.  When the map() method has finished processing the last record, my
cleanup() method will execute.  In other words, my setup() and cleanup()
methods will only execute 1 time each.

Thanks for the help!

Re: Are mapper classes re-instantiated for each record?

Posted by jeremy p <at...@gmail.com>.
Thank you!  This has helped me immensely.


On Tue, May 6, 2014 at 12:47 AM, Raj K Singh <ra...@gmail.com> wrote:

> Point 2 is right. The framework first calls setup(), then calls map() for
> each key/value pair in the InputSplit, and finally calls cleanup() once,
> regardless of the number of records in the input split.
>
>
> On Tue, May 6, 2014 at 11:21 AM, Sergey Murylev <se...@gmail.com> wrote:
>
>> Hi Jeremy,
>>
>> According to the official documentation
>> <http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Mapper.html>,
>> setup() and cleanup() are called once for each InputSplit, so your
>> variant 2 is closer to correct. But a single Mapper class can be applied
>> to multiple InputSplits, and each split gets its own setup/cleanup cycle.
>> In your case, if you have 5 files with 1 record each, setup/cleanup can
>> be called 5 times. But if your records are in a single file, I think
>> setup/cleanup should be called once.
>>
>> --
>> Thanks,
>> Sergey

Re: Are mapper classes re-instantiated for each record?

Posted by unmesha sreeveni <un...@gmail.com>.
Within a map task, the setup() method is called once before any map()
calls, and the cleanup() method is called once after all map() calls.
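
For illustration, here is a minimal sketch of that call order. It does not use
the Hadoop classes: SketchMapper and CountingMapper are hypothetical stand-ins
invented for this example, and run() only imitates the general shape of a map
task driving a mapper (setup once, map per record, cleanup once).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Hadoop Mapper; NOT org.apache.hadoop.mapreduce.Mapper.
abstract class SketchMapper<V> {
    protected void setup() {}
    protected abstract void map(V record);
    protected void cleanup() {}

    // One call to run() models one map task working through one input split.
    public final void run(Iterable<V> split) {
        setup();                      // once, before the first record
        try {
            for (V record : split) {
                map(record);          // once per record, on the same instance
            }
        } finally {
            cleanup();                // once, after the last record
        }
    }
}

// Counts how often each lifecycle method is invoked.
class CountingMapper extends SketchMapper<String> {
    int setupCalls, mapCalls, cleanupCalls;

    @Override protected void setup() { setupCalls++; }
    @Override protected void map(String record) { mapCalls++; }
    @Override protected void cleanup() { cleanupCalls++; }
}

class MapperLifecycleSketch {
    public static void main(String[] args) {
        List<String> split = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        CountingMapper mapper = new CountingMapper();
        mapper.run(split);
        // prints: setup=1 map=5 cleanup=1
        System.out.println("setup=" + mapper.setupCalls
                + " map=" + mapper.mapCalls
                + " cleanup=" + mapper.cleanupCalls);
    }
}
```

With 5 records in a single split, the same mapper instance handles all of them,
which corresponds to option 2 in the original question.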




-- 
*Thanks & Regards *


*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Center for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/

Re: Are mapper classes re-instantiated for each record?

Posted by Raj K Singh <ra...@gmail.com>.
Point 2 is right. The framework first calls setup(), then calls map() for
each key/value pair in the InputSplit, and finally calls cleanup() once,
regardless of the number of records in the input split.
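
To make the per-split behaviour concrete, here is a rough simulation. It
assumes one map task, and therefore one fresh mapper instance, per split; the
names SplitLifecycleSketch, LoggingMapper, runJob and count are made up for
this sketch and are not Hadoop APIs. Real map tasks usually run in parallel on
different nodes, while this loop runs them one after another.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitLifecycleSketch {
    // Hypothetical mapper that appends each lifecycle event to a shared log.
    static class LoggingMapper {
        private final List<String> log;
        LoggingMapper(List<String> log) { this.log = log; }
        void setup() { log.add("setup"); }
        void map(String record) { log.add("map:" + record); }
        void cleanup() { log.add("cleanup"); }
    }

    // Models a job: every split is handled by its own task and mapper instance.
    static List<String> runJob(List<List<String>> splits) {
        List<String> log = new ArrayList<>();
        for (List<String> split : splits) {
            LoggingMapper mapper = new LoggingMapper(log);
            mapper.setup();
            try {
                for (String record : split) {
                    mapper.map(record);
                }
            } finally {
                mapper.cleanup();
            }
        }
        return log;
    }

    // Counts how many times an exact event string appears in the log.
    static long count(List<String> log, String event) {
        return log.stream().filter(event::equals).count();
    }

    public static void main(String[] args) {
        // One split holding all 5 records.
        List<List<String>> oneSplit =
                Arrays.asList(Arrays.asList("a", "b", "c", "d", "e"));
        // Five splits holding 1 record each (for example, five small files).
        List<List<String>> fiveSplits = Arrays.asList(
                Arrays.asList("a"), Arrays.asList("b"), Arrays.asList("c"),
                Arrays.asList("d"), Arrays.asList("e"));

        // prints: 1
        System.out.println(count(runJob(oneSplit), "setup"));
        // prints: 5
        System.out.println(count(runJob(fiveSplits), "setup"));
    }
}
```

In both cases setup() and cleanup() run once per split, no matter how many
records the split contains.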

::::::::::::::::::::::::::::::::::::::::
Raj K Singh
http://in.linkedin.com/in/rajkrrsingh
http://www.rajkrrsingh.blogspot.com
Mobile  Tel: +91 (0)9899821370



Re: Are mapper classes re-instantiated for each record?

Posted by Raj K Singh <ra...@gmail.com>.
point 2 is right,The framework first calls setup() followed by map() for
each key/value pair in the InputSplit. Finally cleanup() is called
irrespective of no of records in the input split.

::::::::::::::::::::::::::::::::::::::::
Raj K Singh
http://in.linkedin.com/in/rajkrrsingh
http://www.rajkrrsingh.blogspot.com
Mobile  Tel: +91 (0)9899821370


On Tue, May 6, 2014 at 11:21 AM, Sergey Murylev <se...@gmail.com>wrote:

>  Hi Jeremy,
>
> According to official documentation<http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Mapper.html>setup and cleanup calls performed for each InputSplit. In this case you
> variant 2 is more correct. But actually single mapper can be used for
> processing multiple InputSplits. In you case if you have 5 files with 1
> record each it can call setup/cleanup 5 times. But if your records are in
> single file I think that setup/cleanup should be called once.
>
> --
> Thanks,
> Sergey
>
>
> On 06/05/14 02:49, jeremy p wrote:
>
> Let's say I have TaskTracker that receives 5 records to process for a
> single job.  When the TaskTracker processses the first record, it will
> instantiate my Mapper class and execute my setup() function.  It will then
> run the map() method on that record.  My question is this : what happens
> when the map() method has finished processing the first record?  I'm
> guessing it will do one of two things :
>
>  1) My cleanup() function will execute.  After the cleanup() method has
> finished, this instance of the Mapper object will be destroyed.  When it is
> time to process the next record, a new Mapper object will be instantiated.
>  Then my setup() method will execute, the map() method will execute, the
> cleanup() method will execute, and then the Mapper instance will be
> destroyed.  When it is time to process the next record, a new Mapper object
> will be instantiated.  This process will repeat itself until all 5 records
> have been processed.  In other words, my setup() and cleanup() methods will
> have been executed 5 times each.
>
>  or
>
>  2) When the map() method has finished processing my first record, the
> Mapper instance will NOT be destroyed.  It will be reused for all 5
> records.  When the map() method has finished processing the last record, my
> cleanup() method will execute.  In other words, my setup() and cleanup()
> methods will only execute 1 time each.
>
>  Thanks for the help!
>
>
>

Re: Are mapper classes re-instantiated for each record?

Posted by Raj K Singh <ra...@gmail.com>.
point 2 is right,The framework first calls setup() followed by map() for
each key/value pair in the InputSplit. Finally cleanup() is called
irrespective of no of records in the input split.

::::::::::::::::::::::::::::::::::::::::
Raj K Singh
http://in.linkedin.com/in/rajkrrsingh
http://www.rajkrrsingh.blogspot.com
Mobile  Tel: +91 (0)9899821370


On Tue, May 6, 2014 at 11:21 AM, Sergey Murylev <se...@gmail.com>wrote:

>  Hi Jeremy,
>
> According to official documentation<http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Mapper.html>setup and cleanup calls performed for each InputSplit. In this case you
> variant 2 is more correct. But actually single mapper can be used for
> processing multiple InputSplits. In you case if you have 5 files with 1
> record each it can call setup/cleanup 5 times. But if your records are in
> single file I think that setup/cleanup should be called once.
>
> --
> Thanks,
> Sergey
>
>
> On 06/05/14 02:49, jeremy p wrote:
>
> Let's say I have TaskTracker that receives 5 records to process for a
> single job.  When the TaskTracker processses the first record, it will
> instantiate my Mapper class and execute my setup() function.  It will then
> run the map() method on that record.  My question is this : what happens
> when the map() method has finished processing the first record?  I'm
> guessing it will do one of two things :
>
>  1) My cleanup() function will execute.  After the cleanup() method has
> finished, this instance of the Mapper object will be destroyed.  When it is
> time to process the next record, a new Mapper object will be instantiated.
>  Then my setup() method will execute, the map() method will execute, the
> cleanup() method will execute, and then the Mapper instance will be
> destroyed.  When it is time to process the next record, a new Mapper object
> will be instantiated.  This process will repeat itself until all 5 records
> have been processed.  In other words, my setup() and cleanup() methods will
> have been executed 5 times each.
>
>  or
>
>  2) When the map() method has finished processing my first record, the
> Mapper instance will NOT be destroyed.  It will be reused for all 5
> records.  When the map() method has finished processing the last record, my
> cleanup() method will execute.  In other words, my setup() and cleanup()
> methods will only execute 1 time each.
>
>  Thanks for the help!
>
>
>

Re: Are mapper classes re-instantiated for each record?

Posted by Raj K Singh <ra...@gmail.com>.
point 2 is right,The framework first calls setup() followed by map() for
each key/value pair in the InputSplit. Finally cleanup() is called
irrespective of no of records in the input split.

::::::::::::::::::::::::::::::::::::::::
Raj K Singh
http://in.linkedin.com/in/rajkrrsingh
http://www.rajkrrsingh.blogspot.com
Mobile  Tel: +91 (0)9899821370


On Tue, May 6, 2014 at 11:21 AM, Sergey Murylev <se...@gmail.com> wrote:

>  Hi Jeremy,
>
> According to the official documentation
> <http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Mapper.html>,
> setup and cleanup calls are performed once for each InputSplit. In this case
> your variant 2 is more correct. But a single Mapper class can end up
> processing multiple InputSplits. In your case, if you have 5 files with 1
> record each, it can call setup/cleanup 5 times. But if your records are in
> a single file, I think that setup/cleanup should be called once.
>
> --
> Thanks,
> Sergey
>
>
> On 06/05/14 02:49, jeremy p wrote:
>
> Let's say I have a TaskTracker that receives 5 records to process for a
> single job.  When the TaskTracker processes the first record, it will
> instantiate my Mapper class and execute my setup() function.  It will then
> run the map() method on that record.  My question is this : what happens
> when the map() method has finished processing the first record?  I'm
> guessing it will do one of two things :
>
>  1) My cleanup() function will execute.  After the cleanup() method has
> finished, this instance of the Mapper object will be destroyed.  When it is
> time to process the next record, a new Mapper object will be instantiated.
>  Then my setup() method will execute, the map() method will execute, the
> cleanup() method will execute, and then the Mapper instance will be
> destroyed.  When it is time to process the next record, a new Mapper object
> will be instantiated.  This process will repeat itself until all 5 records
> have been processed.  In other words, my setup() and cleanup() methods will
> have been executed 5 times each.
>
>  or
>
>  2) When the map() method has finished processing my first record, the
> Mapper instance will NOT be destroyed.  It will be reused for all 5
> records.  When the map() method has finished processing the last record, my
> cleanup() method will execute.  In other words, my setup() and cleanup()
> methods will only execute 1 time each.
>
>  Thanks for the help!
>
>
>

Re: Are mapper classes re-instantiated for each record?

Posted by Sergey Murylev <se...@gmail.com>.
Hi Jeremy,

According to the official documentation
<http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Mapper.html>,
setup and cleanup calls are performed once for each InputSplit. In this case
your variant 2 is more correct. But a single Mapper class can end up
processing multiple InputSplits. In your case, if you have 5 files with 1
record each, it can call setup/cleanup 5 times. But if your records are
in a single file, I think that setup/cleanup should be called once.
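
A rough way to check the arithmetic is the plain-Java sketch below. It does
not use Hadoop; countCalls and the record names are hypothetical. It assumes
that each small file becomes exactly one InputSplit and that each split gets
its own mapper instance, which may not hold for every InputFormat.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch; does not depend on Hadoop. All names are hypothetical.
public class SplitCountSketch {

    // Returns {setupCalls, mapCalls, cleanupCalls} for the given splits,
    // ASSUMING each split gets its own mapper instance.
    static int[] countCalls(List<List<String>> splits) {
        int setups = 0;
        int maps = 0;
        int cleanups = 0;
        for (List<String> split : splits) {
            setups++;              // setup() once per split
            maps += split.size();  // map() once per record in the split
            cleanups++;            // cleanup() once per split
        }
        return new int[] {setups, maps, cleanups};
    }

    public static void main(String[] args) {
        // Five files with one record each, assumed to become five splits.
        List<List<String>> fiveFiles = Arrays.asList(
                Arrays.asList("r1"), Arrays.asList("r2"), Arrays.asList("r3"),
                Arrays.asList("r4"), Arrays.asList("r5"));
        // One file with five records, assumed to become a single split.
        List<List<String>> oneFile = Arrays.asList(
                Arrays.asList("r1", "r2", "r3", "r4", "r5"));

        System.out.println(Arrays.toString(countCalls(fiveFiles))); // [5, 5, 5]
        System.out.println(Arrays.toString(countCalls(oneFile)));   // [1, 5, 1]
    }
}
```

Either way map() runs 5 times; only the number of setup/cleanup calls
depends on how the records are spread across splits.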

--
Thanks,
Sergey

On 06/05/14 02:49, jeremy p wrote:
> Let's say I have a TaskTracker that receives 5 records to process for a
> single job.  When the TaskTracker processes the first record, it will
> instantiate my Mapper class and execute my setup() function.  It will
> then run the map() method on that record.  My question is this : what
> happens when the map() method has finished processing the first
> record?  I'm guessing it will do one of two things :
>
> 1) My cleanup() function will execute.  After the cleanup() method has
> finished, this instance of the Mapper object will be destroyed.  When
> it is time to process the next record, a new Mapper object will be
> instantiated.  Then my setup() method will execute, the map() method
> will execute, the cleanup() method will execute, and then the Mapper
> instance will be destroyed.  When it is time to process the next
> record, a new Mapper object will be instantiated.  This process will
> repeat itself until all 5 records have been processed.  In other
> words, my setup() and cleanup() methods will have been executed 5
> times each.
>
> or
>
> 2) When the map() method has finished processing my first record, the
> Mapper instance will NOT be destroyed.  It will be reused for all 5
> records.  When the map() method has finished processing the last
> record, my cleanup() method will execute.  In other words, my setup()
> and cleanup() methods will only execute 1 time each.
>
> Thanks for the help!

