Posted to hdfs-user@hadoop.apache.org by Sai Sai <sa...@yahoo.in> on 2013/03/27 11:21:48 UTC

System.out.println vs Counters

Q1. Is it right to assume that System.out.println statements are useful only in the Eclipse environment, and that in a multi-node cluster environment we need to use counters instead?

Q2. I am slightly confused: with System.out.println statements we are able to get detailed info at every line of code in Eclipse, while counters give just a few lines and are not as detailed. So what should we do in a multi-node cluster environment?

Q3. Also, when they say the limit of counters is 120, does that mean that if we call
context.getCounter("TestGroup1","TestName1").increment(1);
more than 120 times, the output will not show it? Or does it refer to 120 counters that we can define, for example in an enum?

Any help is really appreciated.
Thanks
Sai

Re: System.out.println vs Counters

Posted by Paul Wilkinson <pa...@gmail.com>.

While using System.out inside a Mapper or Reducer is fine as an aid to
learning, be careful: accidentally leaving those statements in (or not
moving to something like log4j) and running the job for real can mean writing
millions of lines of log output on a tasktracker, filling up disks and
making jobs needlessly slow.
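
The point about moving to a logging framework can be illustrated with a small sketch. It is not from the original thread: it uses java.util.logging from the JDK as a stand-in for log4j so that it runs without extra libraries, and the class name TaskLoggingSketch and its helper are hypothetical. The idea is only that per-record messages are written at a debug level, so a single level setting silences them for a real run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch: java.util.logging stands in for log4j here.
final class TaskLoggingSketch {

    // Keeps accepted messages in memory so the effect of the level is visible.
    static final class CapturingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                messages.add(record.getMessage());
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    // Simulates a task that logs one debug line per record and one summary line.
    static List<String> run(Level level, int recordCount) {
        Logger log = Logger.getLogger("TaskLoggingSketch." + level.getName());
        log.setUseParentHandlers(false); // keep the console quiet
        CapturingHandler handler = new CapturingHandler();
        handler.setLevel(Level.ALL);
        log.addHandler(handler);
        log.setLevel(level);

        for (int i = 0; i < recordCount; i++) {
            log.fine("processing record " + i); // dropped unless level is FINE or lower
        }
        log.info("processed " + recordCount + " records");

        log.removeHandler(handler);
        return handler.messages;
    }

    public static void main(String[] args) {
        System.out.println(run(Level.FINE, 1000).size()); // prints 1001
        System.out.println(run(Level.INFO, 1000).size()); // prints 1
    }
}
```

With the level at FINE every record produces a line; at INFO only the summary survives, which is the behaviour one would want on a cluster.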

Paul


On 27 March 2013 10:38, zheyi rong <zh...@gmail.com> wrote:

> Hello,
>
> Q1.
> It depends on your need. If you would like overall statistics, for
> example the number of malformed records in your datasets,
> use counters. If you just want to know what is going on inside a single
> mapper or reducer, use System.out.println.
> Since mappers do not know about each other, you cannot get overall
> statistics for your job by using System.out.println().
> The output of System.out.println() will end up in the task log.
>
> Q2.
> In a distributed environment, mappers do not know about each other. Imagine
> that mapper A is running on one machine and mapper B is running on another
> machine: in mapper A, you cannot get the internal state of mapper B
> simply by calling System.out.println().
>
> Q3.
> Harsh J answered it.
>
> Zheyi.

Re: System.out.println vs Counters

Posted by zheyi rong <zh...@gmail.com>.

Hello,

Q1.
It depends on your need. If you would like overall statistics, for example
the number of malformed records in your datasets, use counters. If you just
want to know what is going on inside a single mapper or reducer, use
System.out.println. Since mappers do not know about each other, you cannot
get overall statistics for your job by using System.out.println().
The output of System.out.println() will end up in the task log.

Q2.
In a distributed environment, mappers do not know about each other. Imagine
that mapper A is running on one machine and mapper B is running on another
machine: in mapper A, you cannot get the internal state of mapper B
simply by calling System.out.println().
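
The difference can be sketched in plain Java. This is not Hadoop code and not from the original thread: the class CounterAggregationSketch, the counter names, and the "key,value" record format are all assumptions made for illustration. Each simulated task only sees its own records, and overall statistics exist only once the per-task counts are merged in one place, which is roughly the role the framework plays for counters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation; real counters are provided by the MapReduce framework.
final class CounterAggregationSketch {

    // One simulated map task: counts well-formed and malformed records locally.
    // A record is assumed to be well formed if it looks like "key,value".
    static Map<String, Long> runTask(List<String> records) {
        Map<String, Long> counters = new HashMap<>();
        for (String record : records) {
            String name = record.split(",").length == 2 ? "WELL_FORMED" : "MALFORMED";
            counters.merge(name, 1L, Long::sum);
        }
        return counters;
    }

    // Simulated job-level aggregation: sums the counters reported by every task.
    static Map<String, Long> aggregate(List<Map<String, Long>> perTask) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Long> task : perTask) {
            task.forEach((name, value) -> totals.merge(name, value, Long::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map<String, Long>> perTask = new ArrayList<>();
        perTask.add(runTask(Arrays.asList("k1,v1", "broken", "k2,v2"))); // "mapper A"
        perTask.add(runTask(Arrays.asList("k3,v3", "x,y,z", "oops")));   // "mapper B"

        Map<String, Long> totals = aggregate(perTask);
        System.out.println(totals.get("MALFORMED"));   // prints 3
        System.out.println(totals.get("WELL_FORMED")); // prints 3
    }
}
```

A println inside runTask would show only what that one task saw; the job-wide total of malformed records appears only after aggregation.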

Q3.
Harsh J answered it.

Zheyi.

Re: System.out.println vs Counters

Posted by Harsh J <ha...@cloudera.com>.

I do not understand 1 and 2: Counters are used to count things in the MR
framework in a distributed manner and are aggregated at the JobTracker
level; System.out is merely used to write to STDOUT. Why are you
comparing the two?

3: The limit means the total number of counter names accepted from a
single job. Your example would create one such new counter called
TestGroup1.TestName1. You could also increment TestName2, etc., but
only up to a global max of 120 such new counters.
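
To make the distinction concrete, here is a minimal plain-Java sketch of that rule. It is a simulation written for this thread, not Hadoop's implementation: the class CounterLimitSketch, its methods, and the choice of IllegalStateException are assumptions, and how the real framework reacts when the limit is exceeded may differ by version. What it shows is that the limit applies to distinct counter names, not to how many times one counter is incremented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical simulation of a per-job limit on distinct counter names.
final class CounterLimitSketch {
    private final int maxCounters;
    private final Map<String, Long> counters = new LinkedHashMap<>();

    CounterLimitSketch(int maxCounters) {
        this.maxCounters = maxCounters;
    }

    // Increments an existing counter, or creates it if the limit allows.
    void increment(String group, String name, long amount) {
        String key = group + "." + name;
        if (!counters.containsKey(key) && counters.size() >= maxCounters) {
            throw new IllegalStateException("Too many counters: limit is " + maxCounters);
        }
        counters.merge(key, amount, Long::sum);
    }

    long value(String group, String name) {
        return counters.getOrDefault(group + "." + name, 0L);
    }

    int distinctCounters() {
        return counters.size();
    }

    public static void main(String[] args) {
        CounterLimitSketch job = new CounterLimitSketch(120);

        // Incrementing the same counter many times uses up only one name.
        for (int i = 0; i < 1_000_000; i++) {
            job.increment("TestGroup1", "TestName1", 1);
        }
        System.out.println(job.value("TestGroup1", "TestName1")); // prints 1000000
        System.out.println(job.distinctCounters());               // prints 1

        // Creating new names is what runs into the limit.
        for (int i = 2; i <= 120; i++) {
            job.increment("TestGroup1", "TestName" + i, 1);
        }
        try {
            job.increment("TestGroup1", "TestName121", 1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints Too many counters: limit is 120
        }
    }
}
```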

--
Harsh J