You are viewing a plain text version of this content. The canonical link for it is here.

Posted to hdfs-user@hadoop.apache.org by Chris Collins <ch...@yahoo.com> on 2012/08/28 09:14:45 UTC

example usage of s3 file system

Hi I am trying to use the Hadoop filesystem abstraction with S3 but in my tinkering I am not having a great deal of success.  I am particularly interested in the ability to mimic a directory structure (since s3 native doesnt do it).

Can anyone point me to some good example usage of Hadoop FileSystem with s3?

I created a few directories using transit and AWS S3 console for test.  Doing a liststatus of the bucket returns a FileStatus object of the directory created but if I try to do a liststatus of that path I am getting a 404:

org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error. HEAD '/aaaa' on Host ....

Probably not the best list to look for help, any clues appreciated.

C

Re: Delays in worker node jobs

Posted by Harsh J <ha...@cloudera.com>.

Hey Terry,

Can you look at your JobTracker logs, grep it for this worker node's
hostname and see the task assignment timestamps vs. when the task
began in real (from the TaskTracker log, grepping for the same attempt
ID)?

On Wed, Aug 29, 2012 at 7:10 PM, Terry Healy <th...@bnl.gov> wrote:
> Running 1.0.2, in this case on Linux.
>
> I was watching the processes / loads on one TaskTracker instance and
> noticed that it completed it's first 8 map tasks and reported 8 free
> slots (the max for this system). It then waited doing nothing for more
> than 30 seconds before the next "batch" of work came in and started running.
>
> Likewise it also has relatively long periods with all 8 cores running at
> or near idle. There are no jobs failing or obvious errors in the
> TaskTracker log.
>
> What could be causing this?
>
> Should I increase the number of map jobs to greater than number of cores
> to try and keep it busier?
>
> -Terry



-- 
Harsh J

Re: Delays in worker node jobs

Posted by Harsh J <ha...@cloudera.com>.

Hey Terry,

Can you look at your JobTracker logs, grep it for this worker node's
hostname and see the task assignment timestamps vs. when the task
began in real (from the TaskTracker log, grepping for the same attempt
ID)?

On Wed, Aug 29, 2012 at 7:10 PM, Terry Healy <th...@bnl.gov> wrote:
> Running 1.0.2, in this case on Linux.
>
> I was watching the processes / loads on one TaskTracker instance and
> noticed that it completed it's first 8 map tasks and reported 8 free
> slots (the max for this system). It then waited doing nothing for more
> than 30 seconds before the next "batch" of work came in and started running.
>
> Likewise it also has relatively long periods with all 8 cores running at
> or near idle. There are no jobs failing or obvious errors in the
> TaskTracker log.
>
> What could be causing this?
>
> Should I increase the number of map jobs to greater than number of cores
> to try and keep it busier?
>
> -Terry



-- 
Harsh J

Re: Delays in worker node jobs

Posted by Harsh J <ha...@cloudera.com>.

Hey Terry,

Can you look at your JobTracker logs, grep it for this worker node's
hostname and see the task assignment timestamps vs. when the task
began in real (from the TaskTracker log, grepping for the same attempt
ID)?

On Wed, Aug 29, 2012 at 7:10 PM, Terry Healy <th...@bnl.gov> wrote:
> Running 1.0.2, in this case on Linux.
>
> I was watching the processes / loads on one TaskTracker instance and
> noticed that it completed it's first 8 map tasks and reported 8 free
> slots (the max for this system). It then waited doing nothing for more
> than 30 seconds before the next "batch" of work came in and started running.
>
> Likewise it also has relatively long periods with all 8 cores running at
> or near idle. There are no jobs failing or obvious errors in the
> TaskTracker log.
>
> What could be causing this?
>
> Should I increase the number of map jobs to greater than number of cores
> to try and keep it busier?
>
> -Terry



-- 
Harsh J

Re: Delays in worker node jobs

Posted by Harsh J <ha...@cloudera.com>.

Hey Terry,

Can you look at your JobTracker logs, grep it for this worker node's
hostname and see the task assignment timestamps vs. when the task
began in real (from the TaskTracker log, grepping for the same attempt
ID)?

On Wed, Aug 29, 2012 at 7:10 PM, Terry Healy <th...@bnl.gov> wrote:
> Running 1.0.2, in this case on Linux.
>
> I was watching the processes / loads on one TaskTracker instance and
> noticed that it completed it's first 8 map tasks and reported 8 free
> slots (the max for this system). It then waited doing nothing for more
> than 30 seconds before the next "batch" of work came in and started running.
>
> Likewise it also has relatively long periods with all 8 cores running at
> or near idle. There are no jobs failing or obvious errors in the
> TaskTracker log.
>
> What could be causing this?
>
> Should I increase the number of map jobs to greater than number of cores
> to try and keep it busier?
>
> -Terry



-- 
Harsh J

Re: Delays in worker node jobs

Posted by Steve Loughran <st...@hortonworks.com>.

if you increase the rate of TT heartbeating to the Job Tracker, they may
pick up work more often.

The JT only hands out work when either of
 -the TT reports a task completion
 -the TT heartbeats in

This is a design that scales well for large clusters, but can add startup
latency for small ones

steve

On 30 August 2012 02:20, Terry Healy <th...@bnl.gov> wrote:

> Thanks guys. Unfortunately I had started the datanode by local command
> rather than from start-all.sh, so the related parts of the logs were
> lost. I was watching the cpu loads on all 8 cores via gkrellm at the
> time and they were definitely quiet. After a few minutes the jobs seemed
> to get in sync and it ran under a reasonable load (i.e. all cores mostly
> busy, with only brief gaps between tasks) for the rest of the job.
>
> I will attempt to re-create tomorrow with proper logging. I will look
> into enabling Hadoop metrics.
>
> -Terry
>
>
>
> On 8/29/12 8:14 PM, Vinod Kumar Vavilapalli wrote:
> > Do you know if you have enough job-load on the system? One way to look
> at this is to look for running map/reduce tasks on the JT UI at the same
> time you are looking at the node's cpu usage.
> >
> > Collecting hadoop metrics via a metrics collection system say ganglia
> will let you match up the timestamps of idleness on the nodes with the
> job-load at that point of time.
> >
> > HTH,
> > +vinod
> >
> > On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:
> >
> >> Running 1.0.2, in this case on Linux.
> >>
> >> I was watching the processes / loads on one TaskTracker instance and
> >> noticed that it completed it's first 8 map tasks and reported 8 free
> >> slots (the max for this system). It then waited doing nothing for more
> >> than 30 seconds before the next "batch" of work came in and started
> running.
> >>
> >> Likewise it also has relatively long periods with all 8 cores running at
> >> or near idle. There are no jobs failing or obvious errors in the
> >> TaskTracker log.
> >>
> >> What could be causing this?
> >>
> >> Should I increase the number of map jobs to greater than number of cores
> >> to try and keep it busier?
> >>
> >> -Terry
>
> --
> Terry Healy / thealy@bnl.gov
> Cyber Security Operations
> Brookhaven National Laboratory
> Building 515, Upton N.Y. 11973
>
>
>
>

Re: Delays in worker node jobs

Posted by Steve Loughran <st...@hortonworks.com>.

if you increase the rate of TT heartbeating to the Job Tracker, they may
pick up work more often.

The JT only hands out work when either of
 -the TT reports a task completion
 -the TT heartbeats in

This is a design that scales well for large clusters, but can add startup
latency for small ones

steve

On 30 August 2012 02:20, Terry Healy <th...@bnl.gov> wrote:

> Thanks guys. Unfortunately I had started the datanode by local command
> rather than from start-all.sh, so the related parts of the logs were
> lost. I was watching the cpu loads on all 8 cores via gkrellm at the
> time and they were definitely quiet. After a few minutes the jobs seemed
> to get in sync and it ran under a reasonable load (i.e. all cores mostly
> busy, with only brief gaps between tasks) for the rest of the job.
>
> I will attempt to re-create tomorrow with proper logging. I will look
> into enabling Hadoop metrics.
>
> -Terry
>
>
>
> On 8/29/12 8:14 PM, Vinod Kumar Vavilapalli wrote:
> > Do you know if you have enough job-load on the system? One way to look
> at this is to look for running map/reduce tasks on the JT UI at the same
> time you are looking at the node's cpu usage.
> >
> > Collecting hadoop metrics via a metrics collection system say ganglia
> will let you match up the timestamps of idleness on the nodes with the
> job-load at that point of time.
> >
> > HTH,
> > +vinod
> >
> > On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:
> >
> >> Running 1.0.2, in this case on Linux.
> >>
> >> I was watching the processes / loads on one TaskTracker instance and
> >> noticed that it completed it's first 8 map tasks and reported 8 free
> >> slots (the max for this system). It then waited doing nothing for more
> >> than 30 seconds before the next "batch" of work came in and started
> running.
> >>
> >> Likewise it also has relatively long periods with all 8 cores running at
> >> or near idle. There are no jobs failing or obvious errors in the
> >> TaskTracker log.
> >>
> >> What could be causing this?
> >>
> >> Should I increase the number of map jobs to greater than number of cores
> >> to try and keep it busier?
> >>
> >> -Terry
>
> --
> Terry Healy / thealy@bnl.gov
> Cyber Security Operations
> Brookhaven National Laboratory
> Building 515, Upton N.Y. 11973
>
>
>
>

Re: Delays in worker node jobs

Posted by Steve Loughran <st...@hortonworks.com>.

if you increase the rate of TT heartbeating to the Job Tracker, they may
pick up work more often.

The JT only hands out work when either of
 -the TT reports a task completion
 -the TT heartbeats in

This is a design that scales well for large clusters, but can add startup
latency for small ones

steve

On 30 August 2012 02:20, Terry Healy <th...@bnl.gov> wrote:

> Thanks guys. Unfortunately I had started the datanode by local command
> rather than from start-all.sh, so the related parts of the logs were
> lost. I was watching the cpu loads on all 8 cores via gkrellm at the
> time and they were definitely quiet. After a few minutes the jobs seemed
> to get in sync and it ran under a reasonable load (i.e. all cores mostly
> busy, with only brief gaps between tasks) for the rest of the job.
>
> I will attempt to re-create tomorrow with proper logging. I will look
> into enabling Hadoop metrics.
>
> -Terry
>
>
>
> On 8/29/12 8:14 PM, Vinod Kumar Vavilapalli wrote:
> > Do you know if you have enough job-load on the system? One way to look
> at this is to look for running map/reduce tasks on the JT UI at the same
> time you are looking at the node's cpu usage.
> >
> > Collecting hadoop metrics via a metrics collection system say ganglia
> will let you match up the timestamps of idleness on the nodes with the
> job-load at that point of time.
> >
> > HTH,
> > +vinod
> >
> > On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:
> >
> >> Running 1.0.2, in this case on Linux.
> >>
> >> I was watching the processes / loads on one TaskTracker instance and
> >> noticed that it completed it's first 8 map tasks and reported 8 free
> >> slots (the max for this system). It then waited doing nothing for more
> >> than 30 seconds before the next "batch" of work came in and started
> running.
> >>
> >> Likewise it also has relatively long periods with all 8 cores running at
> >> or near idle. There are no jobs failing or obvious errors in the
> >> TaskTracker log.
> >>
> >> What could be causing this?
> >>
> >> Should I increase the number of map jobs to greater than number of cores
> >> to try and keep it busier?
> >>
> >> -Terry
>
> --
> Terry Healy / thealy@bnl.gov
> Cyber Security Operations
> Brookhaven National Laboratory
> Building 515, Upton N.Y. 11973
>
>
>
>

Re: Delays in worker node jobs

Posted by Steve Loughran <st...@hortonworks.com>.

if you increase the rate of TT heartbeating to the Job Tracker, they may
pick up work more often.

The JT only hands out work when either of
 -the TT reports a task completion
 -the TT heartbeats in

This is a design that scales well for large clusters, but can add startup
latency for small ones

steve

On 30 August 2012 02:20, Terry Healy <th...@bnl.gov> wrote:

> Thanks guys. Unfortunately I had started the datanode by local command
> rather than from start-all.sh, so the related parts of the logs were
> lost. I was watching the cpu loads on all 8 cores via gkrellm at the
> time and they were definitely quiet. After a few minutes the jobs seemed
> to get in sync and it ran under a reasonable load (i.e. all cores mostly
> busy, with only brief gaps between tasks) for the rest of the job.
>
> I will attempt to re-create tomorrow with proper logging. I will look
> into enabling Hadoop metrics.
>
> -Terry
>
>
>
> On 8/29/12 8:14 PM, Vinod Kumar Vavilapalli wrote:
> > Do you know if you have enough job-load on the system? One way to look
> at this is to look for running map/reduce tasks on the JT UI at the same
> time you are looking at the node's cpu usage.
> >
> > Collecting hadoop metrics via a metrics collection system say ganglia
> will let you match up the timestamps of idleness on the nodes with the
> job-load at that point of time.
> >
> > HTH,
> > +vinod
> >
> > On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:
> >
> >> Running 1.0.2, in this case on Linux.
> >>
> >> I was watching the processes / loads on one TaskTracker instance and
> >> noticed that it completed it's first 8 map tasks and reported 8 free
> >> slots (the max for this system). It then waited doing nothing for more
> >> than 30 seconds before the next "batch" of work came in and started
> running.
> >>
> >> Likewise it also has relatively long periods with all 8 cores running at
> >> or near idle. There are no jobs failing or obvious errors in the
> >> TaskTracker log.
> >>
> >> What could be causing this?
> >>
> >> Should I increase the number of map jobs to greater than number of cores
> >> to try and keep it busier?
> >>
> >> -Terry
>
> --
> Terry Healy / thealy@bnl.gov
> Cyber Security Operations
> Brookhaven National Laboratory
> Building 515, Upton N.Y. 11973
>
>
>
>

Re: Delays in worker node jobs

Posted by Terry Healy <th...@bnl.gov>.

Thanks guys. Unfortunately I had started the datanode by local command
rather than from start-all.sh, so the related parts of the logs were
lost. I was watching the cpu loads on all 8 cores via gkrellm at the
time and they were definitely quiet. After a few minutes the jobs seemed
to get in sync and it ran under a reasonable load (i.e. all cores mostly
busy, with only brief gaps between tasks) for the rest of the job.

I will attempt to re-create tomorrow with proper logging. I will look
into enabling Hadoop metrics.

-Terry



On 8/29/12 8:14 PM, Vinod Kumar Vavilapalli wrote:
> Do you know if you have enough job-load on the system? One way to look at this is to look for running map/reduce tasks on the JT UI at the same time you are looking at the node's cpu usage.
>
> Collecting hadoop metrics via a metrics collection system say ganglia will let you match up the timestamps of idleness on the nodes with the job-load at that point of time.
>
> HTH,
> +vinod
>
> On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:
>
>> Running 1.0.2, in this case on Linux.
>>
>> I was watching the processes / loads on one TaskTracker instance and
>> noticed that it completed it's first 8 map tasks and reported 8 free
>> slots (the max for this system). It then waited doing nothing for more
>> than 30 seconds before the next "batch" of work came in and started running.
>>
>> Likewise it also has relatively long periods with all 8 cores running at
>> or near idle. There are no jobs failing or obvious errors in the
>> TaskTracker log.
>>
>> What could be causing this?
>>
>> Should I increase the number of map jobs to greater than number of cores
>> to try and keep it busier?
>>
>> -Terry

-- 
Terry Healy / thealy@bnl.gov
Cyber Security Operations
Brookhaven National Laboratory
Building 515, Upton N.Y. 11973

Re: Delays in worker node jobs

Posted by Terry Healy <th...@bnl.gov>.

Thanks guys. Unfortunately I had started the datanode by local command
rather than from start-all.sh, so the related parts of the logs were
lost. I was watching the cpu loads on all 8 cores via gkrellm at the
time and they were definitely quiet. After a few minutes the jobs seemed
to get in sync and it ran under a reasonable load (i.e. all cores mostly
busy, with only brief gaps between tasks) for the rest of the job.

I will attempt to re-create tomorrow with proper logging. I will look
into enabling Hadoop metrics.

-Terry



On 8/29/12 8:14 PM, Vinod Kumar Vavilapalli wrote:
> Do you know if you have enough job-load on the system? One way to look at this is to look for running map/reduce tasks on the JT UI at the same time you are looking at the node's cpu usage.
>
> Collecting hadoop metrics via a metrics collection system say ganglia will let you match up the timestamps of idleness on the nodes with the job-load at that point of time.
>
> HTH,
> +vinod
>
> On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:
>
>> Running 1.0.2, in this case on Linux.
>>
>> I was watching the processes / loads on one TaskTracker instance and
>> noticed that it completed it's first 8 map tasks and reported 8 free
>> slots (the max for this system). It then waited doing nothing for more
>> than 30 seconds before the next "batch" of work came in and started running.
>>
>> Likewise it also has relatively long periods with all 8 cores running at
>> or near idle. There are no jobs failing or obvious errors in the
>> TaskTracker log.
>>
>> What could be causing this?
>>
>> Should I increase the number of map jobs to greater than number of cores
>> to try and keep it busier?
>>
>> -Terry

-- 
Terry Healy / thealy@bnl.gov
Cyber Security Operations
Brookhaven National Laboratory
Building 515, Upton N.Y. 11973

Re: Delays in worker node jobs

Posted by Terry Healy <th...@bnl.gov>.

Thanks guys. Unfortunately I had started the datanode by local command
rather than from start-all.sh, so the related parts of the logs were
lost. I was watching the cpu loads on all 8 cores via gkrellm at the
time and they were definitely quiet. After a few minutes the jobs seemed
to get in sync and it ran under a reasonable load (i.e. all cores mostly
busy, with only brief gaps between tasks) for the rest of the job.

I will attempt to re-create tomorrow with proper logging. I will look
into enabling Hadoop metrics.

-Terry



On 8/29/12 8:14 PM, Vinod Kumar Vavilapalli wrote:
> Do you know if you have enough job-load on the system? One way to look at this is to look for running map/reduce tasks on the JT UI at the same time you are looking at the node's cpu usage.
>
> Collecting hadoop metrics via a metrics collection system say ganglia will let you match up the timestamps of idleness on the nodes with the job-load at that point of time.
>
> HTH,
> +vinod
>
> On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:
>
>> Running 1.0.2, in this case on Linux.
>>
>> I was watching the processes / loads on one TaskTracker instance and
>> noticed that it completed it's first 8 map tasks and reported 8 free
>> slots (the max for this system). It then waited doing nothing for more
>> than 30 seconds before the next "batch" of work came in and started running.
>>
>> Likewise it also has relatively long periods with all 8 cores running at
>> or near idle. There are no jobs failing or obvious errors in the
>> TaskTracker log.
>>
>> What could be causing this?
>>
>> Should I increase the number of map jobs to greater than number of cores
>> to try and keep it busier?
>>
>> -Terry

-- 
Terry Healy / thealy@bnl.gov
Cyber Security Operations
Brookhaven National Laboratory
Building 515, Upton N.Y. 11973

Re: Delays in worker node jobs

Posted by Terry Healy <th...@bnl.gov>.

Thanks guys. Unfortunately I had started the datanode by local command
rather than from start-all.sh, so the related parts of the logs were
lost. I was watching the cpu loads on all 8 cores via gkrellm at the
time and they were definitely quiet. After a few minutes the jobs seemed
to get in sync and it ran under a reasonable load (i.e. all cores mostly
busy, with only brief gaps between tasks) for the rest of the job.

I will attempt to re-create tomorrow with proper logging. I will look
into enabling Hadoop metrics.

-Terry



On 8/29/12 8:14 PM, Vinod Kumar Vavilapalli wrote:
> Do you know if you have enough job-load on the system? One way to look at this is to look for running map/reduce tasks on the JT UI at the same time you are looking at the node's cpu usage.
>
> Collecting hadoop metrics via a metrics collection system say ganglia will let you match up the timestamps of idleness on the nodes with the job-load at that point of time.
>
> HTH,
> +vinod
>
> On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:
>
>> Running 1.0.2, in this case on Linux.
>>
>> I was watching the processes / loads on one TaskTracker instance and
>> noticed that it completed it's first 8 map tasks and reported 8 free
>> slots (the max for this system). It then waited doing nothing for more
>> than 30 seconds before the next "batch" of work came in and started running.
>>
>> Likewise it also has relatively long periods with all 8 cores running at
>> or near idle. There are no jobs failing or obvious errors in the
>> TaskTracker log.
>>
>> What could be causing this?
>>
>> Should I increase the number of map jobs to greater than number of cores
>> to try and keep it busier?
>>
>> -Terry

-- 
Terry Healy / thealy@bnl.gov
Cyber Security Operations
Brookhaven National Laboratory
Building 515, Upton N.Y. 11973

Re: Delays in worker node jobs

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

Do you know if you have enough job-load on the system? One way to look at this is to look for running map/reduce tasks on the JT UI at the same time you are looking at the node's cpu usage.

Collecting hadoop metrics via a metrics collection system say ganglia will let you match up the timestamps of idleness on the nodes with the job-load at that point of time.

HTH,
+vinod

On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:

> Running 1.0.2, in this case on Linux.
> 
> I was watching the processes / loads on one TaskTracker instance and
> noticed that it completed it's first 8 map tasks and reported 8 free
> slots (the max for this system). It then waited doing nothing for more
> than 30 seconds before the next "batch" of work came in and started running.
> 
> Likewise it also has relatively long periods with all 8 cores running at
> or near idle. There are no jobs failing or obvious errors in the
> TaskTracker log.
> 
> What could be causing this?
> 
> Should I increase the number of map jobs to greater than number of cores
> to try and keep it busier?
> 
> -Terry

Re: Delays in worker node jobs

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

Do you know if you have enough job-load on the system? One way to look at this is to look for running map/reduce tasks on the JT UI at the same time you are looking at the node's cpu usage.

Collecting hadoop metrics via a metrics collection system say ganglia will let you match up the timestamps of idleness on the nodes with the job-load at that point of time.

HTH,
+vinod

On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:

> Running 1.0.2, in this case on Linux.
> 
> I was watching the processes / loads on one TaskTracker instance and
> noticed that it completed it's first 8 map tasks and reported 8 free
> slots (the max for this system). It then waited doing nothing for more
> than 30 seconds before the next "batch" of work came in and started running.
> 
> Likewise it also has relatively long periods with all 8 cores running at
> or near idle. There are no jobs failing or obvious errors in the
> TaskTracker log.
> 
> What could be causing this?
> 
> Should I increase the number of map jobs to greater than number of cores
> to try and keep it busier?
> 
> -Terry

Re: Delays in worker node jobs

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

Do you know if you have enough job-load on the system? One way to look at this is to look for running map/reduce tasks on the JT UI at the same time you are looking at the node's cpu usage.

Collecting hadoop metrics via a metrics collection system say ganglia will let you match up the timestamps of idleness on the nodes with the job-load at that point of time.

HTH,
+vinod

On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:

> Running 1.0.2, in this case on Linux.
> 
> I was watching the processes / loads on one TaskTracker instance and
> noticed that it completed it's first 8 map tasks and reported 8 free
> slots (the max for this system). It then waited doing nothing for more
> than 30 seconds before the next "batch" of work came in and started running.
> 
> Likewise it also has relatively long periods with all 8 cores running at
> or near idle. There are no jobs failing or obvious errors in the
> TaskTracker log.
> 
> What could be causing this?
> 
> Should I increase the number of map jobs to greater than number of cores
> to try and keep it busier?
> 
> -Terry

Re: Delays in worker node jobs

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

Do you know if you have enough job-load on the system? One way to look at this is to look for running map/reduce tasks on the JT UI at the same time you are looking at the node's cpu usage.

Collecting hadoop metrics via a metrics collection system say ganglia will let you match up the timestamps of idleness on the nodes with the job-load at that point of time.

HTH,
+vinod

On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:

> Running 1.0.2, in this case on Linux.
> 
> I was watching the processes / loads on one TaskTracker instance and
> noticed that it completed it's first 8 map tasks and reported 8 free
> slots (the max for this system). It then waited doing nothing for more
> than 30 seconds before the next "batch" of work came in and started running.
> 
> Likewise it also has relatively long periods with all 8 cores running at
> or near idle. There are no jobs failing or obvious errors in the
> TaskTracker log.
> 
> What could be causing this?
> 
> Should I increase the number of map jobs to greater than number of cores
> to try and keep it busier?
> 
> -Terry

Delays in worker node jobs

Posted by Terry Healy <th...@bnl.gov>.

Running 1.0.2, in this case on Linux.

I was watching the processes / loads on one TaskTracker instance and
noticed that it completed it's first 8 map tasks and reported 8 free
slots (the max for this system). It then waited doing nothing for more
than 30 seconds before the next "batch" of work came in and started running.

Likewise it also has relatively long periods with all 8 cores running at
or near idle. There are no jobs failing or obvious errors in the
TaskTracker log.

What could be causing this?

Should I increase the number of map jobs to greater than number of cores
to try and keep it busier?

-Terry

Delays in worker node jobs

Posted by Terry Healy <th...@bnl.gov>.

Running 1.0.2, in this case on Linux.

I was watching the processes / loads on one TaskTracker instance and
noticed that it completed it's first 8 map tasks and reported 8 free
slots (the max for this system). It then waited doing nothing for more
than 30 seconds before the next "batch" of work came in and started running.

Likewise it also has relatively long periods with all 8 cores running at
or near idle. There are no jobs failing or obvious errors in the
TaskTracker log.

What could be causing this?

Should I increase the number of map jobs to greater than number of cores
to try and keep it busier?

-Terry

Delays in worker node jobs

Posted by Terry Healy <th...@bnl.gov>.

Running 1.0.2, in this case on Linux.

I was watching the processes / loads on one TaskTracker instance and
noticed that it completed it's first 8 map tasks and reported 8 free
slots (the max for this system). It then waited doing nothing for more
than 30 seconds before the next "batch" of work came in and started running.

Likewise it also has relatively long periods with all 8 cores running at
or near idle. There are no jobs failing or obvious errors in the
TaskTracker log.

What could be causing this?

Should I increase the number of map jobs to greater than number of cores
to try and keep it busier?

-Terry

Delays in worker node jobs

Posted by Terry Healy <th...@bnl.gov>.

Running 1.0.2, in this case on Linux.

I was watching the processes / loads on one TaskTracker instance and
noticed that it completed it's first 8 map tasks and reported 8 free
slots (the max for this system). It then waited doing nothing for more
than 30 seconds before the next "batch" of work came in and started running.

Likewise it also has relatively long periods with all 8 cores running at
or near idle. There are no jobs failing or obvious errors in the
TaskTracker log.

What could be causing this?

Should I increase the number of map jobs to greater than number of cores
to try and keep it busier?

-Terry

Re: example usage of s3 file system

Posted by Chris Collins <ch...@yahoo.com>.

Thanks, I should of been more clear.  I am not attempting to perform a map reduce job.  I was literally trying to use the FileSystem abstraction (rather than using jets3t library directly to access S3.  I was assuming it handled the mocking of directories in s3 (as it is not a native feature of that file system).

C
On Aug 28, 2012, at 10:16 AM, Manoj Babu <ma...@gmail.com> wrote:

> Hi,
> 
> Here is an example, might help you.
> 
> http://muhammadkhojaye.blogspot.in/2012/04/how-to-run-amazon-elastic-mapreduce-job.html 
> 
> Cheers!
> Manoj.
> 
> 
> 
> On Tue, Aug 28, 2012 at 12:55 PM, Chris Collins <ch...@yahoo.com> wrote:
> 
> 
> 
> Hi I am trying to use the Hadoop filesystem abstraction with S3 but in my tinkering I am not having a great deal of success.  I am particularly interested in the ability to mimic a directory structure (since s3 native doesnt do it).
> 
> Can anyone point me to some good example usage of Hadoop FileSystem with s3?
> 
> I created a few directories using transit and AWS S3 console for test.  Doing a liststatus of the bucket returns a FileStatus object of the directory created but if I try to do a liststatus of that path I am getting a 404:
> 
> org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error. HEAD '/aaaa' on Host ....
> 
> Probably not the best list to look for help, any clues appreciated.
> 
> C
> 
>

Re: example usage of s3 file system

Posted by Chris Collins <ch...@yahoo.com>.

Thanks, I should of been more clear.  I am not attempting to perform a map reduce job.  I was literally trying to use the FileSystem abstraction (rather than using jets3t library directly to access S3.  I was assuming it handled the mocking of directories in s3 (as it is not a native feature of that file system).

C
On Aug 28, 2012, at 10:16 AM, Manoj Babu <ma...@gmail.com> wrote:

> Hi,
> 
> Here is an example, might help you.
> 
> http://muhammadkhojaye.blogspot.in/2012/04/how-to-run-amazon-elastic-mapreduce-job.html 
> 
> Cheers!
> Manoj.
> 
> 
> 
> On Tue, Aug 28, 2012 at 12:55 PM, Chris Collins <ch...@yahoo.com> wrote:
> 
> 
> 
> Hi I am trying to use the Hadoop filesystem abstraction with S3 but in my tinkering I am not having a great deal of success.  I am particularly interested in the ability to mimic a directory structure (since s3 native doesnt do it).
> 
> Can anyone point me to some good example usage of Hadoop FileSystem with s3?
> 
> I created a few directories using transit and AWS S3 console for test.  Doing a liststatus of the bucket returns a FileStatus object of the directory created but if I try to do a liststatus of that path I am getting a 404:
> 
> org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error. HEAD '/aaaa' on Host ....
> 
> Probably not the best list to look for help, any clues appreciated.
> 
> C
> 
>

Re: example usage of s3 file system

Posted by Chris Collins <ch...@yahoo.com>.

Thanks, I should of been more clear.  I am not attempting to perform a map reduce job.  I was literally trying to use the FileSystem abstraction (rather than using jets3t library directly to access S3.  I was assuming it handled the mocking of directories in s3 (as it is not a native feature of that file system).

C
On Aug 28, 2012, at 10:16 AM, Manoj Babu <ma...@gmail.com> wrote:

> Hi,
> 
> Here is an example, might help you.
> 
> http://muhammadkhojaye.blogspot.in/2012/04/how-to-run-amazon-elastic-mapreduce-job.html 
> 
> Cheers!
> Manoj.
> 
> 
> 
> On Tue, Aug 28, 2012 at 12:55 PM, Chris Collins <ch...@yahoo.com> wrote:
> 
> 
> 
> Hi I am trying to use the Hadoop filesystem abstraction with S3 but in my tinkering I am not having a great deal of success.  I am particularly interested in the ability to mimic a directory structure (since s3 native doesnt do it).
> 
> Can anyone point me to some good example usage of Hadoop FileSystem with s3?
> 
> I created a few directories using transit and AWS S3 console for test.  Doing a liststatus of the bucket returns a FileStatus object of the directory created but if I try to do a liststatus of that path I am getting a 404:
> 
> org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error. HEAD '/aaaa' on Host ....
> 
> Probably not the best list to look for help, any clues appreciated.
> 
> C
> 
>

Re: example usage of s3 file system

Posted by Chris Collins <ch...@yahoo.com>.

Thanks, I should of been more clear.  I am not attempting to perform a map reduce job.  I was literally trying to use the FileSystem abstraction (rather than using jets3t library directly to access S3.  I was assuming it handled the mocking of directories in s3 (as it is not a native feature of that file system).

C
On Aug 28, 2012, at 10:16 AM, Manoj Babu <ma...@gmail.com> wrote:

> Hi,
> 
> Here is an example, might help you.
> 
> http://muhammadkhojaye.blogspot.in/2012/04/how-to-run-amazon-elastic-mapreduce-job.html 
> 
> Cheers!
> Manoj.
> 
> 
> 
> On Tue, Aug 28, 2012 at 12:55 PM, Chris Collins <ch...@yahoo.com> wrote:
> 
> 
> 
> Hi I am trying to use the Hadoop filesystem abstraction with S3 but in my tinkering I am not having a great deal of success.  I am particularly interested in the ability to mimic a directory structure (since s3 native doesnt do it).
> 
> Can anyone point me to some good example usage of Hadoop FileSystem with s3?
> 
> I created a few directories using transit and AWS S3 console for test.  Doing a liststatus of the bucket returns a FileStatus object of the directory created but if I try to do a liststatus of that path I am getting a 404:
> 
> org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error. HEAD '/aaaa' on Host ....
> 
> Probably not the best list to look for help, any clues appreciated.
> 
> C
> 
>

Re: example usage of s3 file system

Posted by Manoj Babu <ma...@gmail.com>.

Hi,

Here is an example, might help you.

http://muhammadkhojaye.blogspot.in/2012/04/how-to-run-amazon-elastic-mapreduce-job.html


Cheers!
Manoj.



On Tue, Aug 28, 2012 at 12:55 PM, Chris Collins
<ch...@yahoo.com>wrote:

>
>
>
> Hi I am trying to use the Hadoop filesystem abstraction with S3 but in my
> tinkering I am not having a great deal of success.  I am particularly
> interested in the ability to mimic a directory structure (since s3 native
> doesnt do it).
>
> Can anyone point me to some good example usage of Hadoop FileSystem with
> s3?
>
> I created a few directories using transit and AWS S3 console for test.
>  Doing a liststatus of the bucket returns a FileStatus object of the
> directory created but if I try to do a liststatus of that path I am getting
> a 404:
>
> org.apache.hadoop.fs.s3.S3Exception:
> org.jets3t.service.S3ServiceException: Request Error. HEAD '/aaaa' on Host
> ....
>
> Probably not the best list to look for help, any clues appreciated.
>
> C
>
>

Re: example usage of s3 file system

Posted by Manoj Babu <ma...@gmail.com>.

Hi,

Here is an example, might help you.

http://muhammadkhojaye.blogspot.in/2012/04/how-to-run-amazon-elastic-mapreduce-job.html


Cheers!
Manoj.



On Tue, Aug 28, 2012 at 12:55 PM, Chris Collins
<ch...@yahoo.com>wrote:

>
>
>
> Hi I am trying to use the Hadoop filesystem abstraction with S3 but in my
> tinkering I am not having a great deal of success.  I am particularly
> interested in the ability to mimic a directory structure (since s3 native
> doesnt do it).
>
> Can anyone point me to some good example usage of Hadoop FileSystem with
> s3?
>
> I created a few directories using transit and AWS S3 console for test.
>  Doing a liststatus of the bucket returns a FileStatus object of the
> directory created but if I try to do a liststatus of that path I am getting
> a 404:
>
> org.apache.hadoop.fs.s3.S3Exception:
> org.jets3t.service.S3ServiceException: Request Error. HEAD '/aaaa' on Host
> ....
>
> Probably not the best list to look for help, any clues appreciated.
>
> C
>
>

Re: example usage of s3 file system

Posted by Manoj Babu <ma...@gmail.com>.

Hi,

Here is an example, might help you.

http://muhammadkhojaye.blogspot.in/2012/04/how-to-run-amazon-elastic-mapreduce-job.html


Cheers!
Manoj.



On Tue, Aug 28, 2012 at 12:55 PM, Chris Collins
<ch...@yahoo.com>wrote:

>
>
>
> Hi I am trying to use the Hadoop filesystem abstraction with S3 but in my
> tinkering I am not having a great deal of success.  I am particularly
> interested in the ability to mimic a directory structure (since s3 native
> doesnt do it).
>
> Can anyone point me to some good example usage of Hadoop FileSystem with
> s3?
>
> I created a few directories using transit and AWS S3 console for test.
>  Doing a liststatus of the bucket returns a FileStatus object of the
> directory created but if I try to do a liststatus of that path I am getting
> a 404:
>
> org.apache.hadoop.fs.s3.S3Exception:
> org.jets3t.service.S3ServiceException: Request Error. HEAD '/aaaa' on Host
> ....
>
> Probably not the best list to look for help, any clues appreciated.
>
> C
>
>

Re: example usage of s3 file system

Posted by Manoj Babu <ma...@gmail.com>.

Hi,

Here is an example, might help you.

http://muhammadkhojaye.blogspot.in/2012/04/how-to-run-amazon-elastic-mapreduce-job.html


Cheers!
Manoj.



On Tue, Aug 28, 2012 at 12:55 PM, Chris Collins
<ch...@yahoo.com>wrote:

>
>
>
> Hi I am trying to use the Hadoop filesystem abstraction with S3 but in my
> tinkering I am not having a great deal of success.  I am particularly
> interested in the ability to mimic a directory structure (since s3 native
> doesnt do it).
>
> Can anyone point me to some good example usage of Hadoop FileSystem with
> s3?
>
> I created a few directories using transit and AWS S3 console for test.
>  Doing a liststatus of the bucket returns a FileStatus object of the
> directory created but if I try to do a liststatus of that path I am getting
> a 404:
>
> org.apache.hadoop.fs.s3.S3Exception:
> org.jets3t.service.S3ServiceException: Request Error. HEAD '/aaaa' on Host
> ....
>
> Probably not the best list to look for help, any clues appreciated.
>
> C
>
>