Posted to hdfs-user@hadoop.apache.org by Chris Collins <ch...@yahoo.com> on 2012/08/28 09:14:45 UTC

example usage of s3 file system

Hi, I am trying to use the Hadoop filesystem abstraction with S3, but in my tinkering I am not having a great deal of success.  I am particularly interested in the ability to mimic a directory structure (since S3 native doesn't do it).

Can anyone point me to some good example usage of Hadoop FileSystem with s3?

I created a few directories using transit and the AWS S3 console for testing.  Calling listStatus on the bucket returns a FileStatus object for the directory I created, but if I try to call listStatus on that path I get a 404:

org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error. HEAD '/aaaa' on Host ....

Probably not the best list to look for help on; any clues appreciated.

C
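
For reference, a minimal sketch of this kind of FileSystem usage against the s3n:// (native) scheme; the bucket name and credentials below are placeholders, not details from this thread:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class S3ListExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder credentials for the s3n (native) scheme.
        conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
        conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");

        FileSystem fs = FileSystem.get(URI.create("s3n://my-bucket/"), conf);

        // mkdirs creates a zero-byte marker object to mimic a directory.
        Path dir = new Path("s3n://my-bucket/aaaa");
        fs.mkdirs(dir);

        // List the bucket root, then the "directory" itself.
        for (FileStatus stat : fs.listStatus(new Path("s3n://my-bucket/"))) {
          System.out.println(stat.getPath() + (stat.isDir() ? " (dir)" : ""));
        }
        for (FileStatus stat : fs.listStatus(dir)) {
          System.out.println(stat.getPath());
        }
      }
    }

Note that Hadoop 1.x exposes two S3 schemes: s3n:// (native, one S3 object per file) and s3:// (block-based, not readable by other S3 tools); the directory-mimicking behaviour discussed here belongs to s3n.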

Re: Delays in worker node jobs

Posted by Harsh J <ha...@cloudera.com>.
Hey Terry,

Can you look at your JobTracker logs, grep them for this worker node's
hostname, and compare the task-assignment timestamps against when the
task actually began (from the TaskTracker log, grepping for the same
attempt ID)?

On Wed, Aug 29, 2012 at 7:10 PM, Terry Healy <th...@bnl.gov> wrote:
> Running 1.0.2, in this case on Linux.
>
> I was watching the processes / loads on one TaskTracker instance and
> noticed that it completed its first 8 map tasks and reported 8 free
> slots (the max for this system). It then waited doing nothing for more
> than 30 seconds before the next "batch" of work came in and started running.
>
> Likewise it also has relatively long periods with all 8 cores running at
> or near idle. There are no jobs failing or obvious errors in the
> TaskTracker log.
>
> What could be causing this?
>
> Should I increase the number of map jobs to greater than the number of cores
> to try and keep it busier?
>
> -Terry



-- 
Harsh J


Re: Delays in worker node jobs

Posted by Steve Loughran <st...@hortonworks.com>.
If you increase the rate of TT heartbeating to the JobTracker, they may
pick up work more often.

The JT only hands out work when either of the following happens:
 - the TT reports a task completion
 - the TT heartbeats in

This is a design that scales well for large clusters, but it can add
startup latency for small ones.
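
One knob worth a look here is the out-of-band heartbeat, which lets a TT report in as soon as a task finishes instead of waiting for its next scheduled heartbeat. A hypothetical mapred-site.xml fragment (property names and availability vary across 1.x releases, so check your mapred-default.xml):

    <!-- mapred-site.xml -->
    <property>
      <!-- Heartbeat immediately on task completion rather than waiting
           for the next regular heartbeat interval. -->
      <name>mapreduce.tasktracker.outofband.heartbeat</name>
      <value>true</value>
    </property>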

steve

On 30 August 2012 02:20, Terry Healy <th...@bnl.gov> wrote:

> Thanks guys. Unfortunately I had started the datanode by local command
> rather than from start-all.sh, so the related parts of the logs were
> lost. I was watching the cpu loads on all 8 cores via gkrellm at the
> time and they were definitely quiet. After a few minutes the jobs seemed
> to get in sync and it ran under a reasonable load (i.e. all cores mostly
> busy, with only brief gaps between tasks) for the rest of the job.
>
> I will attempt to re-create tomorrow with proper logging. I will look
> into enabling Hadoop metrics.
>
> -Terry
>
>
>
> On 8/29/12 8:14 PM, Vinod Kumar Vavilapalli wrote:
> > Do you know if you have enough job-load on the system? One way to look
> at this is to look for running map/reduce tasks on the JT UI at the same
> time you are looking at the node's cpu usage.
> >
> > Collecting hadoop metrics via a metrics collection system say ganglia
> will let you match up the timestamps of idleness on the nodes with the
> job-load at that point of time.
> >
> > HTH,
> > +vinod
> >
> > On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:
> >
> >> Running 1.0.2, in this case on Linux.
> >>
> >> I was watching the processes / loads on one TaskTracker instance and
> >> noticed that it completed its first 8 map tasks and reported 8 free
> >> slots (the max for this system). It then waited doing nothing for more
> >> than 30 seconds before the next "batch" of work came in and started
> running.
> >>
> >> Likewise it also has relatively long periods with all 8 cores running at
> >> or near idle. There are no jobs failing or obvious errors in the
> >> TaskTracker log.
> >>
> >> What could be causing this?
> >>
> >> Should I increase the number of map jobs to greater than the number of cores
> >> to try and keep it busier?
> >>
> >> -Terry
>
> --
> Terry Healy / thealy@bnl.gov
> Cyber Security Operations
> Brookhaven National Laboratory
> Building 515, Upton N.Y. 11973
>
>
>
>

Re: Delays in worker node jobs

Posted by Terry Healy <th...@bnl.gov>.
Thanks guys. Unfortunately I had started the datanode with a local command
rather than from start-all.sh, so the related parts of the logs were
lost. I was watching the CPU loads on all 8 cores via gkrellm at the
time, and they were definitely quiet. After a few minutes the jobs seemed
to get in sync, and the node ran under a reasonable load (i.e., all cores
mostly busy, with only brief gaps between tasks) for the rest of the job.

I will attempt to re-create the problem tomorrow with proper logging. I
will also look into enabling Hadoop metrics.

-Terry



On 8/29/12 8:14 PM, Vinod Kumar Vavilapalli wrote:
> Do you know if you have enough job-load on the system? One way to look at this is to look for running map/reduce tasks on the JT UI at the same time you are looking at the node's cpu usage.
>
> Collecting hadoop metrics via a metrics collection system say ganglia will let you match up the timestamps of idleness on the nodes with the job-load at that point of time.
>
> HTH,
> +vinod
>
> On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:
>
>> Running 1.0.2, in this case on Linux.
>>
>> I was watching the processes / loads on one TaskTracker instance and
>> noticed that it completed its first 8 map tasks and reported 8 free
>> slots (the max for this system). It then waited doing nothing for more
>> than 30 seconds before the next "batch" of work came in and started running.
>>
>> Likewise it also has relatively long periods with all 8 cores running at
>> or near idle. There are no jobs failing or obvious errors in the
>> TaskTracker log.
>>
>> What could be causing this?
>>
>> Should I increase the number of map jobs to greater than the number of cores
>> to try and keep it busier?
>>
>> -Terry

-- 
Terry Healy / thealy@bnl.gov
Cyber Security Operations
Brookhaven National Laboratory
Building 515, Upton N.Y. 11973




Re: Delays in worker node jobs

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
Do you know if you have enough job-load on the system? One way to check is to look for running map/reduce tasks in the JT UI at the same time you are looking at the node's CPU usage.

Collecting Hadoop metrics via a metrics collection system, say Ganglia, will let you match up the timestamps of idleness on the nodes with the job load at that point in time.
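
As a sketch of that setup, a hypothetical hadoop-metrics.properties fragment for the 1.x metrics framework (the gmond host and port are placeholders):

    # Send MapReduce metrics to Ganglia (3.1+ wire format) every 10 seconds.
    mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
    mapred.period=10
    mapred.servers=gmond-host:8649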

HTH,
+vinod

On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:

> Running 1.0.2, in this case on Linux.
> 
> I was watching the processes / loads on one TaskTracker instance and
> noticed that it completed its first 8 map tasks and reported 8 free
> slots (the max for this system). It then waited doing nothing for more
> than 30 seconds before the next "batch" of work came in and started running.
> 
> Likewise it also has relatively long periods with all 8 cores running at
> or near idle. There are no jobs failing or obvious errors in the
> TaskTracker log.
> 
> What could be causing this?
> 
> Should I increase the number of map jobs to greater than the number of cores
> to try and keep it busier?
> 
> -Terry



Delays in worker node jobs

Posted by Terry Healy <th...@bnl.gov>.
Running 1.0.2, in this case on Linux.

I was watching the processes / loads on one TaskTracker instance and
noticed that it completed its first 8 map tasks and reported 8 free
slots (the max for this system). It then waited doing nothing for more
than 30 seconds before the next "batch" of work came in and started running.

Likewise it also has relatively long periods with all 8 cores running at
or near idle. There are no jobs failing or obvious errors in the
TaskTracker log.

What could be causing this?

Should I increase the number of map jobs to greater than the number of cores
to try and keep it busier?

-Terry


Re: example usage of s3 file system

Posted by Chris Collins <ch...@yahoo.com>.
Thanks, I should have been more clear.  I am not attempting to perform a MapReduce job.  I was literally trying to use the FileSystem abstraction (rather than using the jets3t library directly to access S3).  I was assuming it handled the mimicking of directories in S3 (as that is not a native feature of that file system).

C
On Aug 28, 2012, at 10:16 AM, Manoj Babu <ma...@gmail.com> wrote:

> Hi,
> 
> Here is an example, might help you.
> 
> http://muhammadkhojaye.blogspot.in/2012/04/how-to-run-amazon-elastic-mapreduce-job.html 
> 
> Cheers!
> Manoj.
> 
> 
> 
> On Tue, Aug 28, 2012 at 12:55 PM, Chris Collins <ch...@yahoo.com> wrote:
> 
> 
> 
> Hi I am trying to use the Hadoop filesystem abstraction with S3 but in my tinkering I am not having a great deal of success.  I am particularly interested in the ability to mimic a directory structure (since s3 native doesn't do it).
> 
> Can anyone point me to some good example usage of Hadoop FileSystem with s3?
> 
> I created a few directories using transit and AWS S3 console for test.  Doing a listStatus of the bucket returns a FileStatus object of the directory created but if I try to do a listStatus of that path I am getting a 404:
> 
> org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error. HEAD '/aaaa' on Host ....
> 
> Probably not the best list to look for help, any clues appreciated.
> 
> C
> 
> 



Re: example usage of s3 file system

Posted by Manoj Babu <ma...@gmail.com>.
Hi,

Here is an example that might help you.

http://muhammadkhojaye.blogspot.in/2012/04/how-to-run-amazon-elastic-mapreduce-job.html


Cheers!
Manoj.



On Tue, Aug 28, 2012 at 12:55 PM, Chris Collins
<ch...@yahoo.com>wrote:

>
>
>
> Hi I am trying to use the Hadoop filesystem abstraction with S3 but in my
> tinkering I am not having a great deal of success.  I am particularly
> interested in the ability to mimic a directory structure (since s3 native
> doesn't do it).
>
> Can anyone point me to some good example usage of Hadoop FileSystem with
> s3?
>
> I created a few directories using transit and AWS S3 console for test.
>  Doing a listStatus of the bucket returns a FileStatus object of the
> directory created but if I try to do a listStatus of that path I am getting
> a 404:
>
> org.apache.hadoop.fs.s3.S3Exception:
> org.jets3t.service.S3ServiceException: Request Error. HEAD '/aaaa' on Host
> ....
>
> Probably not the best list to look for help, any clues appreciated.
>
> C
>
>


Re: example usage of s3 file system

Posted by Chris Collins <ch...@yahoo.com>.
No problem.  I didn't get anything from the issues email alias.  It also seems that the URL Haavard pointed to probably needs to say which jets3t jar file to include.  People probably have a tendency to want to use the latest version (which is 0.9), whereas you seem to have to use 0.7.x to keep Hadoop from throwing a method invocation exception.  I am guessing it can't be that important, though, because for the life of me I can't understand how it ever worked for s3native (well, at least doing a mkdir).
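
For anyone pinning the older jets3t line in a build, a hypothetical pom.xml fragment (verify the exact version against the jets3t jar shipped in your Hadoop lib/ directory):

    <dependency>
      <groupId>net.java.dev.jets3t</groupId>
      <artifactId>jets3t</artifactId>
      <version>0.7.1</version>
    </dependency>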

C
On Aug 29, 2012, at 5:48 AM, Harsh J <ha...@cloudera.com> wrote:

> Many thanks for taking this upstream with a fix Chris!
> 
> On Wed, Aug 29, 2012 at 12:51 PM, Chris Collins
> <ch...@yahoo.com> wrote:
>> Thanks Haavard, I am aware of that page but I am not sure why you are pointing me to it.  This really looks like a bug where Jets3tNativeFileSystemStore is parsing a response from jets3t.  It's looking for ResponseCode=404 but actually getting ResponseCode: 404.  I don't see how it ever worked looking back through the versions of release code.
>> 
>> I took a copy of the s3native package and made a fix, and it seems to get around the issue.
>> 
>> I have reported it to the issues email alias, I will see if there is actually any interest in this problem.
>> 
>> Cheers
>> 
>> C
>> On Aug 29, 2012, at 12:11 AM, Håvard Wahl Kongsgård <ha...@gmail.com> wrote:
>> 
>>> see also
>>> 
>>> http://wiki.apache.org/hadoop/AmazonS3
>>> 
>>> On Tue, Aug 28, 2012 at 9:14 AM, Chris Collins
>>> <ch...@yahoo.com> wrote:
>>>> Hi I am trying to use the Hadoop filesystem abstraction with S3 but in my tinkering I am not having a great deal of success.  I am particularly interested in the ability to mimic a directory structure (since s3 native doesn't do it).
>>>> 
>>>> Can anyone point me to some good example usage of Hadoop FileSystem with s3?
>>>> 
>>>> I created a few directories using transit and AWS S3 console for test.  Doing a listStatus of the bucket returns a FileStatus object of the directory created but if I try to do a listStatus of that path I am getting a 404:
>>>> 
>>>> org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error. HEAD '/aaaa' on Host ....
>>>> 
>>>> Probably not the best list to look for help, any clues appreciated.
>>>> 
>>>> C
>>> 
>>> 
>>> 
>>> --
>>> Håvard Wahl Kongsgård
>>> Faculty of Medicine &
>>> Department of Mathematical Sciences
>>> NTNU
>>> 
>>> http://havard.security-review.net/
>> 
> 
> 
> 
> -- 
> Harsh J



Re: example usage of s3 file system

Posted by Harsh J <ha...@cloudera.com>.
Many thanks for taking this upstream with a fix, Chris!

On Wed, Aug 29, 2012 at 12:51 PM, Chris Collins
<ch...@yahoo.com> wrote:
> Thanks Haavard, I am aware of that page but I am not sure why you are pointing me to it.  This really looks like a bug where Jets3tNativeFileSystemStore is parsing a response from jets3t.  It's looking for ResponseCode=404 but actually getting ResponseCode: 404.  I don't see how it ever worked looking back through the versions of release code.
>
> I took a copy of the s3native package and made a fix, and it seems to get around the issue.
>
> I have reported it to the issues email alias, I will see if there is actually any interest in this problem.
>
> Cheers
>
> C
> On Aug 29, 2012, at 12:11 AM, Håvard Wahl Kongsgård <ha...@gmail.com> wrote:
>
>> see also
>>
>> http://wiki.apache.org/hadoop/AmazonS3
>>
>> On Tue, Aug 28, 2012 at 9:14 AM, Chris Collins
>> <ch...@yahoo.com> wrote:
>>> Hi I am trying to use the Hadoop filesystem abstraction with S3 but in my tinkering I am not having a great deal of success.  I am particularly interested in the ability to mimic a directory structure (since s3 native doesn't do it).
>>>
>>> Can anyone point me to some good example usage of Hadoop FileSystem with s3?
>>>
>>> I created a few directories using transit and AWS S3 console for test.  Doing a listStatus of the bucket returns a FileStatus object of the directory created but if I try to do a listStatus of that path I am getting a 404:
>>>
>>> org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error. HEAD '/aaaa' on Host ....
>>>
>>> Probably not the best list to look for help, any clues appreciated.
>>>
>>> C
>>
>>
>>
>> --
>> Håvard Wahl Kongsgård
>> Faculty of Medicine &
>> Department of Mathematical Sciences
>> NTNU
>>
>> http://havard.security-review.net/
>



-- 
Harsh J


Re: example usage of s3 file system

Posted by Chris Collins <ch...@yahoo.com>.
> Thanks Haavard, I am aware of that page, but I am not sure why you are pointing me to it.  This really looks like a bug in how Jets3tNativeFileSystemStore parses the response from jets3t: it is looking for "ResponseCode=404" but actually getting "ResponseCode: 404".  I don't see how it ever worked, looking back through the released versions of the code.

> I took a copy of the s3native package and made a fix, and it seems to get around the issue.
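>
> The change is essentially along these lines (a sketch, not my exact patch; it assumes the bundled jets3t's S3ServiceException exposes the HTTP status via getResponseCode()):
>
>   public FileMetadata retrieveMetadata(String key) throws IOException {
>     try {
>       S3Object object = s3Service.getObjectDetails(bucket, key);
>       return new FileMetadata(key, object.getContentLength(),
>           object.getLastModifiedDate().getTime());
>     } catch (S3ServiceException e) {
>       // Ask jets3t for the HTTP status instead of string-matching the
>       // exception message, whose format ("ResponseCode=404" vs.
>       // "ResponseCode: 404") is not stable across versions.
>       if (e.getResponseCode() == 404) {
>         return null;
>       }
>       if (e.getCause() instanceof IOException) {
>         throw (IOException) e.getCause();
>       }
>       throw new S3Exception(e);
>     }
>   }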

> I have reported it to the issues email alias; I will see if there is actually any interest in this problem.

Cheers

C
On Aug 29, 2012, at 12:11 AM, Håvard Wahl Kongsgård <ha...@gmail.com> wrote:

> see also
> 
> http://wiki.apache.org/hadoop/AmazonS3
> 
> On Tue, Aug 28, 2012 at 9:14 AM, Chris Collins
> <ch...@yahoo.com> wrote:
>> Hi I am trying to use the Hadoop filesystem abstraction with S3 but in my tinkering I am not having a great deal of success.  I am particularly interested in the ability to mimic a directory structure (since s3 native doesnt do it).
>> 
>> Can anyone point me to some good example usage of Hadoop FileSystem with s3?
>> 
>> I created a few directories using transit and AWS S3 console for test.  Doing a liststatus of the bucket returns a FileStatus object of the directory created but if I try to do a liststatus of that path I am getting a 404:
>> 
>> org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error. HEAD '/aaaa' on Host ....
>> 
>> Probably not the best list to look for help, any clues appreciated.
>> 
>> C
> 
> 
> 
> -- 
> Håvard Wahl Kongsgård
> Faculty of Medicine &
> Department of Mathematical Sciences
> NTNU
> 
> http://havard.security-review.net/



Re: example usage of s3 file system

Posted by Håvard Wahl Kongsgård <ha...@gmail.com>.
see also

http://wiki.apache.org/hadoop/AmazonS3

On Tue, Aug 28, 2012 at 9:14 AM, Chris Collins
<ch...@yahoo.com> wrote:
> Hi I am trying to use the Hadoop filesystem abstraction with S3 but in my tinkering I am not having a great deal of success.  I am particularly interested in the ability to mimic a directory structure (since s3 native doesnt do it).
>
> Can anyone point me to some good example usage of Hadoop FileSystem with s3?
>
> I created a few directories using transit and AWS S3 console for test.  Doing a liststatus of the bucket returns a FileStatus object of the directory created but if I try to do a liststatus of that path I am getting a 404:
>
> org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error. HEAD '/aaaa' on Host ....
>
> Probably not the best list to look for help, any clues appreciated.
>
> C



-- 
Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/


Re: example usage of s3 file system

Posted by Chris Collins <ch...@yahoo.com>.
So I have been looking at the source and single-stepping through a simple test case of mine that makes a directory using FileSystem.mkdirs().
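
The test boils down to roughly the following (a trimmed sketch; the bucket name and credentials are placeholders):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3NativeTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder credentials for the s3n scheme.
    conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
    conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");

    // "my-test-bucket" is a placeholder bucket name.
    FileSystem fs = FileSystem.get(URI.create("s3n://my-test-bucket"), conf);

    Path dir = new Path("/aaaa");
    fs.mkdirs(dir);  // creates the directory marker object

    // Listing the new "directory" is where the 404 surfaces.
    for (FileStatus status : fs.listStatus(dir)) {
      System.out.println(status.getPath());
    }
  }
}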

I see that I am getting a 404 (nothing new there).  However, the code that throws it looks like the snippet below; note the comment about being "brittle".  It seems to be looking for "ResponseCode=404", except in this case I am getting "ResponseCode: 404".

Should I just forward this to the developer list, or has anyone come across this before?

  public FileMetadata retrieveMetadata(String key) throws IOException {
    try {
      S3Object object = s3Service.getObjectDetails(bucket, key);
      return new FileMetadata(key, object.getContentLength(),
          object.getLastModifiedDate().getTime());
    } catch (S3ServiceException e) {
      // Following is brittle. Is there a better way?
      if (e.getMessage().contains("ResponseCode=404")) {
        return null;
      }
      if (e.getCause() instanceof IOException) {
        throw (IOException) e.getCause();
      }
      throw new S3Exception(e);
    }
  }




On Aug 28, 2012, at 12:14 AM, Chris Collins <ch...@yahoo.com> wrote:

> Hi I am trying to use the Hadoop filesystem abstraction with S3 but in my tinkering I am not having a great deal of success.  I am particularly interested in the ability to mimic a directory structure (since s3 native doesnt do it).
> 
> Can anyone point me to some good example usage of Hadoop FileSystem with s3?
> 
> I created a few directories using transit and AWS S3 console for test.  Doing a liststatus of the bucket returns a FileStatus object of the directory created but if I try to do a liststatus of that path I am getting a 404:
> 
> org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error. HEAD '/aaaa' on Host ....
> 
> Probably not the best list to look for help, any clues appreciated.
> 
> C

