Posted to common-user@hadoop.apache.org by Lincoln Ritter <li...@lincolnritter.com> on 2008/07/02 01:34:47 UTC

Namenode Exceptions with S3

Hello,

I am trying to use S3 with Hadoop 0.17.0 on EC2.  Using this style of
configuration:

<property>
  <name>fs.default.name</name>
  <value>s3://$HDFS_BUCKET</value>
</property>

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>$AWS_ACCESS_KEY_ID</value>
</property>

<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>$AWS_SECRET_ACCESS_KEY</value>
</property>

on startup of the cluster with the bucket having no non-alphabetic
characters, I get:

2008-07-01 16:10:49,171 ERROR org.apache.hadoop.dfs.NameNode:
java.lang.RuntimeException: Not a host:port pair: XXXXX
	at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:121)
	at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
	at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
	at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
	at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
	at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)

If I use this style of configuration:

<property>
  <name>fs.default.name</name>
  <value>s3://$AWS_ACCESS_KEY:$AWS_SECRET_ACCESS_KEY@$HDFS_BUCKET</value>
</property>

I get (where the all-caps portions are the actual values...):

2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode:
java.lang.NumberFormatException: For input string:
"AWS_SECRET_ACCESS_KEY@HDFS_BUCKET"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
	at java.lang.Integer.parseInt(Integer.java:447)
	at java.lang.Integer.parseInt(Integer.java:497)
	at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
	at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
	at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
	at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
	at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
	at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)

These exceptions are taken from the namenode log.  The datanode logs
show the same exceptions.

Other than the above configuration changes, the configuration is
identical to that generated by the hadoop image creation script found
in the 0.17.0 distribution.

Can anybody point me in the right direction here?

-lincoln

--
lincolnritter.com

Re: Namenode Exceptions with S3

Posted by Tom White <to...@gmail.com>.
On Thu, Jul 17, 2008 at 6:16 PM, Doug Cutting <cu...@apache.org> wrote:
> Can't one work around this by using a different configuration on the client
> than on the namenodes and datanodes?  The client should be able to set
> fs.default.name to an s3: uri, while the namenode and datanode must have it
> set to an hdfs: uri, no?

Yes, that's a good solution.

>> It might be less confusing if the HDFS daemons didn't use
>> fs.default.name to define the namenode host and port. Just like
>> mapred.job.tracker defines the host and port for the jobtracker,
>> dfs.namenode.address (or similar) could define the namenode. Would
>> this be a good change to make?
>
> Probably.  For back-compatibility we could leave it empty by default,
> deferring to fs.default.name, only if folks specify a non-empty
> dfs.namenode.address would it be used.

I've opened https://issues.apache.org/jira/browse/HADOOP-3782 for this.

Tom

Re: Namenode Exceptions with S3

Posted by Doug Cutting <cu...@apache.org>.
Tom White wrote:
> You can allow S3 as the default FS, it's just that then you can't run
> HDFS at all in this case. You would only do this if you don't want to
> use HDFS at all, for example, if you were running a MapReduce job
> which read from S3 and wrote to S3.

Can't one work around this by using a different configuration on the 
client than on the namenodes and datanodes?  The client should be able 
to set fs.default.name to an s3: uri, while the namenode and datanode 
must have it set to an hdfs: uri, no?
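
For instance (just a sketch, with placeholder host, port, bucket and key
values), the namenode and datanode machines would keep something like:

<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode.example.com:9000</value>  <!-- placeholder namenode host:port -->
</property>

while hadoop-site.xml on the submitting client could instead use:

<property>
  <name>fs.default.name</name>
  <value>s3://$HDFS_BUCKET</value>
</property>

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>$AWS_ACCESS_KEY_ID</value>
</property>

<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>$AWS_SECRET_ACCESS_KEY</value>
</property>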

Would it be useful to add command-line options to namenode and datanode 
that override the configuration, so that one could start non-default 
HDFS daemons?

> It might be less confusing if the HDFS daemons didn't use
> fs.default.name to define the namenode host and port. Just like
> mapred.job.tracker defines the host and port for the jobtracker,
> dfs.namenode.address (or similar) could define the namenode. Would
> this be a good change to make?

Probably.  For back-compatibility we could leave it empty by default, 
deferring to fs.default.name, only if folks specify a non-empty 
dfs.namenode.address would it be used.

Doug

Re: Namenode Exceptions with S3

Posted by slitz <sl...@gmail.com>.
Sorry for the time taken to respond, I've been doing some tests on this.
Your workaround worked like a charm, thank you :) Now I'm able to fetch the
data from S3, process it using HDFS, and put the results in S3.

About the a) problem that I mentioned in my previous email: I now understand
the error. I was starting the namenode and datanodes and changing
fs.default.name to s3://bucket/ after that, so now I understand why it doesn't
work.
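
(For anyone else hitting this, the sequence that ended up working for me
looks roughly like the following; the bucket name, HDFS paths and job jar
are placeholders.)

bin/hadoop distcp s3://$BUCKET/input /user/root/input          # copy the input from S3 into HDFS
bin/hadoop jar my-job.jar /user/root/input /user/root/output   # run the MapReduce job(s) against HDFS
bin/hadoop distcp /user/root/output s3://$BUCKET/output        # copy the final results back to S3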

Thank you *very* much for your help, now I can use EC2 and S3 :)

slitz

On Fri, Jul 11, 2008 at 10:46 PM, Tom White <to...@gmail.com> wrote:

> On Fri, Jul 11, 2008 at 9:09 PM, slitz <sl...@gmail.com> wrote:
> > a) Use S3 only, without HDFS and configuring fs.default.name as
> s3://bucket
> >  -> PROBLEM: we are getting ERROR org.apache.hadoop.dfs.NameNode:
> > java.lang.RuntimeException: Not a host:port pair: XXXXX
>
> What command are you using to start Hadoop?
>
> > b) Use HDFS as the default FS, specifying S3 only as input for the first
> Job
> > and output for the last(assuming one has multiple jobs on same data)
> >  -> PROBLEM: https://issues.apache.org/jira/browse/HADOOP-3733
>
> Yes, this is a problem. I've added a comment to the Jira description
> describing a workaround.
>
> Tom
>

Re: Namenode Exceptions with S3

Posted by Tom White <to...@gmail.com>.
On Fri, Jul 11, 2008 at 9:09 PM, slitz <sl...@gmail.com> wrote:
> a) Use S3 only, without HDFS and configuring fs.default.name as s3://bucket
>  -> PROBLEM: we are getting ERROR org.apache.hadoop.dfs.NameNode:
> java.lang.RuntimeException: Not a host:port pair: XXXXX

What command are you using to start Hadoop?

> b) Use HDFS as the default FS, specifying S3 only as input for the first Job
> and output for the last(assuming one has multiple jobs on same data)
>  -> PROBLEM: https://issues.apache.org/jira/browse/HADOOP-3733

Yes, this is a problem. I've added a comment to the Jira description
describing a workaround.

Tom

Re: Namenode Exceptions with S3

Posted by slitz <sl...@gmail.com>.
I've been learning a lot from this thread, and Tom just helped me
understand some things about S3 and HDFS, thank you.
To wrap everything up, if we want to use S3 with EC2 we can:

a) Use S3 only, without HDFS and configuring fs.default.name as s3://bucket
  -> PROBLEM: we are getting ERROR org.apache.hadoop.dfs.NameNode:
java.lang.RuntimeException: Not a host:port pair: XXXXX
b) Use HDFS as the default FS, specifying S3 only as input for the first Job
and output for the last (assuming one has multiple jobs on the same data)
  -> PROBLEM: https://issues.apache.org/jira/browse/HADOOP-3733


So, in my case I cannot use S3 at all for now because of these 2 problems.
Any advice?

slitz

On Fri, Jul 11, 2008 at 4:31 PM, Lincoln Ritter <li...@lincolnritter.com>
wrote:

> Thanks Tom!
>
> Your explanation makes things a lot clearer.  I think that changing
> the 'fs.default.name' to something like 'dfs.namenode.address' would
> certainly be less confusing since it would clarify the purpose of
> these values.
>
> -lincoln
>
> --
> lincolnritter.com
>
>
>
> On Fri, Jul 11, 2008 at 4:21 AM, Tom White <to...@gmail.com> wrote:
> > On Thu, Jul 10, 2008 at 10:06 PM, Lincoln Ritter
> > <li...@lincolnritter.com> wrote:
> >> Thank you, Tom.
> >>
> >> Forgive me for being dense, but I don't understand your reply:
> >>
> >
> > Sorry! I'll try to explain it better (see below).
> >
> >>
> >> Do you mean that it is possible to use the Hadoop daemons with S3 but
> >> the default filesystem must be HDFS?
> >
> > The HDFS daemons use the value of "fs.default.name" to set the
> > namenode host and port, so if you set it to a s3 URI, you can't run
> > the HDFS daemons. So in this case you would use the start-mapred.sh
> > script instead of start-all.sh.
> >
> >> If that is the case, can I
> >> specify the output filesystem on a per-job basis and can that be an S3
> >> FS?
> >
> > Yes, that's exactly how you do it.
> >
> >>
> >> Also, is there a particular reason to not allow S3 as the default FS?
> >
> > You can allow S3 as the default FS, it's just that then you can't run
> > HDFS at all in this case. You would only do this if you don't want to
> > use HDFS at all, for example, if you were running a MapReduce job
> > which read from S3 and wrote to S3.
> >
> > It might be less confusing if the HDFS daemons didn't use
> > fs.default.name to define the namenode host and port. Just like
> > mapred.job.tracker defines the host and port for the jobtracker,
> > dfs.namenode.address (or similar) could define the namenode. Would
> > this be a good change to make?
> >
> > Tom
> >
>

Re: Namenode Exceptions with S3

Posted by Lincoln Ritter <li...@lincolnritter.com>.
Thanks Tom!

Your explanation makes things a lot clearer.  I think that changing
the 'fs.default.name' to something like 'dfs.namenode.address' would
certainly be less confusing since it would clarify the purpose of
these values.

-lincoln

--
lincolnritter.com



On Fri, Jul 11, 2008 at 4:21 AM, Tom White <to...@gmail.com> wrote:
> On Thu, Jul 10, 2008 at 10:06 PM, Lincoln Ritter
> <li...@lincolnritter.com> wrote:
>> Thank you, Tom.
>>
>> Forgive me for being dense, but I don't understand your reply:
>>
>
> Sorry! I'll try to explain it better (see below).
>
>>
>> Do you mean that it is possible to use the Hadoop daemons with S3 but
>> the default filesystem must be HDFS?
>
> The HDFS daemons use the value of "fs.default.name" to set the
> namenode host and port, so if you set it to a s3 URI, you can't run
> the HDFS daemons. So in this case you would use the start-mapred.sh
> script instead of start-all.sh.
>
>> If that is the case, can I
>> specify the output filesystem on a per-job basis and can that be an S3
>> FS?
>
> Yes, that's exactly how you do it.
>
>>
>> Also, is there a particular reason to not allow S3 as the default FS?
>
> You can allow S3 as the default FS, it's just that then you can't run
> HDFS at all in this case. You would only do this if you don't want to
> use HDFS at all, for example, if you were running a MapReduce job
> which read from S3 and wrote to S3.
>
> It might be less confusing if the HDFS daemons didn't use
> fs.default.name to define the namenode host and port. Just like
> mapred.job.tracker defines the host and port for the jobtracker,
> dfs.namenode.address (or similar) could define the namenode. Would
> this be a good change to make?
>
> Tom
>

Re: Namenode Exceptions with S3

Posted by Tom White <to...@gmail.com>.
On Thu, Jul 10, 2008 at 10:06 PM, Lincoln Ritter
<li...@lincolnritter.com> wrote:
> Thank you, Tom.
>
> Forgive me for being dense, but I don't understand your reply:
>

Sorry! I'll try to explain it better (see below).

>
> Do you mean that it is possible to use the Hadoop daemons with S3 but
> the default filesystem must be HDFS?

The HDFS daemons use the value of "fs.default.name" to set the
namenode host and port, so if you set it to an s3 URI, you can't run
the HDFS daemons. So in this case you would use the start-mapred.sh
script instead of start-all.sh.
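
Concretely (assuming the s3:// fs.default.name and the fs.s3.* credentials
are already in hadoop-site.xml, as in the original post), that means starting
the cluster with only:

bin/start-mapred.sh   # starts the jobtracker and tasktrackers; no HDFS daemons

rather than bin/start-all.sh, which would also try to start the namenode and
datanodes and fail with the errors reported earlier in this thread.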

> If that is the case, can I
> specify the output filesystem on a per-job basis and can that be an S3
> FS?

Yes, that's exactly how you do it.
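
For example (a sketch only; the bucket name is a placeholder, and the job
needs fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey set in its
configuration to reach the bucket), you can keep hdfs:// as the default
filesystem and still point an individual job at S3 by passing s3 URIs for
its paths:

bin/hadoop jar hadoop-0.17.0-examples.jar wordcount \
    s3://$BUCKET/input s3://$BUCKET/output

The scheme of each path selects the FileSystem implementation, so input and
output can live on different filesystems within the same job.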

>
> Also, is there a particular reason to not allow S3 as the default FS?

You can allow S3 as the default FS, it's just that then you can't run
HDFS at all in this case. You would only do this if you don't want to
use HDFS at all, for example, if you were running a MapReduce job
which read from S3 and wrote to S3.

It might be less confusing if the HDFS daemons didn't use
fs.default.name to define the namenode host and port. Just like
mapred.job.tracker defines the host and port for the jobtracker,
dfs.namenode.address (or similar) could define the namenode. Would
this be a good change to make?

Tom

Re: Namenode Exceptions with S3

Posted by Lincoln Ritter <li...@lincolnritter.com>.
Thank you, Tom.

Forgive me for being dense, but I don't understand your reply:

> If you make the default filesystem S3 then you can't run HDFS daemons.
> If you want to run HDFS and use an S3 filesystem, you need to make the
> default filesystem a hdfs URI, and use s3 URIs to reference S3
> filesystems.

Do you mean that it is possible to use the Hadoop daemons with S3 but
the default filesystem must be HDFS?  If that is the case, can I
specify the output filesystem on a per-job basis and can that be an S3
FS?

Also, is there a particular reason to not allow S3 as the default FS?

Thanks so much for your time!

-lincoln

--
lincolnritter.com



On Thu, Jul 10, 2008 at 1:55 PM, Tom White <to...@gmail.com> wrote:
>> I get (where the all-caps portions are the actual values...):
>>
>> 2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode:
>> java.lang.NumberFormatException: For input string:
>> "AWS_SECRET_ACCESS_KEY@HDFS_BUCKET"
>>        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>        at java.lang.Integer.parseInt(Integer.java:447)
>>        at java.lang.Integer.parseInt(Integer.java:497)
>>        at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
>>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>
>> These exceptions are taken from the namenode log.  The datanode logs
>> show the same exceptions.
>
> If you make the default filesystem S3 then you can't run HDFS daemons.
> If you want to run HDFS and use an S3 filesystem, you need to make the
> default filesystem a hdfs URI, and use s3 URIs to reference S3
> filesystems.
>
> Hope this helps.
>
> Tom
>

Re: Namenode Exceptions with S3

Posted by Tom White <to...@gmail.com>.
> I get (where the all-caps portions are the actual values...):
>
> 2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode:
> java.lang.NumberFormatException: For input string:
> "AWS_SECRET_ACCESS_KEY@HDFS_BUCKET"
>        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>        at java.lang.Integer.parseInt(Integer.java:447)
>        at java.lang.Integer.parseInt(Integer.java:497)
>        at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>
> These exceptions are taken from the namenode log.  The datanode logs
> show the same exceptions.

If you make the default filesystem S3 then you can't run HDFS daemons.
If you want to run HDFS and use an S3 filesystem, you need to make the
default filesystem a hdfs URI, and use s3 URIs to reference S3
filesystems.

Hope this helps.

Tom

Re: slash in AWS Secret Key, WAS Re: Namenode Exceptions with S3

Posted by Lincoln Ritter <li...@lincolnritter.com>.
Thanks for the reply.

I've heard the "regenerate" suggestion before, but for organizations
that use their AWS info all over the place this is a huge pain.  I think it
would be better to come up with a more robust solution for handling AWS info.

-lincoln

--
lincolnritter.com



On Wed, Jul 9, 2008 at 12:44 PM, Jimmy Lin <ji...@umd.edu> wrote:
> I've come across this problem before.  My simple solution was to
> regenerate new keys until I got one without a slash... ;)
>
> -Jimmy
>
>> I have Hadoop 0.17.1 and an AWS Secret Key that contains a slash ('/').
>>
>> With distcp, I found that using the URL format s3://ID:SECRET@BUCKET/
>> did not work, even if I encoded the slash as "%2F".  I got
>> "org.jets3t.service.S3ServiceException: S3 HEAD request failed.
>> ResponseCode=403, ResponseMessage=Forbidden"
>>
>> When I put the AWS Secret Key in hadoop-site.xml and wrote the URL as
>> s3://BUCKET/ it worked.
>>
>> I have periods ('.') in my bucket name, that was not a problem.
>>
>> What's weird is that org.apache.hadoop.fs.s3.Jets3tFileSystemStore
>> uses java.net.URI, which should take take of unencoding the %2F.
>>
>> -Stuart
>>
>>
>> On Wed, Jul 9, 2008 at 1:41 PM, Lincoln Ritter
>> <li...@lincolnritter.com> wrote:
>>> So far, I've had no luck.
>>>
>>> Can anyone out there clarify the permissible characters/format for aws
>>> keys and bucket names?
>>>
>>> I haven't looked at the code here, but it seems strange to me that the
>>> same restrictions on host/port etc apply given that it's a totally
>>> different system.  I'd love to see exceptions thrown that are
>>> particular to the protocol/subsystem being employed.  The s3 'handler'
>>> (or whatever, might be nice enough to check for format violations and
>>> throw and appropriate exception, for instance.  It might URL-encode
>>> the secret key so that the user doesn't have to worry about this, or
>>> throw an exception notifying the user of a bad format.  Currently,
>>> apparent problems with my s3 settings are throwing exceptions that
>>> give no indication that the problem is actually with those settings.
>>>
>>> My mitigating strategy has been to change my configuration to use
>>> "instance-local" storage (/mnt).  I then copy the results out to s3
>>> using 'distcp'.  This is odd since distcp seems ok with my s3/aws
>>> info.
>>>
>>> I'm still unclear as to the permissible characters in bucket names and
>>> access keys.  I gather '/' is bad in the secret key and that '_' is
>>> bad for bucket names.  Thusfar i have only been able to get buckets to
>>> work in distcp that have only letters in their names, but I haven't
>>> tested to extensively.
>>>
>>> For example, I'd love to use buckets like:
>>> 'com.organization.hdfs.purpose'.  This seems to fail.  Using
>>> 'comorganizationhdfspurpose' works but clearly that is less than
>>> optimal.
>>>
>>> Like I say, I haven't dug into the source yet, but it is curious that
>>> distcp seems to work (at least where s3 is the destination) and hadoop
>>> fails when s3 is used as its storage.
>>>
>>> Anyone who has dealt with these issues, please post!  It will help
>>> make the project better.
>>>
>>> -lincoln
>>>
>>> --
>>> lincolnritter.com
>>>
>>>
>>>
>>> On Wed, Jul 9, 2008 at 7:10 AM, slitz <sl...@gmail.com> wrote:
>>>> I'm having the exact same problem, any tip?
>>>>
>>>> slitz
>>>>
>>>> On Wed, Jul 2, 2008 at 12:34 AM, Lincoln Ritter
>>>> <li...@lincolnritter.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am trying to use S3 with Hadoop 0.17.0 on EC2.  Using this style of
>>>>> configuration:
>>>>>
>>>>> <property>
>>>>>  <name>fs.default.name</name>
>>>>>  <value>s3://$HDFS_BUCKET</value>
>>>>> </property>
>>>>>
>>>>> <property>
>>>>>  <name>fs.s3.awsAccessKeyId</name>
>>>>>  <value>$AWS_ACCESS_KEY_ID</value>
>>>>> </property>
>>>>>
>>>>> <property>
>>>>>  <name>fs.s3.awsSecretAccessKey</name>
>>>>>  <value>$AWS_SECRET_ACCESS_KEY</value>
>>>>> </property>
>>>>>
>>>>> on startup of the cluster with the bucket having no non-alphabetic
>>>>> characters, I get:
>>>>>
>>>>> 2008-07-01 16:10:49,171 ERROR org.apache.hadoop.dfs.NameNode:
>>>>> java.lang.RuntimeException: Not a host:port pair: XXXXX
>>>>>        at
>>>>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:121)
>>>>>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>>>>        at
>>>>> org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>>>>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>>>>
>>>>> If I use this style of configuration:
>>>>>
>>>>> <property>
>>>>>  <name>fs.default.name</name>
>>>>>  <value>s3://$AWS_ACCESS_KEY:$AWS_SECRET_ACCESS_KEY@$HDFS_BUCKET</value>
>>>>> </property>
>>>>>
>>>>> I get (where the all-caps portions are the actual values...):
>>>>>
>>>>> 2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode:
>>>>> java.lang.NumberFormatException: For input string:
>>>>> "AWS_SECRET_ACCESS_KEY@HDFS_BUCKET"
>>>>>        at
>>>>> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>>>>        at java.lang.Integer.parseInt(Integer.java:447)
>>>>>        at java.lang.Integer.parseInt(Integer.java:497)
>>>>>        at
>>>>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
>>>>>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>>>>        at
>>>>> org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>>>>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>>>>
>>>>> These exceptions are taken from the namenode log.  The datanode logs
>>>>> show the same exceptions.
>>>>>
>>>>> Other than the above configuration changes, the configuration is
>>>>> identical to that generate by the hadoop image creation script found
>>>>> in the 0.17.0 distribution.
>>>>>
>>>>> Can anybody point me in the right direction here?
>>>>>
>>>>> -lincoln
>>>>>
>>>>> --
>>>>> lincolnritter.com
>>>>>
>>>>
>>>
>>
>>
>
>
>

slash in AWS Secret Key, WAS Re: Namenode Exceptions with S3

Posted by Jimmy Lin <ji...@umd.edu>.
I've come across this problem before.  My simple solution was to
regenerate new keys until I got one without a slash... ;)

-Jimmy

> I have Hadoop 0.17.1 and an AWS Secret Key that contains a slash ('/').
>
> With distcp, I found that using the URL format s3://ID:SECRET@BUCKET/
> did not work, even if I encoded the slash as "%2F".  I got
> "org.jets3t.service.S3ServiceException: S3 HEAD request failed.
> ResponseCode=403, ResponseMessage=Forbidden"
>
> When I put the AWS Secret Key in hadoop-site.xml and wrote the URL as
> s3://BUCKET/ it worked.
>
> I have periods ('.') in my bucket name, that was not a problem.
>
> What's weird is that org.apache.hadoop.fs.s3.Jets3tFileSystemStore
> uses java.net.URI, which should take take of unencoding the %2F.
>
> -Stuart
>
>
> On Wed, Jul 9, 2008 at 1:41 PM, Lincoln Ritter
> <li...@lincolnritter.com> wrote:
>> So far, I've had no luck.
>>
>> Can anyone out there clarify the permissible characters/format for aws
>> keys and bucket names?
>>
>> I haven't looked at the code here, but it seems strange to me that the
>> same restrictions on host/port etc apply given that it's a totally
>> different system.  I'd love to see exceptions thrown that are
>> particular to the protocol/subsystem being employed.  The s3 'handler'
>> (or whatever, might be nice enough to check for format violations and
>> throw and appropriate exception, for instance.  It might URL-encode
>> the secret key so that the user doesn't have to worry about this, or
>> throw an exception notifying the user of a bad format.  Currently,
>> apparent problems with my s3 settings are throwing exceptions that
>> give no indication that the problem is actually with those settings.
>>
>> My mitigating strategy has been to change my configuration to use
>> "instance-local" storage (/mnt).  I then copy the results out to s3
>> using 'distcp'.  This is odd since distcp seems ok with my s3/aws
>> info.
>>
>> I'm still unclear as to the permissible characters in bucket names and
>> access keys.  I gather '/' is bad in the secret key and that '_' is
>> bad for bucket names.  Thusfar i have only been able to get buckets to
>> work in distcp that have only letters in their names, but I haven't
>> tested to extensively.
>>
>> For example, I'd love to use buckets like:
>> 'com.organization.hdfs.purpose'.  This seems to fail.  Using
>> 'comorganizationhdfspurpose' works but clearly that is less than
>> optimal.
>>
>> Like I say, I haven't dug into the source yet, but it is curious that
>> distcp seems to work (at least where s3 is the destination) and hadoop
>> fails when s3 is used as its storage.
>>
>> Anyone who has dealt with these issues, please post!  It will help
>> make the project better.
>>
>> -lincoln
>>
>> --
>> lincolnritter.com
>>
>>
>>
>> On Wed, Jul 9, 2008 at 7:10 AM, slitz <sl...@gmail.com> wrote:
>>> I'm having the exact same problem, any tip?
>>>
>>> slitz
>>>
>>> On Wed, Jul 2, 2008 at 12:34 AM, Lincoln Ritter
>>> <li...@lincolnritter.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am trying to use S3 with Hadoop 0.17.0 on EC2.  Using this style of
>>>> configuration:
>>>>
>>>> <property>
>>>>  <name>fs.default.name</name>
>>>>  <value>s3://$HDFS_BUCKET</value>
>>>> </property>
>>>>
>>>> <property>
>>>>  <name>fs.s3.awsAccessKeyId</name>
>>>>  <value>$AWS_ACCESS_KEY_ID</value>
>>>> </property>
>>>>
>>>> <property>
>>>>  <name>fs.s3.awsSecretAccessKey</name>
>>>>  <value>$AWS_SECRET_ACCESS_KEY</value>
>>>> </property>
>>>>
>>>> on startup of the cluster with the bucket having no non-alphabetic
>>>> characters, I get:
>>>>
>>>> 2008-07-01 16:10:49,171 ERROR org.apache.hadoop.dfs.NameNode:
>>>> java.lang.RuntimeException: Not a host:port pair: XXXXX
>>>>        at
>>>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:121)
>>>>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>>>        at
>>>> org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>>>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>>>
>>>> If I use this style of configuration:
>>>>
>>>> <property>
>>>>  <name>fs.default.name</name>
>>>>  <value>s3://$AWS_ACCESS_KEY:$AWS_SECRET_ACCESS_KEY@$HDFS_BUCKET</value>
>>>> </property>
>>>>
>>>> I get (where the all-caps portions are the actual values...):
>>>>
>>>> 2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode:
>>>> java.lang.NumberFormatException: For input string:
>>>> "AWS_SECRET_ACCESS_KEY@HDFS_BUCKET"
>>>>        at
>>>> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>>>        at java.lang.Integer.parseInt(Integer.java:447)
>>>>        at java.lang.Integer.parseInt(Integer.java:497)
>>>>        at
>>>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
>>>>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>>>        at
>>>> org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>>>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>>>
>>>> These exceptions are taken from the namenode log.  The datanode logs
>>>> show the same exceptions.
>>>>
>>>> Other than the above configuration changes, the configuration is
>>>> identical to that generate by the hadoop image creation script found
>>>> in the 0.17.0 distribution.
>>>>
>>>> Can anybody point me in the right direction here?
>>>>
>>>> -lincoln
>>>>
>>>> --
>>>> lincolnritter.com
>>>>
>>>
>>
>
>



Re: Namenode Exceptions with S3

Posted by Steve Loughran <st...@apache.org>.
Stuart Sierra wrote:
> I have Hadoop 0.17.1 and an AWS Secret Key that contains a slash ('/').
> 
> With distcp, I found that using the URL format s3://ID:SECRET@BUCKET/
> did not work, even if I encoded the slash as "%2F".  I got
> "org.jets3t.service.S3ServiceException: S3 HEAD request failed.
> ResponseCode=403, ResponseMessage=Forbidden"
> 
> When I put the AWS Secret Key in hadoop-site.xml and wrote the URL as
> s3://BUCKET/ it worked.
> 
> I have periods ('.') in my bucket name, that was not a problem.
> 
> What's weird is that org.apache.hadoop.fs.s3.Jets3tFileSystemStore
> uses java.net.URI, which should take take of unencoding the %2F.


I've been using the Restlet API to work with S3, rather than JetS3t; 
seems pretty good (i.e. it has the funny AWS authentication). The big 
problem I've found is that AWS auth requires the caller's clock to be 
close to Amazon's, and on VMWare-hosted images, the clock can drift 
enough for that to start failing.
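
One way to work around that (a sketch; it assumes ntpdate is installed and
you have root on the image):

sudo ntpdate pool.ntp.org        # one-off resync against a public NTP pool
sudo /etc/init.d/ntpd start      # or run an NTP daemon; the service name varies by distro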

Re: Namenode Exceptions with S3

Posted by Stuart Sierra <ma...@stuartsierra.com>.
I regenerated my AWS Secret Key to one that does not use a slash, and
I was able to successfully use the s3://ID:SECRET@BUCKET/ style URL
for distcp.  It seems the S3 FileSystem is not unencoding URLs
properly.  I've filed a bug:
https://issues.apache.org/jira/browse/HADOOP-3733

-Stuart


On Wed, Jul 9, 2008 at 3:27 PM, Stuart Sierra <ma...@stuartsierra.com> wrote:
> I have Hadoop 0.17.1 and an AWS Secret Key that contains a slash ('/').
>
> With distcp, I found that using the URL format s3://ID:SECRET@BUCKET/
> did not work, even if I encoded the slash as "%2F".  I got
> "org.jets3t.service.S3ServiceException: S3 HEAD request failed.
> ResponseCode=403, ResponseMessage=Forbidden"
>
> When I put the AWS Secret Key in hadoop-site.xml and wrote the URL as
> s3://BUCKET/ it worked.
>
> I have periods ('.') in my bucket name, that was not a problem.
>
> What's weird is that org.apache.hadoop.fs.s3.Jets3tFileSystemStore
> uses java.net.URI, which should take take of unencoding the %2F.
>
> -Stuart
>
>
> On Wed, Jul 9, 2008 at 1:41 PM, Lincoln Ritter
> <li...@lincolnritter.com> wrote:
>> So far, I've had no luck.
>>
>> Can anyone out there clarify the permissible characters/format for aws
>> keys and bucket names?
>>
>> I haven't looked at the code here, but it seems strange to me that the
>> same restrictions on host/port etc apply given that it's a totally
>> different system.  I'd love to see exceptions thrown that are
>> particular to the protocol/subsystem being employed.  The s3 'handler'
>> (or whatever, might be nice enough to check for format violations and
>> throw and appropriate exception, for instance.  It might URL-encode
>> the secret key so that the user doesn't have to worry about this, or
>> throw an exception notifying the user of a bad format.  Currently,
>> apparent problems with my s3 settings are throwing exceptions that
>> give no indication that the problem is actually with those settings.
>>
>> My mitigating strategy has been to change my configuration to use
>> "instance-local" storage (/mnt).  I then copy the results out to s3
>> using 'distcp'.  This is odd since distcp seems ok with my s3/aws
>> info.
>>
>> I'm still unclear as to the permissible characters in bucket names and
>> access keys.  I gather '/' is bad in the secret key and that '_' is
>> bad for bucket names.  Thusfar i have only been able to get buckets to
>> work in distcp that have only letters in their names, but I haven't
>> tested to extensively.
>>
>> For example, I'd love to use buckets like:
>> 'com.organization.hdfs.purpose'.  This seems to fail.  Using
>> 'comorganizationhdfspurpose' works but clearly that is less than
>> optimal.
>>
>> Like I say, I haven't dug into the source yet, but it is curious that
>> distcp seems to work (at least where s3 is the destination) and hadoop
>> fails when s3 is used as its storage.
>>
>> Anyone who has dealt with these issues, please post!  It will help
>> make the project better.
>>
>> -lincoln
>>
>> --
>> lincolnritter.com
>>
>>
>>
>> On Wed, Jul 9, 2008 at 7:10 AM, slitz <sl...@gmail.com> wrote:
>>> I'm having the exact same problem, any tip?
>>>
>>> slitz
>>>
>>> On Wed, Jul 2, 2008 at 12:34 AM, Lincoln Ritter <li...@lincolnritter.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am trying to use S3 with Hadoop 0.17.0 on EC2.  Using this style of
>>>> configuration:
>>>>
>>>> <property>
>>>>  <name>fs.default.name</name>
>>>>  <value>s3://$HDFS_BUCKET</value>
>>>> </property>
>>>>
>>>> <property>
>>>>  <name>fs.s3.awsAccessKeyId</name>
>>>>  <value>$AWS_ACCESS_KEY_ID</value>
>>>> </property>
>>>>
>>>> <property>
>>>>  <name>fs.s3.awsSecretAccessKey</name>
>>>>  <value>$AWS_SECRET_ACCESS_KEY</value>
>>>> </property>
>>>>
>>>> on startup of the cluster with the bucket having no non-alphabetic
>>>> characters, I get:
>>>>
>>>> 2008-07-01 16:10:49,171 ERROR org.apache.hadoop.dfs.NameNode:
>>>> java.lang.RuntimeException: Not a host:port pair: XXXXX
>>>>        at
>>>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:121)
>>>>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>>>        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>>>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>>>
>>>> If I use this style of configuration:
>>>>
>>>> <property>
>>>>  <name>fs.default.name</name>
>>>>  <value>s3://$AWS_ACCESS_KEY:$AWS_SECRET_ACCESS_KEY@$HDFS_BUCKET</value>
>>>> </property>
>>>>
>>>> I get (where the all-caps portions are the actual values...):
>>>>
>>>> 2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode:
>>>> java.lang.NumberFormatException: For input string:
>>>> "AWS_SECRET_ACCESS_KEY@HDFS_BUCKET"
>>>>        at
>>>> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>>>        at java.lang.Integer.parseInt(Integer.java:447)
>>>>        at java.lang.Integer.parseInt(Integer.java:497)
>>>>        at
>>>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
>>>>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>>>        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>>>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>>>
>>>> These exceptions are taken from the namenode log.  The datanode logs
>>>> show the same exceptions.
>>>>
>>>> Other than the above configuration changes, the configuration is
>>>> identical to that generate by the hadoop image creation script found
>>>> in the 0.17.0 distribution.
>>>>
>>>> Can anybody point me in the right direction here?
>>>>
>>>> -lincoln
>>>>
>>>> --
>>>> lincolnritter.com
>>>>
>>>
>>
>

Re: Namenode Exceptions with S3

Posted by Stuart Sierra <ma...@stuartsierra.com>.
I have Hadoop 0.17.1 and an AWS Secret Key that contains a slash ('/').

With distcp, I found that using the URL format s3://ID:SECRET@BUCKET/
did not work, even if I encoded the slash as "%2F".  I got
"org.jets3t.service.S3ServiceException: S3 HEAD request failed.
ResponseCode=403, ResponseMessage=Forbidden"

When I put the AWS Secret Key in hadoop-site.xml and wrote the URL as
s3://BUCKET/ it worked.
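
In other words (a sketch, with placeholder paths and keys), this combination
works for me:

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>$AWS_ACCESS_KEY_ID</value>
</property>

<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>$AWS_SECRET_ACCESS_KEY</value>
</property>

bin/hadoop distcp /user/stuart/data s3://BUCKET/data   # credentials come from hadoop-site.xml, not the URL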

I have periods ('.') in my bucket name, that was not a problem.

What's weird is that org.apache.hadoop.fs.s3.Jets3tFileSystemStore
uses java.net.URI, which should take care of unencoding the %2F.

-Stuart


On Wed, Jul 9, 2008 at 1:41 PM, Lincoln Ritter
<li...@lincolnritter.com> wrote:
> So far, I've had no luck.
>
> Can anyone out there clarify the permissible characters/format for aws
> keys and bucket names?
>
> I haven't looked at the code here, but it seems strange to me that the
> same restrictions on host/port etc apply given that it's a totally
> different system.  I'd love to see exceptions thrown that are
> particular to the protocol/subsystem being employed.  The s3 'handler'
> (or whatever, might be nice enough to check for format violations and
> throw and appropriate exception, for instance.  It might URL-encode
> the secret key so that the user doesn't have to worry about this, or
> throw an exception notifying the user of a bad format.  Currently,
> apparent problems with my s3 settings are throwing exceptions that
> give no indication that the problem is actually with those settings.
>
> My mitigating strategy has been to change my configuration to use
> "instance-local" storage (/mnt).  I then copy the results out to s3
> using 'distcp'.  This is odd since distcp seems ok with my s3/aws
> info.
>
> I'm still unclear as to the permissible characters in bucket names and
> access keys.  I gather '/' is bad in the secret key and that '_' is
> bad for bucket names.  Thusfar i have only been able to get buckets to
> work in distcp that have only letters in their names, but I haven't
> tested to extensively.
>
> For example, I'd love to use buckets like:
> 'com.organization.hdfs.purpose'.  This seems to fail.  Using
> 'comorganizationhdfspurpose' works but clearly that is less than
> optimal.
>
> Like I say, I haven't dug into the source yet, but it is curious that
> distcp seems to work (at least where s3 is the destination) and hadoop
> fails when s3 is used as its storage.
>
> Anyone who has dealt with these issues, please post!  It will help
> make the project better.
>
> -lincoln
>
> --
> lincolnritter.com
>
>
>
> On Wed, Jul 9, 2008 at 7:10 AM, slitz <sl...@gmail.com> wrote:
>> I'm having the exact same problem, any tip?
>>
>> slitz
>>
>> On Wed, Jul 2, 2008 at 12:34 AM, Lincoln Ritter <li...@lincolnritter.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I am trying to use S3 with Hadoop 0.17.0 on EC2.  Using this style of
>>> configuration:
>>>
>>> <property>
>>>  <name>fs.default.name</name>
>>>  <value>s3://$HDFS_BUCKET</value>
>>> </property>
>>>
>>> <property>
>>>  <name>fs.s3.awsAccessKeyId</name>
>>>  <value>$AWS_ACCESS_KEY_ID</value>
>>> </property>
>>>
>>> <property>
>>>  <name>fs.s3.awsSecretAccessKey</name>
>>>  <value>$AWS_SECRET_ACCESS_KEY</value>
>>> </property>
>>>
>>> on startup of the cluster with the bucket having no non-alphabetic
>>> characters, I get:
>>>
>>> 2008-07-01 16:10:49,171 ERROR org.apache.hadoop.dfs.NameNode:
>>> java.lang.RuntimeException: Not a host:port pair: XXXXX
>>>        at
>>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:121)
>>>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>>        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>>
>>> If I use this style of configuration:
>>>
>>> <property>
>>>  <name>fs.default.name</name>
>>>  <value>s3://$AWS_ACCESS_KEY:$AWS_SECRET_ACCESS_KEY@$HDFS_BUCKET</value>
>>> </property>
>>>
>>> I get (where the all-caps portions are the actual values...):
>>>
>>> 2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode:
>>> java.lang.NumberFormatException: For input string:
>>> "AWS_SECRET_ACCESS_KEY@HDFS_BUCKET"
>>>        at
>>> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>>        at java.lang.Integer.parseInt(Integer.java:447)
>>>        at java.lang.Integer.parseInt(Integer.java:497)
>>>        at
>>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
>>>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>>        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>>
>>> These exceptions are taken from the namenode log.  The datanode logs
>>> show the same exceptions.
>>>
>>> Other than the above configuration changes, the configuration is
>>> identical to that generate by the hadoop image creation script found
>>> in the 0.17.0 distribution.
>>>
>>> Can anybody point me in the right direction here?
>>>
>>> -lincoln
>>>
>>> --
>>> lincolnritter.com
>>>
>>
>

Re: Namenode Exceptions with S3

Posted by Lincoln Ritter <li...@lincolnritter.com>.
So far, I've had no luck.

Can anyone out there clarify the permissible characters/format for aws
keys and bucket names?

I haven't looked at the code here, but it seems strange to me that the
same restrictions on host/port etc apply given that it's a totally
different system.  I'd love to see exceptions thrown that are
particular to the protocol/subsystem being employed.  The s3 'handler'
(or whatever it's called) might be nice enough to check for format violations
and throw an appropriate exception, for instance.  It might URL-encode
the secret key so that the user doesn't have to worry about this, or
throw an exception notifying the user of a bad format.  Currently,
apparent problems with my s3 settings are throwing exceptions that
give no indication that the problem is actually with those settings.

My mitigating strategy has been to change my configuration to use
"instance-local" storage (/mnt).  I then copy the results out to s3
using 'distcp'.  This is odd since distcp seems ok with my s3/aws
info.

I'm still unclear as to the permissible characters in bucket names and
access keys.  I gather '/' is bad in the secret key and that '_' is
bad for bucket names.  Thus far I have only been able to get buckets with
only letters in their names to work in distcp, but I haven't
tested too extensively.

For example, I'd love to use buckets like:
'com.organization.hdfs.purpose'.  This seems to fail.  Using
'comorganizationhdfspurpose' works but clearly that is less than
optimal.

Like I say, I haven't dug into the source yet, but it is curious that
distcp seems to work (at least where s3 is the destination) and hadoop
fails when s3 is used as its storage.

Anyone who has dealt with these issues, please post!  It will help
make the project better.

-lincoln

--
lincolnritter.com



On Wed, Jul 9, 2008 at 7:10 AM, slitz <sl...@gmail.com> wrote:
> I'm having the exact same problem, any tip?
>
> slitz
>
> On Wed, Jul 2, 2008 at 12:34 AM, Lincoln Ritter <li...@lincolnritter.com>
> wrote:
>
>> Hello,
>>
>> I am trying to use S3 with Hadoop 0.17.0 on EC2.  Using this style of
>> configuration:
>>
>> <property>
>>  <name>fs.default.name</name>
>>  <value>s3://$HDFS_BUCKET</value>
>> </property>
>>
>> <property>
>>  <name>fs.s3.awsAccessKeyId</name>
>>  <value>$AWS_ACCESS_KEY_ID</value>
>> </property>
>>
>> <property>
>>  <name>fs.s3.awsSecretAccessKey</name>
>>  <value>$AWS_SECRET_ACCESS_KEY</value>
>> </property>
>>
>> on startup of the cluster with the bucket having no non-alphabetic
>> characters, I get:
>>
>> 2008-07-01 16:10:49,171 ERROR org.apache.hadoop.dfs.NameNode:
>> java.lang.RuntimeException: Not a host:port pair: XXXXX
>>        at
>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:121)
>>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>
>> If I use this style of configuration:
>>
>> <property>
>>  <name>fs.default.name</name>
>>  <value>s3://$AWS_ACCESS_KEY:$AWS_SECRET_ACCESS_KEY@$HDFS_BUCKET</value>
>> </property>
>>
>> I get (where the all-caps portions are the actual values...):
>>
>> 2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode:
>> java.lang.NumberFormatException: For input string:
>> "AWS_SECRET_ACCESS_KEY@HDFS_BUCKET"
>>        at
>> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>        at java.lang.Integer.parseInt(Integer.java:447)
>>        at java.lang.Integer.parseInt(Integer.java:497)
>>        at
>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
>>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>
>> These exceptions are taken from the namenode log.  The datanode logs
>> show the same exceptions.
>>
>> Other than the above configuration changes, the configuration is
>> identical to that generate by the hadoop image creation script found
>> in the 0.17.0 distribution.
>>
>> Can anybody point me in the right direction here?
>>
>> -lincoln
>>
>> --
>> lincolnritter.com
>>
>

Re: Namenode Exceptions with S3

Posted by slitz <sl...@gmail.com>.
I'm having the exact same problem, any tip?

slitz

On Wed, Jul 2, 2008 at 12:34 AM, Lincoln Ritter <li...@lincolnritter.com>
wrote:

> Hello,
>
> I am trying to use S3 with Hadoop 0.17.0 on EC2.  Using this style of
> configuration:
>
> <property>
>  <name>fs.default.name</name>
>  <value>s3://$HDFS_BUCKET</value>
> </property>
>
> <property>
>  <name>fs.s3.awsAccessKeyId</name>
>  <value>$AWS_ACCESS_KEY_ID</value>
> </property>
>
> <property>
>  <name>fs.s3.awsSecretAccessKey</name>
>  <value>$AWS_SECRET_ACCESS_KEY</value>
> </property>
>
> on startup of the cluster with the bucket having no non-alphabetic
> characters, I get:
>
> 2008-07-01 16:10:49,171 ERROR org.apache.hadoop.dfs.NameNode:
> java.lang.RuntimeException: Not a host:port pair: XXXXX
>        at
> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:121)
>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>
> If I use this style of configuration:
>
> <property>
>  <name>fs.default.name</name>
>  <value>s3://$AWS_ACCESS_KEY:$AWS_SECRET_ACCESS_KEY@$HDFS_BUCKET</value>
> </property>
>
> I get (where the all-caps portions are the actual values...):
>
> 2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode:
> java.lang.NumberFormatException: For input string:
> "AWS_SECRET_ACCESS_KEY@HDFS_BUCKET"
>        at
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>        at java.lang.Integer.parseInt(Integer.java:447)
>        at java.lang.Integer.parseInt(Integer.java:497)
>        at
> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>
> These exceptions are taken from the namenode log.  The datanode logs
> show the same exceptions.
>
> Other than the above configuration changes, the configuration is
> identical to that generate by the hadoop image creation script found
> in the 0.17.0 distribution.
>
> Can anybody point me in the right direction here?
>
> -lincoln
>
> --
> lincolnritter.com
>