Posted to common-user@hadoop.apache.org by Shahab Yunus <sh...@gmail.com> on 2013/04/29 14:52:47 UTC

VersionInfoAnnotation Unknown for Hadoop/HBase

Hello,

This might be something very obvious that I am missing, but it has been
bugging me and I am unable to figure out what I am missing.

I have hadoop and hbase installed on a Linux machine, versions 2.0.0-cdh4.1.2
and 0.92.1-cdh4.1.2 respectively. They are working and I can invoke the hbase
shell and hadoop commands.

When I give the following command:

'hbase version'

I get the following output which is correct and expected:
-----------------------
13/04/29 07:47:42 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
13/04/29 07:47:42 INFO util.VersionInfo: Subversion
file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
-r Unknown
13/04/29 07:47:42 INFO util.VersionInfo: Compiled by jenkins on Thu Nov  1
18:01:09 PDT 2012

But when I kick off the VersionInfo class manually (I do see that there is
a main method in there), I get an Unknown result. Why is that?
Command:
'java  -cp
/usr/lib/hbase/hbase-0.92.1-cdh4.1.2-security.jar:/usr/lib/hbase/lib/commons-logging-1.1.1.jar
org.apache.hadoop.hbase.util.VersionInfo'

Output:
-----------------------
Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
logVersion
INFO: HBase Unknown
Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
logVersion
INFO: Subversion Unknown -r Unknown
Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
logVersion
INFO: Compiled by Unknown on Unknown

Now this is causing problems when I am trying to run my HBase client on
this machine, as it aborts with the following error:
-----------------------
java.lang.RuntimeException: hbase-default.xml file seems to be for and old
version of HBase (0.92.1-cdh4.1.2), this version is Unknown
   at
org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)

This means that the hbase-default.xml in the hbase jar is being picked up,
but the version info captured/compiled through annotations is not? How is
that possible if 'hbase shell' (or 'hadoop version') works fine?
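
For what it's worth, the version strings here come from a runtime-retained
annotation (the VersionAnnotation the subject refers to) that VersionInfo
reads back via reflection. The following is only a minimal, self-contained
sketch of that general mechanism; the annotation and class names below are
made up for illustration and are not the actual HBase source. The point is
that if the runtime cannot surface runtime annotations, getAnnotation()
returns null and the caller can only report "Unknown":
-----------------------
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class AnnotationVersionSketch {

  // Hypothetical stand-in for the version annotation generated at build time.
  @Retention(RetentionPolicy.RUNTIME)
  @interface BuildVersion {
    String version();
    String user();
  }

  // In a real build these values would be injected, not hard-coded.
  @BuildVersion(version = "0.92.1-cdh4.1.2", user = "jenkins")
  static class Versioned {}

  public static void main(String[] args) {
    BuildVersion v = Versioned.class.getAnnotation(BuildVersion.class);
    // A runtime that cannot materialize the annotation yields null -> "Unknown".
    System.out.println("HBase " + (v != null ? v.version() : "Unknown"));
    System.out.println("Compiled by " + (v != null ? v.user() : "Unknown"));
  }
}
-----------------------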

Please advise. Thanks a lot. I will be very grateful.

Regards,
Shahab

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

Posted by Shahab Yunus <sh...@gmail.com>.
Yes, this indeed seems to be the case. After running java -version and
seeing 1.5, it rang a bell, because all our servers (as far as I knew) were
on 1.6 or above, so I never thought this would be an issue. But boy, was I
wrong; it turned out to be something that obvious. Thanks, guys, for your
prompt responses and help. I feel embarrassed to have bothered everyone with
such an issue :/

I ran all of these commands on machines which actually had Java 1.6 or 1.7
and they work.

Regards,
Shahab


On Mon, Apr 29, 2013 at 11:05 AM, Harsh J <ha...@cloudera.com> wrote:

> Well… Bingo! :)
>
> We don't write our projects for 1.5 JVMs, and especially not the GCJ
> (1.5 didn't have annotations either IIRC? We depend on that here). Try
> with a Sun/Oracle/OpenJDK 1.6 or higher and your problem is solved.
>
> On Mon, Apr 29, 2013 at 8:24 PM, Shahab Yunus <sh...@gmail.com>
> wrote:
> > The output of "java -version" is:
> >
> > java -version
> > java version "1.5.0"
> > gij (GNU libgcj) version 4.4.6 20120305 (Red Hat 4.4.6-4)
> >
> > Copyright (C) 2007 Free Software Foundation, Inc.
> > This is free software; see the source for copying conditions.  There is
> NO
> > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
> PURPOSE.
> > --------------------------------------------------
> >
> > Also, when I run:
> >
> > "hbase org.apache.hadoop.hbase.util.VersionInfo"
> >
> > I do get the correct output:
> > 3/04/29 09:50:26 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
> > 13/04/29 09:50:26 INFO util.VersionInfo: Subversion
> >
> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
> > -r Unknown
> > 13/04/29 09:50:26 INFO util.VersionInfo: Compiled by jenkins on Thu Nov
>  1
> > 18:01:09 PDT 2012
> >
> > This is strange and because of this I am unable to run my java client
> which
> > errores out as mentioned with the following:
> > java.lang.RuntimeException: hbase-default.xml file seems to be for and
> old
> > version of HBase (0.92.1-cdh4.1.2), this version is Unknown
> >    at
> >
> org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
> >
> > Regards,
> > Shahab
> >
> >
> > On Mon, Apr 29, 2013 at 10:50 AM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> This is rather odd and am unable to reproduce this across several
> >> versions. It may even be something to do with all that static loading
> >> done in the VersionInfo class but am unsure at the moment.
> >>
> >> What does "java -version" print for you?
> >>
> >> On Mon, Apr 29, 2013 at 8:12 PM, Shahab Yunus <sh...@gmail.com>
> >> wrote:
> >> > Okay, I think I know what you mean. Those were back ticks!
> >> >
> >> > So I tried the following:
> >> >
> >> > java  -cp `hbase classpath` org.apache.hadoop.hbase.util.VersionInfo
> >> >
> >> > and I still get:
> >> >
> >> > 13/04/29 09:40:31 INFO util.VersionInfo: HBase Unknown
> >> > 13/04/29 09:40:31 INFO util.VersionInfo: Subversion Unknown -r Unknown
> >> > 13/04/29 09:40:31 INFO util.VersionInfo: Compiled by Unknown on
> Unknown
> >> >
> >> > I did print `hbase classpath` on the console itself and it does print
> >> > paths
> >> > to various libs and jars.
> >> >
> >> > Regards,
> >> > Shahab
> >> >
> >> >
> >> > On Mon, Apr 29, 2013 at 10:39 AM, Shahab Yunus <
> shahab.yunus@gmail.com>
> >> > wrote:
> >> >>
> >> >> Ted, Sorry I didn't understand. What do you mean exactly by
> "specifying
> >> >> `hbase classpath` "? You mean declare a environment variable
> >> >> 'HBASE_CLASSPATH'?
> >> >>
> >> >> Regards,
> >> >> Shaahb
> >> >>
> >> >>
> >> >> On Mon, Apr 29, 2013 at 10:31 AM, Ted Yu <yu...@gmail.com>
> wrote:
> >> >>>
> >> >>> bq. 'java  -cp /usr/lib/hbase/hbase...
> >> >>>
> >> >>> Instead of hard coding class path, can you try specifying `hbase
> >> >>> classpath` ?
> >> >>>
> >> >>> Cheers
> >> >>>
> >> >>>
> >> >>> On Mon, Apr 29, 2013 at 5:52 AM, Shahab Yunus <
> shahab.yunus@gmail.com>
> >> >>> wrote:
> >> >>>>
> >> >>>> Hello,
> >> >>>>
> >> >>>> This might be something very obvious that I am missing but this has
> >> >>>> been
> >> >>>> bugging me and I am unable to find what am I missing?
> >> >>>>
> >> >>>> I have hadoop and hbase installed on Linux machine. Version
> >> >>>> 2.0.0-cdh4.1.2 and 0.92.1-cdh4.1.2 respectively. They are working
> and
> >> >>>> I can
> >> >>>> invoke hbase shell and hadoop commands.
> >> >>>>
> >> >>>> When I give the following command:
> >> >>>>
> >> >>>> 'hbase version'
> >> >>>>
> >> >>>> I get the following output which is correct and expected:
> >> >>>> -----------------------
> >> >>>> 13/04/29 07:47:42 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
> >> >>>> 13/04/29 07:47:42 INFO util.VersionInfo: Subversion
> >> >>>>
> >> >>>>
> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
> >> >>>> -r Unknown
> >> >>>> 13/04/29 07:47:42 INFO util.VersionInfo: Compiled by jenkins on Thu
> >> >>>> Nov
> >> >>>> 1 18:01:09 PDT 2012
> >> >>>>
> >> >>>> But when I I kick of the VersionInfo class manually (I do see that
> >> >>>> there
> >> >>>> is a main method in there), I get an Unknown result? Why is that?
> >> >>>> Command:
> >> >>>> 'java  -cp
> >> >>>>
> >> >>>>
> /usr/lib/hbase/hbase-0.92.1-cdh4.1.2-security.jar:/usr/lib/hbase/lib/commons-logging-1.1.1.jar
> >> >>>> org.apache.hadoop.hbase.util.VersionInfo'
> >> >>>>
> >> >>>> Output:
> >> >>>> -----------------------
> >> >>>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
> >> >>>> logVersion
> >> >>>> INFO: HBase Unknown
> >> >>>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
> >> >>>> logVersion
> >> >>>> INFO: Subversion Unknown -r Unknown
> >> >>>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
> >> >>>> logVersion
> >> >>>> INFO: Compiled by Unknown on Unknown
> >> >>>>
> >> >>>> Now this is causing problems when I am trying to run my HBase
> client
> >> >>>> on
> >> >>>> this machine as the it aborts with the following error:
> >> >>>> -----------------------
> >> >>>> java.lang.RuntimeException: hbase-default.xml file seems to be for
> >> >>>> and
> >> >>>> old version of HBase (0.92.1-cdh4.1.2), this version is Unknown
> >> >>>>    at
> >> >>>>
> >> >>>>
> org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
> >> >>>>
> >> >>>> This means that the hbase-default.xml in the hbase jar is being
> >> >>>> picked
> >> >>>> up but the version info captured/compiled through annotations is
> not?
> >> >>>> How is
> >> >>>> it possible if 'hbase shell' (or hadoop version') works fine!
> >> >>>>
> >> >>>> Please advise. Thanks a lot. I will be very grateful.
> >> >>>>
> >> >>>> Regards,
> >> >>>> Shahab
> >> >>>
> >> >>>
> >> >>
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

Posted by Harsh J <ha...@cloudera.com>.
Well… Bingo! :)

We don't write our projects for 1.5 JVMs, and especially not the GCJ
(1.5 didn't have annotations either IIRC? We depend on that here). Try
with a Sun/Oracle/OpenJDK 1.6 or higher and your problem is solved.

On Mon, Apr 29, 2013 at 8:24 PM, Shahab Yunus <sh...@gmail.com> wrote:
> The output of "java -version" is:
>
> java -version
> java version "1.5.0"
> gij (GNU libgcj) version 4.4.6 20120305 (Red Hat 4.4.6-4)
>
> Copyright (C) 2007 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> --------------------------------------------------
>
> Also, when I run:
>
> "hbase org.apache.hadoop.hbase.util.VersionInfo"
>
> I do get the correct output:
> 3/04/29 09:50:26 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
> 13/04/29 09:50:26 INFO util.VersionInfo: Subversion
> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
> -r Unknown
> 13/04/29 09:50:26 INFO util.VersionInfo: Compiled by jenkins on Thu Nov  1
> 18:01:09 PDT 2012
>
> This is strange and because of this I am unable to run my java client which
> errores out as mentioned with the following:
> java.lang.RuntimeException: hbase-default.xml file seems to be for and old
> version of HBase (0.92.1-cdh4.1.2), this version is Unknown
>    at
> org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
>
> Regards,
> Shahab
>
>
> On Mon, Apr 29, 2013 at 10:50 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> This is rather odd and am unable to reproduce this across several
>> versions. It may even be something to do with all that static loading
>> done in the VersionInfo class but am unsure at the moment.
>>
>> What does "java -version" print for you?
>>
>> On Mon, Apr 29, 2013 at 8:12 PM, Shahab Yunus <sh...@gmail.com>
>> wrote:
>> > Okay, I think I know what you mean. Those were back ticks!
>> >
>> > So I tried the following:
>> >
>> > java  -cp `hbase classpath` org.apache.hadoop.hbase.util.VersionInfo
>> >
>> > and I still get:
>> >
>> > 13/04/29 09:40:31 INFO util.VersionInfo: HBase Unknown
>> > 13/04/29 09:40:31 INFO util.VersionInfo: Subversion Unknown -r Unknown
>> > 13/04/29 09:40:31 INFO util.VersionInfo: Compiled by Unknown on Unknown
>> >
>> > I did print `hbase classpath` on the console itself and it does print
>> > paths
>> > to various libs and jars.
>> >
>> > Regards,
>> > Shahab
>> >
>> >
>> > On Mon, Apr 29, 2013 at 10:39 AM, Shahab Yunus <sh...@gmail.com>
>> > wrote:
>> >>
>> >> Ted, Sorry I didn't understand. What do you mean exactly by "specifying
>> >> `hbase classpath` "? You mean declare a environment variable
>> >> 'HBASE_CLASSPATH'?
>> >>
>> >> Regards,
>> >> Shaahb
>> >>
>> >>
>> >> On Mon, Apr 29, 2013 at 10:31 AM, Ted Yu <yu...@gmail.com> wrote:
>> >>>
>> >>> bq. 'java  -cp /usr/lib/hbase/hbase...
>> >>>
>> >>> Instead of hard coding class path, can you try specifying `hbase
>> >>> classpath` ?
>> >>>
>> >>> Cheers
>> >>>
>> >>>
>> >>> On Mon, Apr 29, 2013 at 5:52 AM, Shahab Yunus <sh...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> Hello,
>> >>>>
>> >>>> This might be something very obvious that I am missing but this has
>> >>>> been
>> >>>> bugging me and I am unable to find what am I missing?
>> >>>>
>> >>>> I have hadoop and hbase installed on Linux machine. Version
>> >>>> 2.0.0-cdh4.1.2 and 0.92.1-cdh4.1.2 respectively. They are working and
>> >>>> I can
>> >>>> invoke hbase shell and hadoop commands.
>> >>>>
>> >>>> When I give the following command:
>> >>>>
>> >>>> 'hbase version'
>> >>>>
>> >>>> I get the following output which is correct and expected:
>> >>>> -----------------------
>> >>>> 13/04/29 07:47:42 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
>> >>>> 13/04/29 07:47:42 INFO util.VersionInfo: Subversion
>> >>>>
>> >>>> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
>> >>>> -r Unknown
>> >>>> 13/04/29 07:47:42 INFO util.VersionInfo: Compiled by jenkins on Thu
>> >>>> Nov
>> >>>> 1 18:01:09 PDT 2012
>> >>>>
>> >>>> But when I I kick of the VersionInfo class manually (I do see that
>> >>>> there
>> >>>> is a main method in there), I get an Unknown result? Why is that?
>> >>>> Command:
>> >>>> 'java  -cp
>> >>>>
>> >>>> /usr/lib/hbase/hbase-0.92.1-cdh4.1.2-security.jar:/usr/lib/hbase/lib/commons-logging-1.1.1.jar
>> >>>> org.apache.hadoop.hbase.util.VersionInfo'
>> >>>>
>> >>>> Output:
>> >>>> -----------------------
>> >>>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>> >>>> logVersion
>> >>>> INFO: HBase Unknown
>> >>>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>> >>>> logVersion
>> >>>> INFO: Subversion Unknown -r Unknown
>> >>>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>> >>>> logVersion
>> >>>> INFO: Compiled by Unknown on Unknown
>> >>>>
>> >>>> Now this is causing problems when I am trying to run my HBase client
>> >>>> on
>> >>>> this machine as the it aborts with the following error:
>> >>>> -----------------------
>> >>>> java.lang.RuntimeException: hbase-default.xml file seems to be for
>> >>>> and
>> >>>> old version of HBase (0.92.1-cdh4.1.2), this version is Unknown
>> >>>>    at
>> >>>>
>> >>>> org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
>> >>>>
>> >>>> This means that the hbase-default.xml in the hbase jar is being
>> >>>> picked
>> >>>> up but the version info captured/compiled through annotations is not?
>> >>>> How is
>> >>>> it possible if 'hbase shell' (or hadoop version') works fine!
>> >>>>
>> >>>> Please advise. Thanks a lot. I will be very grateful.
>> >>>>
>> >>>> Regards,
>> >>>> Shahab
>> >>>
>> >>>
>> >>
>> >
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

Posted by Shahab Yunus <sh...@gmail.com>.
The output of "java -version" is:

java -version
java version "1.5.0"
gij (GNU libgcj) version 4.4.6 20120305 (Red Hat 4.4.6-4)

Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
--------------------------------------------------

Also, when I run:

"hbase org.apache.hadoop.hbase.util.VersionInfo"

I do get the correct output:
3/04/29 09:50:26 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
13/04/29 09:50:26 INFO util.VersionInfo: Subversion
file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
-r Unknown
13/04/29 09:50:26 INFO util.VersionInfo: Compiled by jenkins on Thu Nov  1
18:01:09 PDT 2012

This is strange, and because of this I am unable to run my java client,
which errors out, as mentioned, with the following:
java.lang.RuntimeException: hbase-default.xml file seems to be for and old
version of HBase (0.92.1-cdh4.1.2), this version is Unknown
   at
org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
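
In case it helps anyone else, a quick way to confirm which JVM a given
command actually runs under is to print the runtime's own system properties,
as in the tiny sketch below. My working assumption (not verified here) is
that the hbase wrapper script resolves java via JAVA_HOME/hbase-env.sh while
a bare 'java' on this box resolves to gij through PATH, which would explain
why 'hbase org.apache.hadoop.hbase.util.VersionInfo' works while
'java -cp ...' does not.
-----------------------
public class WhichJvm {
  public static void main(String[] args) {
    // Standard JVM system properties; gij/libgcj identifies itself here too.
    System.out.println("java.version = " + System.getProperty("java.version"));
    System.out.println("java.vendor  = " + System.getProperty("java.vendor"));
    System.out.println("java.vm.name = " + System.getProperty("java.vm.name"));
  }
}
-----------------------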

Regards,
Shahab


On Mon, Apr 29, 2013 at 10:50 AM, Harsh J <ha...@cloudera.com> wrote:

> This is rather odd and am unable to reproduce this across several
> versions. It may even be something to do with all that static loading
> done in the VersionInfo class but am unsure at the moment.
>
> What does "java -version" print for you?
>
> On Mon, Apr 29, 2013 at 8:12 PM, Shahab Yunus <sh...@gmail.com>
> wrote:
> > Okay, I think I know what you mean. Those were back ticks!
> >
> > So I tried the following:
> >
> > java  -cp `hbase classpath` org.apache.hadoop.hbase.util.VersionInfo
> >
> > and I still get:
> >
> > 13/04/29 09:40:31 INFO util.VersionInfo: HBase Unknown
> > 13/04/29 09:40:31 INFO util.VersionInfo: Subversion Unknown -r Unknown
> > 13/04/29 09:40:31 INFO util.VersionInfo: Compiled by Unknown on Unknown
> >
> > I did print `hbase classpath` on the console itself and it does print
> paths
> > to various libs and jars.
> >
> > Regards,
> > Shahab
> >
> >
> > On Mon, Apr 29, 2013 at 10:39 AM, Shahab Yunus <sh...@gmail.com>
> > wrote:
> >>
> >> Ted, Sorry I didn't understand. What do you mean exactly by "specifying
> >> `hbase classpath` "? You mean declare a environment variable
> >> 'HBASE_CLASSPATH'?
> >>
> >> Regards,
> >> Shaahb
> >>
> >>
> >> On Mon, Apr 29, 2013 at 10:31 AM, Ted Yu <yu...@gmail.com> wrote:
> >>>
> >>> bq. 'java  -cp /usr/lib/hbase/hbase...
> >>>
> >>> Instead of hard coding class path, can you try specifying `hbase
> >>> classpath` ?
> >>>
> >>> Cheers
> >>>
> >>>
> >>> On Mon, Apr 29, 2013 at 5:52 AM, Shahab Yunus <sh...@gmail.com>
> >>> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> This might be something very obvious that I am missing but this has
> been
> >>>> bugging me and I am unable to find what am I missing?
> >>>>
> >>>> I have hadoop and hbase installed on Linux machine. Version
> >>>> 2.0.0-cdh4.1.2 and 0.92.1-cdh4.1.2 respectively. They are working and
> I can
> >>>> invoke hbase shell and hadoop commands.
> >>>>
> >>>> When I give the following command:
> >>>>
> >>>> 'hbase version'
> >>>>
> >>>> I get the following output which is correct and expected:
> >>>> -----------------------
> >>>> 13/04/29 07:47:42 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
> >>>> 13/04/29 07:47:42 INFO util.VersionInfo: Subversion
> >>>>
> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
> >>>> -r Unknown
> >>>> 13/04/29 07:47:42 INFO util.VersionInfo: Compiled by jenkins on Thu
> Nov
> >>>> 1 18:01:09 PDT 2012
> >>>>
> >>>> But when I I kick of the VersionInfo class manually (I do see that
> there
> >>>> is a main method in there), I get an Unknown result? Why is that?
> >>>> Command:
> >>>> 'java  -cp
> >>>>
> /usr/lib/hbase/hbase-0.92.1-cdh4.1.2-security.jar:/usr/lib/hbase/lib/commons-logging-1.1.1.jar
> >>>> org.apache.hadoop.hbase.util.VersionInfo'
> >>>>
> >>>> Output:
> >>>> -----------------------
> >>>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
> >>>> logVersion
> >>>> INFO: HBase Unknown
> >>>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
> >>>> logVersion
> >>>> INFO: Subversion Unknown -r Unknown
> >>>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
> >>>> logVersion
> >>>> INFO: Compiled by Unknown on Unknown
> >>>>
> >>>> Now this is causing problems when I am trying to run my HBase client
> on
> >>>> this machine as the it aborts with the following error:
> >>>> -----------------------
> >>>> java.lang.RuntimeException: hbase-default.xml file seems to be for and
> >>>> old version of HBase (0.92.1-cdh4.1.2), this version is Unknown
> >>>>    at
> >>>>
> org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
> >>>>
> >>>> This means that the hbase-default.xml in the hbase jar is being picked
> >>>> up but the version info captured/compiled through annotations is not?
> How is
> >>>> it possible if 'hbase shell' (or hadoop version') works fine!
> >>>>
> >>>> Please advise. Thanks a lot. I will be very grateful.
> >>>>
> >>>> Regards,
> >>>> Shahab
> >>>
> >>>
> >>
> >
>
>
>
> --
> Harsh J
>
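
A quick way to confirm which runtime actually executes a standalone client (as opposed to the one the hbase wrapper script locates) is to print the standard JVM system properties from inside the client itself. The sketch below uses only the JDK; the class name WhichJvm is made up for illustration:

public class WhichJvm {
    public static void main(String[] args) {
        // A gij/libgcj 1.5 runtime identifies itself here. Its incomplete
        // runtime-annotation support is the likely reason the version info
        // compiled into the HBase jars comes back as Unknown.
        System.out.println("java.version   = " + System.getProperty("java.version"));
        System.out.println("java.vm.name   = " + System.getProperty("java.vm.name"));
        System.out.println("java.vm.vendor = " + System.getProperty("java.vm.vendor"));
    }
}

Compiling and running this with the same bare java binary used for the failing client shows immediately whether it is the gij VM above, rather than the classpath, that is in play.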

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

Posted by Harsh J <ha...@cloudera.com>.
This is rather odd and I am unable to reproduce this across several
versions. It may even be something to do with all that static loading
done in the VersionInfo class, but I am unsure at the moment.

What does "java -version" print for you?

On Mon, Apr 29, 2013 at 8:12 PM, Shahab Yunus <sh...@gmail.com> wrote:
> Okay, I think I know what you mean. Those were back ticks!
>
> So I tried the following:
>
> java  -cp `hbase classpath` org.apache.hadoop.hbase.util.VersionInfo
>
> and I still get:
>
> 13/04/29 09:40:31 INFO util.VersionInfo: HBase Unknown
> 13/04/29 09:40:31 INFO util.VersionInfo: Subversion Unknown -r Unknown
> 13/04/29 09:40:31 INFO util.VersionInfo: Compiled by Unknown on Unknown
>
> I did print `hbase classpath` on the console itself and it does print paths
> to various libs and jars.
>
> Regards,
> Shahab
>
>
> On Mon, Apr 29, 2013 at 10:39 AM, Shahab Yunus <sh...@gmail.com>
> wrote:
>>
>> Ted, Sorry I didn't understand. What do you mean exactly by "specifying
>> `hbase classpath` "? You mean declare a environment variable
>> 'HBASE_CLASSPATH'?
>>
>> Regards,
>> Shaahb
>>
>>
>> On Mon, Apr 29, 2013 at 10:31 AM, Ted Yu <yu...@gmail.com> wrote:
>>>
>>> bq. 'java  -cp /usr/lib/hbase/hbase...
>>>
>>> Instead of hard coding class path, can you try specifying `hbase
>>> classpath` ?
>>>
>>> Cheers
>>>
>>>
>>> On Mon, Apr 29, 2013 at 5:52 AM, Shahab Yunus <sh...@gmail.com>
>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> This might be something very obvious that I am missing but this has been
>>>> bugging me and I am unable to find what am I missing?
>>>>
>>>> I have hadoop and hbase installed on Linux machine. Version
>>>> 2.0.0-cdh4.1.2 and 0.92.1-cdh4.1.2 respectively. They are working and I can
>>>> invoke hbase shell and hadoop commands.
>>>>
>>>> When I give the following command:
>>>>
>>>> 'hbase version'
>>>>
>>>> I get the following output which is correct and expected:
>>>> -----------------------
>>>> 13/04/29 07:47:42 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
>>>> 13/04/29 07:47:42 INFO util.VersionInfo: Subversion
>>>> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
>>>> -r Unknown
>>>> 13/04/29 07:47:42 INFO util.VersionInfo: Compiled by jenkins on Thu Nov
>>>> 1 18:01:09 PDT 2012
>>>>
>>>> But when I I kick of the VersionInfo class manually (I do see that there
>>>> is a main method in there), I get an Unknown result? Why is that?
>>>> Command:
>>>> 'java  -cp
>>>> /usr/lib/hbase/hbase-0.92.1-cdh4.1.2-security.jar:/usr/lib/hbase/lib/commons-logging-1.1.1.jar
>>>> org.apache.hadoop.hbase.util.VersionInfo'
>>>>
>>>> Output:
>>>> -----------------------
>>>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>>>> logVersion
>>>> INFO: HBase Unknown
>>>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>>>> logVersion
>>>> INFO: Subversion Unknown -r Unknown
>>>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>>>> logVersion
>>>> INFO: Compiled by Unknown on Unknown
>>>>
>>>> Now this is causing problems when I am trying to run my HBase client on
>>>> this machine as the it aborts with the following error:
>>>> -----------------------
>>>> java.lang.RuntimeException: hbase-default.xml file seems to be for and
>>>> old version of HBase (0.92.1-cdh4.1.2), this version is Unknown
>>>>    at
>>>> org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
>>>>
>>>> This means that the hbase-default.xml in the hbase jar is being picked
>>>> up but the version info captured/compiled through annotations is not? How is
>>>> it possible if 'hbase shell' (or hadoop version') works fine!
>>>>
>>>> Please advise. Thanks a lot. I will be very grateful.
>>>>
>>>> Regards,
>>>> Shahab
>>>
>>>
>>
>



-- 
Harsh J
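
The values logged by logVersion above come from VersionInfo's static getters, so a tiny probe run under a given JVM shows directly whether that runtime can read the compiled-in version metadata. A minimal sketch, assuming only the public getVersion()/getRevision()/getDate() accessors on org.apache.hadoop.hbase.util.VersionInfo (the class name VersionProbe is invented for illustration):

import org.apache.hadoop.hbase.util.VersionInfo;

public class VersionProbe {
    public static void main(String[] args) {
        // Under a JVM that can read the package-level version metadata these
        // print the build values; under the gij 1.5 runtime discussed above
        // they all come back as "Unknown".
        System.out.println("version  = " + VersionInfo.getVersion());
        System.out.println("revision = " + VersionInfo.getRevision());
        System.out.println("compiled = " + VersionInfo.getDate());
    }
}

Running it with the jars from `hbase classpath` under two different JVMs makes the difference obvious.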

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

Posted by Shahab Yunus <sh...@gmail.com>.
Okay, I think I know what you mean. Those were back ticks!

So I tried the following:

java  -cp `hbase classpath` org.apache.hadoop.hbase.util.VersionInfo

and I still get:

13/04/29 09:40:31 INFO util.VersionInfo: HBase Unknown
13/04/29 09:40:31 INFO util.VersionInfo: Subversion Unknown -r Unknown
13/04/29 09:40:31 INFO util.VersionInfo: Compiled by Unknown on Unknown

I did print `hbase classpath` on the console itself and it does print paths
to various libs and jars.

Regards,
Shahab


On Mon, Apr 29, 2013 at 10:39 AM, Shahab Yunus <sh...@gmail.com> wrote:

> Ted, Sorry I didn't understand. What do you mean exactly by "specifying
> `hbase classpath` "? You mean declare a environment variable
> 'HBASE_CLASSPATH'?
>
> Regards,
> Shaahb
>
>
> On Mon, Apr 29, 2013 at 10:31 AM, Ted Yu <yu...@gmail.com> wrote:
>
>> bq. 'java  -cp /usr/lib/hbase/hbase...
>>
>> Instead of hard coding class path, can you try specifying `hbase
>> classpath` ?
>>
>> Cheers
>>
>>
>> On Mon, Apr 29, 2013 at 5:52 AM, Shahab Yunus <sh...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> This might be something very obvious that I am missing but this has been
>>> bugging me and I am unable to find what am I missing?
>>>
>>> I have hadoop and hbase installed on Linux machine.
>>> Version 2.0.0-cdh4.1.2 and 0.92.1-cdh4.1.2 respectively. They are working
>>> and I can invoke hbase shell and hadoop commands.
>>>
>>> When I give the following command:
>>>
>>> 'hbase version'
>>>
>>> I get the following output which is correct and expected:
>>> -----------------------
>>> 13/04/29 07:47:42 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
>>> 13/04/29 07:47:42 INFO util.VersionInfo: Subversion
>>> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
>>> -r Unknown
>>> 13/04/29 07:47:42 INFO util.VersionInfo: Compiled by jenkins on Thu Nov
>>>  1 18:01:09 PDT 2012
>>>
>>> But when I I kick of the VersionInfo class manually (I do see that there
>>> is a main method in there), I get an Unknown result? Why is that?
>>> Command:
>>> 'java  -cp
>>> /usr/lib/hbase/hbase-0.92.1-cdh4.1.2-security.jar:/usr/lib/hbase/lib/commons-logging-1.1.1.jar
>>> org.apache.hadoop.hbase.util.VersionInfo'
>>>
>>> Output:
>>> -----------------------
>>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>>> logVersion
>>> INFO: HBase Unknown
>>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>>> logVersion
>>> INFO: Subversion Unknown -r Unknown
>>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>>> logVersion
>>> INFO: Compiled by Unknown on Unknown
>>>
>>> Now this is causing problems when I am trying to run my HBase client on
>>> this machine as the it aborts with the following error:
>>> -----------------------
>>> java.lang.RuntimeException: hbase-default.xml file seems to be for and
>>> old version of HBase (0.92.1-cdh4.1.2), this version is Unknown
>>>    at
>>> org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
>>>
>>> This means that the hbase-default.xml in the hbase jar is being picked
>>> up but the version info captured/compiled through annotations is not? How
>>> is it possible if 'hbase shell' (or hadoop version') works fine!
>>>
>>> Please advise. Thanks a lot. I will be very grateful.
>>>
>>> Regards,
>>> Shahab
>>>
>>
>>
>
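
The classpath itself looks fine here: the same jars report the correct version through the hbase wrapper and Unknown through the bare java launcher, which again points at the JVM rather than at missing jars. On the client side the failure can be surfaced early by forcing the defaults check up front. A sketch, assuming only the public HBaseConfiguration.create() call and the hbase.defaults.for.version property that the 0.92-era check compares against VersionInfo.getVersion():

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class DefaultsCheck {
    public static void main(String[] args) {
        try {
            // create() loads hbase-default.xml from the jar and compares its
            // hbase.defaults.for.version against the running code's version;
            // that comparison is what throws the RuntimeException quoted above.
            Configuration conf = HBaseConfiguration.create();
            System.out.println("defaults are for HBase "
                    + conf.get("hbase.defaults.for.version"));
        } catch (RuntimeException e) {
            System.err.println("Defaults-version check failed: " + e.getMessage());
            System.err.println("Check `java -version` for the JVM running this client.");
        }
    }
}

If this succeeds under one JVM and fails under another with the same classpath, the runtime is confirmed as the variable.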

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

Posted by Shahab Yunus <sh...@gmail.com>.
Ted, sorry, I didn't understand. What do you mean exactly by "specifying
`hbase classpath`"? Do you mean declaring an environment variable
'HBASE_CLASSPATH'?

Regards,
Shahab


On Mon, Apr 29, 2013 at 10:31 AM, Ted Yu <yu...@gmail.com> wrote:

> bq. 'java  -cp /usr/lib/hbase/hbase...
>
> Instead of hard coding class path, can you try specifying `hbase
> classpath` ?
>
> Cheers
>
>
> On Mon, Apr 29, 2013 at 5:52 AM, Shahab Yunus <sh...@gmail.com> wrote:
>
>> Hello,
>>
>> This might be something very obvious that I am missing but this has been
>> bugging me and I am unable to find what am I missing?
>>
>> I have hadoop and hbase installed on Linux machine.
>> Version 2.0.0-cdh4.1.2 and 0.92.1-cdh4.1.2 respectively. They are working
>> and I can invoke hbase shell and hadoop commands.
>>
>> When I give the following command:
>>
>> 'hbase version'
>>
>> I get the following output which is correct and expected:
>> -----------------------
>> 13/04/29 07:47:42 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
>> 13/04/29 07:47:42 INFO util.VersionInfo: Subversion
>> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
>> -r Unknown
>> 13/04/29 07:47:42 INFO util.VersionInfo: Compiled by jenkins on Thu Nov
>>  1 18:01:09 PDT 2012
>>
>> But when I I kick of the VersionInfo class manually (I do see that there
>> is a main method in there), I get an Unknown result? Why is that?
>> Command:
>> 'java  -cp
>> /usr/lib/hbase/hbase-0.92.1-cdh4.1.2-security.jar:/usr/lib/hbase/lib/commons-logging-1.1.1.jar
>> org.apache.hadoop.hbase.util.VersionInfo'
>>
>> Output:
>> -----------------------
>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>> logVersion
>> INFO: HBase Unknown
>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>> logVersion
>> INFO: Subversion Unknown -r Unknown
>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>> logVersion
>> INFO: Compiled by Unknown on Unknown
>>
>> Now this is causing problems when I am trying to run my HBase client on
>> this machine as the it aborts with the following error:
>> -----------------------
>> java.lang.RuntimeException: hbase-default.xml file seems to be for and
>> old version of HBase (0.92.1-cdh4.1.2), this version is Unknown
>>    at
>> org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
>>
>> This means that the hbase-default.xml in the hbase jar is being picked up
>> but the version info captured/compiled through annotations is not? How is
>> it possible if 'hbase shell' (or hadoop version') works fine!
>>
>> Please advise. Thanks a lot. I will be very grateful.
>>
>> Regards,
>> Shahab
>>
>
>

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

Posted by Ted Yu <yu...@gmail.com>.
bq. 'java  -cp /usr/lib/hbase/hbase...

Instead of hard-coding the classpath, can you try specifying `hbase
classpath`?
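
Something along these lines (assuming the hbase script is on your PATH)
lets the script assemble the classpath for you:

  java -cp "$(hbase classpath)" org.apache.hadoop.hbase.util.VersionInfo

That way the command runs against the same set of jars that 'hbase version'
itself uses.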

Cheers

On Mon, Apr 29, 2013 at 5:52 AM, Shahab Yunus <sh...@gmail.com>wrote:

> Hello,
>
> This might be something very obvious that I am missing but this has been
> bugging me and I am unable to find what am I missing?
>
> I have hadoop and hbase installed on Linux machine. Version 2.0.0-cdh4.1.2
> and 0.92.1-cdh4.1.2 respectively. They are working and I can invoke hbase
> shell and hadoop commands.
>
> When I give the following command:
>
> 'hbase version'
>
> I get the following output which is correct and expected:
> -----------------------
> 13/04/29 07:47:42 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
> 13/04/29 07:47:42 INFO util.VersionInfo: Subversion
> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
> -r Unknown
> 13/04/29 07:47:42 INFO util.VersionInfo: Compiled by jenkins on Thu Nov  1
> 18:01:09 PDT 2012
>
> But when I I kick of the VersionInfo class manually (I do see that there
> is a main method in there), I get an Unknown result? Why is that?
> Command:
> 'java  -cp
> /usr/lib/hbase/hbase-0.92.1-cdh4.1.2-security.jar:/usr/lib/hbase/lib/commons-logging-1.1.1.jar
> org.apache.hadoop.hbase.util.VersionInfo'
>
> Output:
> -----------------------
> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
> logVersion
> INFO: HBase Unknown
> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
> logVersion
> INFO: Subversion Unknown -r Unknown
> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
> logVersion
> INFO: Compiled by Unknown on Unknown
>
> Now this is causing problems when I am trying to run my HBase client on
> this machine as the it aborts with the following error:
> -----------------------
> java.lang.RuntimeException: hbase-default.xml file seems to be for and old
> version of HBase (0.92.1-cdh4.1.2), this version is Unknown
>    at
> org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
>
> This means that the hbase-default.xml in the hbase jar is being picked up
> but the version info captured/compiled through annotations is not? How is
> it possible if 'hbase shell' (or hadoop version') works fine!
>
> Please advise. Thanks a lot. I will be very grateful.
>
> Regards,
> Shahab
>

Re: Hardware Selection for Hadoop

Posted by Marcos Luis Ortiz Valmaseda <ma...@gmail.com>.
Regards, Raj. Knowing the data that you want to process with Hadoop is
critical here, at least an approximation of it. I think that Hadoop
Operations is an invaluable resource for this:

- Hadoop uses RAM heavily, so the first resource to consider is giving the
nodes all the RAM you can, with a marked focus on the NameNode/JobTracker
node.

- For the DataNode/TaskTracker nodes, it is very good to have fast disks.
SSDs are great but expensive, so weigh that as well. For me, Barracuda
drives are awesome.

- A good network connection between the nodes. Hadoop is an RPC-based
platform, so a good network is critical for a healthy cluster.

A good start for a small cluster, in my view, is:

- NN/JT: 8 to 16 GB RAM
- DN/TT: 4 to 8 GB RAM

Also consider always using compression, to optimize the communication
between all the services in your Hadoop cluster (Snappy is my favorite);
a sketch of how to turn it on follows below.

All of this advice is in the Hadoop Operations book from Eric, so it is a
must-read for every Hadoop System Engineer.
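
On an MRv1-style CDH4 cluster, turning on Snappy for the map output usually
comes down to two properties; a rough sketch (the jar path and HDFS paths
are just examples, and the property names differ slightly under MRv2):

  hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar wordcount \
      -D mapred.compress.map.output=true \
      -D mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec \
      /user/raj/logs /user/raj/wordcount-out

Setting the same two properties in mapred-site.xml makes it the default for
the whole cluster.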



2013/4/29 Raj Hadoop <ha...@yahoo.com>

>    Hi,
>
> I have to propose some hardware requirements in my company for a Proof of
> Concept with Hadoop. I was reading Hadoop Operations and also saw Cloudera
> Website. But just wanted to know from the group - what is the requirements
> if I have to plan for a 5 node cluster. I dont know at this time, the data
> that need to be processed at this time for the Proof of Concept. So - can
> you suggest something to me?
>
> Regards,
> Raj
>



-- 
Marcos Ortiz Valmaseda,
*Data-Driven Product Manager* at PDVSA
*Blog*: http://dataddict.wordpress.com/
*LinkedIn: *http://www.linkedin.com/in/marcosluis2186
*Twitter*: @marcosluis2186 <http://twitter.com/marcosluis2186>

Re: Hardware Selection for Hadoop

Posted by Mohammad Tariq <do...@gmail.com>.
If I were to start with a 5 node cluster, I would do this:

*Machine 1:* NN+JT
32GB RAM, 2x Quad Core Proc, 500GB SATA HDD, along with a NAS (to make sure
the metadata is safe)

*Machine 2:* SNN
32GB RAM, 2x Quad Core Proc, 500GB SATA HDD

*Machines 3, 4, 5:* DN+TT
16GB RAM, 2x Quad Core Proc, 5 x 200GB SATA HDD (JBOD configuration)

I don't think you'll require 64GB RAM and that much storage just for a
POC (but it actually depends). You can really kick ass with 32GB.

A NIC (network interface card) is the hardware component that connects a
computer to a network. It must be reliable, to make sure that all your
machines stay connected to the cluster and there is no problem with data
transfer.

Apart from this, ask them to provide you cabinets with good ventilation and
a proper cooling mechanism.
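
As a rough capacity sanity check (assuming the disks above and the default
replication factor of 3): 3 data nodes x 5 disks x 200GB is about 3TB raw,
which leaves roughly 1TB of usable HDFS space before temp space and
overhead. For a first POC that is usually plenty.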

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Tue, Apr 30, 2013 at 2:17 AM, Raj Hadoop <ha...@yahoo.com> wrote:

> Hi,
>
> In 5 node cluster - you mean
>
> Name Node , Job Tracker , Secondary Name Node all on 1
>         64 GB Ram ( Processor - 2 x Quad cores Intel  , Storage - ? )
>
> Data Trackers and Job Trackers - on 4 machies - each of
>         32 GB Ram ( Processor - 2 x Quad cores Intel  , Storage - ? )
>
> NIC ?
>
> Also - what other details should I provide to my hardware engineer.
>
> The idea is to start with a Web Log Processing proof of concept.
>
> Please advise.
>
>
>   *From:* Patai Sangbutsarakum <Pa...@turn.com>
> *To:* "user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Sent:* Monday, April 29, 2013 2:49 PM
> *Subject:* Re: Hardware Selection for Hadoop
>
>  2 x Quad cores Intel
> 2-3 TB x 6 SATA
> 64GB mem
> 2 NICs teaming
>
> my 2 cents
>
>
>  On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>  wrote:
>
>      Hi,
>
> I have to propose some hardware requirements in my company for a Proof of
> Concept with Hadoop. I was reading Hadoop Operations and also saw Cloudera
> Website. But just wanted to know from the group - what is the requirements
> if I have to plan for a 5 node cluster. I dont know at this time, the data
> that need to be processed at this time for the Proof of Concept. So - can
> you suggest something to me?
>
> Regards,
> Raj
>
>
>
>
>

Re: Hardware Selection for Hadoop

Posted by Raj Hadoop <ha...@yahoo.com>.
Hi,
 
In a 5 node cluster - you mean

Name Node, Job Tracker, Secondary Name Node all on 1 machine:
        64 GB RAM (Processor - 2 x Quad core Intel, Storage - ?)

Data Nodes and Task Trackers - on the other 4 machines - each with:
        32 GB RAM (Processor - 2 x Quad core Intel, Storage - ?)

NIC?

Also - what other details should I provide to my hardware engineer?

The idea is to start with a Web Log Processing proof of concept.

Please advise.
 


________________________________
From: Patai Sangbutsarakum <Pa...@turn.com>
To: "user@hadoop.apache.org" <us...@hadoop.apache.org> 
Sent: Monday, April 29, 2013 2:49 PM
Subject: Re: Hardware Selection for Hadoop



2 x Quad cores Intel 
2-3 TB x 6 SATA
64GB mem
2 NICs teaming

my 2 cents



On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
 wrote:

Hi,
>
>I have to propose some hardware requirements in my company for a Proof of Concept with Hadoop. I was reading Hadoop Operations and also saw Cloudera Website. But just wanted to know from the group - what is the requirements if I have to plan for a 5 node cluster. I dont know at this time, the data that need to be processed at this time for the Proof of Concept. So - can you suggest something to me?
>
>Regards,
>Raj

Re: Hardware Selection for Hadoop

Posted by Sambit Tripathy <sa...@gmail.com>.
I understand.

But sometimes there is a lock-in with a particular vendor, and you are not
allowed to put servers inside the corporate data center if they are
procured from another vendor.

The idea is to start basic and then grow. You can tell me some numbers in $
if you have them (preferred ;)); I know sometimes there are no correct
answers.

I got a quote of $4200 for 6 x 2 TB hard disks (JBOD), 2 quad cores,
24-48 GB RAM. Vendor: HP. Does this sound OK for this configuration?


On Tue, Aug 13, 2013 at 6:15 AM, Chris Embree <ce...@gmail.com> wrote:

> As we always say in Technology... it depends!
>
> What country are you in?  That makes a difference.
> How much buying power do you have?  I work for a Fortune 100 Company and
> we -- absurdly -- pay about 60% off retail when we buy servers.
> Are you buying a bunch at once?
>
> You best bet is to contact 3 or 4 VAR's to get quotes.  They'll offer you
> add-on services, like racking, cabling, configuring servers, etc.  You can
> decide if it's worth it.
>
> The bottom line, there is no correct answer to your question. ;)
>
>
> On Mon, Aug 12, 2013 at 8:30 PM, Sambit Tripathy <sa...@gmail.com>wrote:
>
>> Any rough ideas how much this would cost? Actually I kinda require a
>> budget approval and need to put some rough figures in $ on the paper. Help!
>>
>> 1. 6 X 2 TB hard disk JBOD, 2 quad cores, 24-48 GB RAM.
>> 2. I rack mount unit
>> 3. I gbe switch for the rack
>> 4. 10 gbe switch for the network
>>
>> Regards,
>> Sambit Tripathy.
>>
>>
>> On Tue, May 7, 2013 at 9:21 PM, Ted Dunning <td...@maprtech.com>wrote:
>>
>>>
>>> On Tue, May 7, 2013 at 5:53 AM, Michael Segel <michael_segel@hotmail.com
>>> > wrote:
>>>
>>>> While we have a rough metric on spindles to cores, you end up putting a
>>>> stress on the disk controllers. YMMV.
>>>>
>>>
>>> This is an important comment.
>>>
>>> Some controllers fold when you start pushing too much data.  Testing
>>> nodes independently before installation is important.
>>>
>>>
>>
>

Re: Hardware Selection for Hadoop

Posted by Chris Embree <ce...@gmail.com>.
As we always say in technology... it depends!

What country are you in?  That makes a difference.
How much buying power do you have?  I work for a Fortune 100 company and we
-- absurdly -- pay about 60% off retail when we buy servers.
Are you buying a bunch at once?

Your best bet is to contact 3 or 4 VARs to get quotes.  They'll offer you
add-on services, like racking, cabling, configuring servers, etc.  You can
decide if it's worth it.

The bottom line: there is no correct answer to your question. ;)


On Mon, Aug 12, 2013 at 8:30 PM, Sambit Tripathy <sa...@gmail.com> wrote:

> Any rough ideas how much this would cost? Actually I kinda require a
> budget approval and need to put some rough figures in $ on the paper. Help!
>
> 1. 6 X 2 TB hard disk JBOD, 2 quad cores, 24-48 GB RAM.
> 2. I rack mount unit
> 3. I gbe switch for the rack
> 4. 10 gbe switch for the network
>
> Regards,
> Sambit Tripathy.
>
>
> On Tue, May 7, 2013 at 9:21 PM, Ted Dunning <td...@maprtech.com> wrote:
>
>>
>> On Tue, May 7, 2013 at 5:53 AM, Michael Segel <mi...@hotmail.com>wrote:
>>
>>> While we have a rough metric on spindles to cores, you end up putting a
>>> stress on the disk controllers. YMMV.
>>>
>>
>> This is an important comment.
>>
>> Some controllers fold when you start pushing too much data.  Testing
>> nodes independently before installation is important.
>>
>>
>

Re: Hardware Selection for Hadoop

Posted by Sambit Tripathy <sa...@gmail.com>.
Any rough idea how much this would cost? Actually I kinda need budget
approval and have to put some rough figures in $ on paper. Help!

1. 6 x 2 TB hard disks (JBOD), 2 quad cores, 24-48 GB RAM
2. 1 rack mount unit
3. 1 GbE switch for the rack
4. 10 GbE switch for the network

Regards,
Sambit Tripathy.


On Tue, May 7, 2013 at 9:21 PM, Ted Dunning <td...@maprtech.com> wrote:

>
> On Tue, May 7, 2013 at 5:53 AM, Michael Segel <mi...@hotmail.com>wrote:
>
>> While we have a rough metric on spindles to cores, you end up putting a
>> stress on the disk controllers. YMMV.
>>
>
> This is an important comment.
>
> Some controllers fold when you start pushing too much data.  Testing nodes
> independently before installation is important.
>
>

Re: Hardware Selection for Hadoop

Posted by Ted Dunning <td...@maprtech.com>.
On Tue, May 7, 2013 at 5:53 AM, Michael Segel <mi...@hotmail.com>wrote:

> While we have a rough metric on spindles to cores, you end up putting a
> stress on the disk controllers. YMMV.
>

This is an important comment.

Some controllers fold when you start pushing too much data.  Testing nodes
independently before installation is important.
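
One crude way to do that before a node joins the cluster is to hammer all of
its data disks at once and compare against the single-disk numbers; a
sketch, assuming GNU dd and mount points like /data/N:

  # write ~4GB to every data disk in parallel, bypassing the page cache
  for d in /data/1 /data/2 /data/3 /data/4 /data/5 /data/6; do
    dd if=/dev/zero of=$d/ddtest bs=1M count=4096 oflag=direct &
  done
  wait

If per-disk throughput drops sharply when all spindles run together, the
controller rather than the disks is the bottleneck.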

Re: Hardware Selection for Hadoop

Posted by Michael Segel <mi...@hotmail.com>.
I wouldn't.

You end up with a 'Frankencluster' which could become problematic down the road. 

Ever try to debug a port failure on a switch? (It does happen, and it's a bitch.)
Note that you say 'reliable'... older hardware may or may not be reliable... or under warranty.
(How many here build their own servers from the components up?  ;-)

I'm not suggesting that you go out and buy a 10-core CPU; however, depending on who you are and what your budget is... it may make sense.
Even for a proof of concept. ;-)

While we have a rough metric on spindles to cores, you end up putting stress on the disk controllers. YMMV.
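
(To put a number on that metric: the rule of thumb quoted elsewhere in this thread is at least a spindle per core, so a dual quad-core box wants on the order of 8 data disks, and pushing many more spindles than that through a single controller is where the stress shows up.)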

As to spending $$$ on hardware for a PoC, it's not only relative... but also, what makes you think this is the first and only PoC he's going to do? The point is that hardware is reusable and it also sets a pattern for what the future cluster will look like. After this PoC, why not look at Storm, Mesos, Spark, Shark, etc...

Trust me, as someone who has had to fight for allocation of hardware dollars for R&D... get the best bang you can for your buck.

HTH

-Mike

On May 6, 2013, at 5:57 PM, Patai Sangbutsarakum <Pa...@turn.com> wrote:

> I really doubt if he would spend $ to by 10 cores on a die CPU for "proof of concept" machines.
> Actually, I even think of telling you to gathering old machines (but reliable) as much as you can collect.
> Put as much as disks, Ram you can. teaming up NIC if you can, and at that point you can proof your concept up to certain point.
> 
> You will get the idea how is your application will behave, how big of the data set you will play with
> is the application cpu or io bound, and from that you can go out shopping buy the best fit server configuration. 
> 
> 
> 
> On May 6, 2013, at 4:17 AM, Michel Segel <mi...@hotmail.com> wrote:
> 
>> 8 physical cores is so 2009 - 2010 :-)
>> 
>> Intel now offers a chip w 10 physical cores on a die. 
>> You are better off thinking of 4-8 GB per physical core. 
>> It depends on what you want to do, and what you think you may want to do...
>> 
>> It also depends on the price points of the hardware. Memory, drives, CPUs (price by clock speeds...) you just need to find the right optimum between price and performance...
>> 
>> 
>> Sent from a remote device. Please excuse any typos...
>> 
>> Mike Segel
>> 
>> On May 5, 2013, at 1:47 PM, Ted Dunning <td...@maprtech.com> wrote:
>> 
>>> 
>>> Data nodes normally are also task nodes.  With 8 physical cores it isn't that unreasonable to have 64GB whereas 24GB really is going to pinch.
>>> 
>>> Achieving highest performance requires that you match the capabilities of your nodes including CPU, memory, disk and networking.  The standard wisdom is 4-6GB of RAM per core, at least a spindle per core and 1/2 to 2/3 of disk bandwidth available as network bandwidth.
>>> 
>>> If you look at the different configurations mentioned in this thread, you will see different limitations.
>>> 
>>> For instance:
>>> 
>>> 2 x Quad cores Intel
>>> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
>>> 64GB mem                <==== slightly larger than necessary
>>> 2 1GBe NICs teaming     <==== 2 x 100 MB << 400MB = 2/3 x 6 x 100MB
>>> 
>>> This configuration is mostly limited by networking bandwidth
>>> 
>>> 2 x Quad cores Intel
>>> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
>>> 24GB mem                <==== 24GB << 8 x 6GB
>>> 2 10GBe NICs teaming    <==== 2 x 1000 MB > 400MB = 2/3 x 6 x 100MB
>>>  
>>> This configuration is weak on disk relative to CPU and very weak on disk relative to network speed.  The worst problem, however, is likely to be small memory.  This will likely require us to decrease the number of slots by half or more making it impossible to even use the 6 disks that we have and making the network even more outrageously over-provisioned.
>>>  
>>> 
>>> 
>>> 
>>> On Sun, May 5, 2013 at 9:41 AM, Rahul Bhattacharjee <ra...@gmail.com> wrote:
>>> IMHO ,64 G looks bit high for DN. 24 should be good enough for DN.
>>> 
>>> 
>>> On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <Pa...@turn.com> wrote:
>>> 2 x Quad cores Intel
>>> 2-3 TB x 6 SATA
>>> 64GB mem
>>> 2 NICs teaming
>>> 
>>> my 2 cents
>>> 
>>> 
>>> On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>>>  wrote:
>>> 
>>>> Hi,
>>>>  
>>>> I have to propose some hardware requirements in my company for a Proof of Concept with Hadoop. I was reading Hadoop Operations and also saw Cloudera Website. But just wanted to know from the group - what is the requirements if I have to plan for a 5 node cluster. I dont know at this time, the data that need to be processed at this time for the Proof of Concept. So - can you suggest something to me?
>>>>  
>>>> Regards,
>>>> Raj
>>> 
>>> 
>>> 
> 


Re: Hardware Selection for Hadoop

Posted by Patai Sangbutsarakum <Pa...@turn.com>.
I really doubt he would spend $ to buy a 10-cores-on-a-die CPU for "proof of concept" machines.
Actually, I would even think of telling you to gather as many old (but reliable) machines as you can collect.
Put in as many disks and as much RAM as you can, team up the NICs if you can, and at that point you can prove your concept up to a certain point.

You will get an idea of how your application will behave, how big a data set you will play with,
and whether the application is CPU- or I/O-bound, and from that you can go shopping for the best-fit server configuration.



On May 6, 2013, at 4:17 AM, Michel Segel <mi...@hotmail.com>> wrote:

8 physical cores is so 2009 - 2010 :-)

Intel now offers a chip w 10 physical cores on a die.
You are better off thinking of 4-8 GB per physical core.
It depends on what you want to do, and what you think you may want to do...

It also depends on the price points of the hardware. Memory, drives, CPUs (price by clock speeds...) you just need to find the right optimum between price and performance...


Sent from a remote device. Please excuse any typos...

Mike Segel

On May 5, 2013, at 1:47 PM, Ted Dunning <td...@maprtech.com>> wrote:


Data nodes normally are also task nodes.  With 8 physical cores it isn't that unreasonable to have 64GB whereas 24GB really is going to pinch.

Achieving highest performance requires that you match the capabilities of your nodes including CPU, memory, disk and networking.  The standard wisdom is 4-6GB of RAM per core, at least a spindle per core and 1/2 to 2/3 of disk bandwidth available as network bandwidth.

If you look at the different configurations mentioned in this thread, you will see different limitations.

For instance:

2 x Quad cores Intel
2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
64GB mem                <==== slightly larger than necessary
2 1GBe NICs teaming     <==== 2 x 100 MB << 400MB = 2/3 x 6 x 100MB

This configuration is mostly limited by networking bandwidth

2 x Quad cores Intel
2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
24GB mem                <==== 24GB << 8 x 6GB
2 10GBe NICs teaming    <==== 2 x 1000 MB > 400MB = 2/3 x 6 x 100MB

This configuration is weak on disk relative to CPU and very weak on disk relative to network speed.  The worst problem, however, is likely to be small memory.  This will likely require us to decrease the number of slots by half or more making it impossible to even use the 6 disks that we have and making the network even more outrageously over-provisioned.




On Sun, May 5, 2013 at 9:41 AM, Rahul Bhattacharjee <ra...@gmail.com>> wrote:
IMHO ,64 G looks bit high for DN. 24 should be good enough for DN.


On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <Pa...@turn.com>> wrote:
2 x Quad cores Intel
2-3 TB x 6 SATA
64GB mem
2 NICs teaming

my 2 cents


On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>>
 wrote:

Hi,

I have to propose some hardware requirements in my company for a Proof of Concept with Hadoop. I was reading Hadoop Operations and also saw Cloudera Website. But just wanted to know from the group - what is the requirements if I have to plan for a 5 node cluster. I dont know at this time, the data that need to be processed at this time for the Proof of Concept. So - can you suggest something to me?

Regards,
Raj





Re: Hardware Selection for Hadoop

Posted by Michel Segel <mi...@hotmail.com>.
8 physical cores is so 2009 - 2010 :-)

Intel now offers a chip with 10 physical cores on a die. 
You are better off thinking of 4-8 GB per physical core. 
It depends on what you want to do, and what you think you may want to do...

It also depends on the price points of the hardware: memory, drives, CPUs (priced by clock speed...); you just need to find the right optimum between price and performance...


Sent from a remote device. Please excuse any typos...

Mike Segel

On May 5, 2013, at 1:47 PM, Ted Dunning <td...@maprtech.com> wrote:

> 
> Data nodes normally are also task nodes.  With 8 physical cores it isn't that unreasonable to have 64GB whereas 24GB really is going to pinch.
> 
> Achieving highest performance requires that you match the capabilities of your nodes including CPU, memory, disk and networking.  The standard wisdom is 4-6GB of RAM per core, at least a spindle per core and 1/2 to 2/3 of disk bandwidth available as network bandwidth.
> 
> If you look at the different configurations mentioned in this thread, you will see different limitations.
> 
> For instance:
> 
>> 2 x Quad cores Intel
>> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
>> 64GB mem                <==== slightly larger than necessary
>> 2 1GBe NICs teaming     <==== 2 x 100 MB << 400MB = 2/3 x 6 x 100MB
> 
> This configuration is mostly limited by networking bandwidth
> 
>> 2 x Quad cores Intel
>> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
>> 24GB mem                <==== 24GB << 8 x 6GB
>> 2 10GBe NICs teaming    <==== 2 x 1000 MB > 400MB = 2/3 x 6 x 100MB
>  
> This configuration is weak on disk relative to CPU and very weak on disk relative to network speed.  The worst problem, however, is likely to be small memory.  This will likely require us to decrease the number of slots by half or more making it impossible to even use the 6 disks that we have and making the network even more outrageously over-provisioned.
>  
> 
> 
> 
> On Sun, May 5, 2013 at 9:41 AM, Rahul Bhattacharjee <ra...@gmail.com> wrote:
>> IMHO ,64 G looks bit high for DN. 24 should be good enough for DN.
>> 
>> 
>> On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <Pa...@turn.com> wrote:
>>> 2 x Quad cores Intel
>>> 2-3 TB x 6 SATA
>>> 64GB mem
>>> 2 NICs teaming
>>> 
>>> my 2 cents
>>> 
>>> 
>>> On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>>>  wrote:
>>> 
>>>> Hi,
>>>>  
>>>> I have to propose some hardware requirements in my company for a Proof of Concept with Hadoop. I was reading Hadoop Operations and also saw Cloudera Website. But just wanted to know from the group - what is the requirements if I have to plan for a 5 node cluster. I dont know at this time, the data that need to be processed at this time for the Proof of Concept. So - can you suggest something to me?
>>>>  
>>>> Regards,
>>>> Raj
> 
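
To put numbers on the 4-8 GB per physical core rule: the 2 x quad-core boxes discussed earlier in this thread work out to 8 cores x 4-8 GB = 32-64 GB, which is why 64 GB sits at the comfortable end of the range and 24 GB pinches; a dual-socket node built from the 10-core parts mentioned above would land at 20 cores x 4-8 GB = 80-160 GB.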

Re: Hardware Selection for Hadoop

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Thanks Mohit and Ted!


On Mon, May 6, 2013 at 9:11 AM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com
> wrote:

> OK. I do not know if I understand the spindle / core thing. I will dig
> more into that.
>
> Thanks for the info.
>
> One more thing , whats the significance of multiple NIC.
>
> Thanks,
> Rahul
>
>
> On Mon, May 6, 2013 at 12:17 AM, Ted Dunning <td...@maprtech.com>wrote:
>
>>
>> Data nodes normally are also task nodes.  With 8 physical cores it isn't
>> that unreasonable to have 64GB whereas 24GB really is going to pinch.
>>
>> Achieving highest performance requires that you match the capabilities of
>> your nodes including CPU, memory, disk and networking.  The standard wisdom
>> is 4-6GB of RAM per core, at least a spindle per core and 1/2 to 2/3 of
>> disk bandwidth available as network bandwidth.
>>
>> If you look at the different configurations mentioned in this thread, you
>> will see different limitations.
>>
>> For instance:
>>
>> 2 x Quad cores Intel
>>> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
>>> 64GB mem                <==== slightly larger than necessary
>>> 2 1GBe NICs teaming     <==== 2 x 100 MB << 400MB = 2/3 x 6 x 100MB
>>
>>
>> This configuration is mostly limited by networking bandwidth
>>
>> 2 x Quad cores Intel
>>> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
>>> 24GB mem                <==== 24GB << 8 x 6GB
>>> 2 10GBe NICs teaming    <==== 2 x 1000 MB > 400MB = 2/3 x 6 x 100MB
>>
>>
>> This configuration is weak on disk relative to CPU and very weak on disk
>> relative to network speed.  The worst problem, however, is likely to be
>> small memory.  This will likely require us to decrease the number of slots
>> by half or more making it impossible to even use the 6 disks that we have
>> and making the network even more outrageously over-provisioned.
>>
>>
>>
>>
>> On Sun, May 5, 2013 at 9:41 AM, Rahul Bhattacharjee <
>> rahul.rec.dgp@gmail.com> wrote:
>>
>>> IMHO ,64 G looks bit high for DN. 24 should be good enough for DN.
>>>
>>>
>>> On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <
>>> Patai.Sangbutsarakum@turn.com> wrote:
>>>
>>>>  2 x Quad cores Intel
>>>> 2-3 TB x 6 SATA
>>>> 64GB mem
>>>> 2 NICs teaming
>>>>
>>>>  my 2 cents
>>>>
>>>>
>>>>  On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>>>>  wrote:
>>>>
>>>>      Hi,
>>>>
>>>> I have to propose some hardware requirements in my company for a Proof
>>>> of Concept with Hadoop. I was reading Hadoop Operations and also saw
>>>> Cloudera Website. But just wanted to know from the group - what is the
>>>> requirements if I have to plan for a 5 node cluster. I dont know at this
>>>> time, the data that need to be processed at this time for the Proof of
>>>> Concept. So - can you suggest something to me?
>>>>
>>>> Regards,
>>>> Raj
>>>>
>>>>
>>>>
>>>
>>
>
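
A quick gloss on the spindle/core question above: every concurrently running map or reduce task streams to and from local disk, so the aggregate disk bandwidth is divided among the running slots. Using the same ~100 MB/s-per-SATA-spindle figure as above, 6 disks give roughly 600 MB/s; if 8 cores drive, say, 12 task slots, each task averages about 600 / 12 = 50 MB/s and also pays seek penalties for sharing spindles, whereas with at least one spindle per core each task can get closer to a full ~100 MB/s sequential stream. That is what the "at least a spindle per core" rule of thumb is protecting. (The 12-slot figure here is only illustrative.)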

Re: Hardware Selection for Hadoop

Posted by Michael Segel <mi...@hotmail.com>.
I wouldn't go the route of multiple NICs unless you are using MapR. 
MapR allows you to do port bonding, or rather use both ports simultaneously. 
When you port bond, 1 + 1 != 2, and then you have some other configuration issues. 
(Unless they've fixed them.)

If this is your first cluster... keep it simple. If your machine comes with 2 NIC ports, use one, and then once you're an 'expurt', turn on the second port. 

HTH

-Mike

On May 5, 2013, at 11:05 PM, Mohit Anchlia <mo...@gmail.com> wrote:

> Multiple NICs provide 2 benefits, 1) high availability 2) increases the network bandwidth when using LACP type model.
> 
> On Sun, May 5, 2013 at 8:41 PM, Rahul Bhattacharjee <ra...@gmail.com> wrote:
> OK. I do not know if I understand the spindle / core thing. I will dig more into that.
> 
> Thanks for the info. 
> 
> One more thing , whats the significance of multiple NIC.
> 
> Thanks,
> Rahul
> 
> 
> On Mon, May 6, 2013 at 12:17 AM, Ted Dunning <td...@maprtech.com> wrote:
> 
> Data nodes normally are also task nodes.  With 8 physical cores it isn't that unreasonable to have 64GB whereas 24GB really is going to pinch.
> 
> Achieving highest performance requires that you match the capabilities of your nodes including CPU, memory, disk and networking.  The standard wisdom is 4-6GB of RAM per core, at least a spindle per core and 1/2 to 2/3 of disk bandwidth available as network bandwidth.
> 
> If you look at the different configurations mentioned in this thread, you will see different limitations.
> 
> For instance:
> 
> 2 x Quad cores Intel
> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
> 64GB mem                <==== slightly larger than necessary
> 2 1GBe NICs teaming     <==== 2 x 100 MB << 400MB = 2/3 x 6 x 100MB
> 
> This configuration is mostly limited by networking bandwidth
> 
> 2 x Quad cores Intel
> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
> 24GB mem                <==== 24GB << 8 x 6GB
> 2 10GBe NICs teaming    <==== 2 x 1000 MB > 400MB = 2/3 x 6 x 100MB
>  
> This configuration is weak on disk relative to CPU and very weak on disk relative to network speed.  The worst problem, however, is likely to be small memory.  This will likely require us to decrease the number of slots by half or more making it impossible to even use the 6 disks that we have and making the network even more outrageously over-provisioned.
>  
> 
> 
> 
> On Sun, May 5, 2013 at 9:41 AM, Rahul Bhattacharjee <ra...@gmail.com> wrote:
> IMHO ,64 G looks bit high for DN. 24 should be good enough for DN.
> 
> 
> On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <Pa...@turn.com> wrote:
> 2 x Quad cores Intel
> 2-3 TB x 6 SATA
> 64GB mem
> 2 NICs teaming
> 
> my 2 cents
> 
> 
> On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>  wrote:
> 
>> Hi,
>>  
>> I have to propose some hardware requirements in my company for a Proof of Concept with Hadoop. I was reading Hadoop Operations and also saw Cloudera Website. But just wanted to know from the group - what is the requirements if I have to plan for a 5 node cluster. I dont know at this time, the data that need to be processed at this time for the Proof of Concept. So - can you suggest something to me?
>>  
>> Regards,
>> Raj
> 
> 
> 
> 
> 


Re: Hardware Selection for Hadoop

Posted by Mohit Anchlia <mo...@gmail.com>.
Multiple NICs provide two benefits: 1) high availability, and 2) increased
network bandwidth when using an LACP-type model.

On Sun, May 5, 2013 at 8:41 PM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com
> wrote:

>  OK. I do not know if I understand the spindle / core thing. I will dig
> more into that.
>
> Thanks for the info.
>
> One more thing , whats the significance of multiple NIC.
>
> Thanks,
> Rahul
>
>
> On Mon, May 6, 2013 at 12:17 AM, Ted Dunning <td...@maprtech.com>wrote:
>
>>
>> Data nodes normally are also task nodes.  With 8 physical cores it isn't
>> that unreasonable to have 64GB whereas 24GB really is going to pinch.
>>
>> Achieving highest performance requires that you match the capabilities of
>> your nodes including CPU, memory, disk and networking.  The standard wisdom
>> is 4-6GB of RAM per core, at least a spindle per core and 1/2 to 2/3 of
>> disk bandwidth available as network bandwidth.
>>
>> If you look at the different configurations mentioned in this thread, you
>> will see different limitations.
>>
>> For instance:
>>
>>  2 x Quad cores Intel
>>> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
>>> 64GB mem                <==== slightly larger than necessary
>>> 2 1GBe NICs teaming     <==== 2 x 100 MB << 400MB = 2/3 x 6 x 100MB
>>
>>
>> This configuration is mostly limited by networking bandwidth
>>
>>  2 x Quad cores Intel
>>> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
>>> 24GB mem                <==== 24GB << 8 x 6GB
>>> 2 10GBe NICs teaming    <==== 2 x 1000 MB > 400MB = 2/3 x 6 x 100MB
>>
>>
>> This configuration is weak on disk relative to CPU and very weak on disk
>> relative to network speed.  The worst problem, however, is likely to be
>> small memory.  This will likely require us to decrease the number of slots
>> by half or more making it impossible to even use the 6 disks that we have
>> and making the network even more outrageously over-provisioned.
>>
>>
>>
>>
>> On Sun, May 5, 2013 at 9:41 AM, Rahul Bhattacharjee <
>> rahul.rec.dgp@gmail.com> wrote:
>>
>>>  IMHO ,64 G looks bit high for DN. 24 should be good enough for DN.
>>>
>>>
>>> On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <
>>> Patai.Sangbutsarakum@turn.com> wrote:
>>>
>>>> 2 x Quad cores Intel
>>>> 2-3 TB x 6 SATA
>>>> 64GB mem
>>>> 2 NICs teaming
>>>>
>>>> my 2 cents
>>>>
>>>>
>>>>  On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>>>>  wrote:
>>>>
>>>>      Hi,
>>>>
>>>> I have to propose some hardware requirements in my company for a Proof
>>>> of Concept with Hadoop. I was reading Hadoop Operations and also saw
>>>> Cloudera Website. But just wanted to know from the group - what is the
>>>> requirements if I have to plan for a 5 node cluster. I dont know at this
>>>> time, the data that need to be processed at this time for the Proof of
>>>> Concept. So - can you suggest something to me?
>>>>
>>>> Regards,
>>>> Raj
>>>>
>>>>
>>>>
>>>
>>
>
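
To make the "1 + 1 != 2" point about port bonding concrete: with LACP (802.3ad) each individual TCP flow is hashed onto a single physical port, so one large transfer only ever sees one port's worth of bandwidth, while many concurrent streams (the usual DataNode and shuffle traffic pattern) can fill both ports. A minimal Python sketch, assuming ~100 MB/s usable per 1 GbE port; the helper below is illustrative, not from the thread:

    # Rough model of an LACP bond: aggregate capacity scales with the number
    # of ports, but any single flow is pinned to one port by the hash policy.
    def bonded_bandwidth_mb(ports, concurrent_flows, per_port_mb=100):
        aggregate = ports * per_port_mb        # best case, with many flows
        single_flow_cap = per_port_mb          # one flow never exceeds one port
        usable = min(aggregate, concurrent_flows * single_flow_cap)
        return aggregate, single_flow_cap, usable

    print(bonded_bandwidth_mb(ports=2, concurrent_flows=1))  # (200, 100, 100): one big copy sees one port
    print(bonded_bandwidth_mb(ports=2, concurrent_flows=8))  # (200, 100, 200): many streams can fill both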

Re: Hardware Selection for Hadoop

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Thanks Mohit and Ted!


On Mon, May 6, 2013 at 9:11 AM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com
> wrote:

> OK. I do not know if I understand the spindle / core thing. I will dig
> more into that.
>
> Thanks for the info.
>
> One more thing , whats the significance of multiple NIC.
>
> Thanks,
> Rahul
>
>
> On Mon, May 6, 2013 at 12:17 AM, Ted Dunning <td...@maprtech.com>wrote:
>
>>
>> Data nodes normally are also task nodes.  With 8 physical cores it isn't
>> that unreasonable to have 64GB whereas 24GB really is going to pinch.
>>
>> Achieving highest performance requires that you match the capabilities of
>> your nodes including CPU, memory, disk and networking.  The standard wisdom
>> is 4-6GB of RAM per core, at least a spindle per core and 1/2 to 2/3 of
>> disk bandwidth available as network bandwidth.
>>
>> If you look at the different configurations mentioned in this thread, you
>> will see different limitations.
>>
>> For instance:
>>
>> 2 x Quad cores Intel
>>> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
>>> 64GB mem                <==== slightly larger than necessary
>>> 2 1GBe NICs teaming     <==== 2 x 100 MB << 400MB = 2/3 x 6 x 100MB
>>
>>
>> This configuration is mostly limited by networking bandwidth
>>
>> 2 x Quad cores Intel
>>> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
>>> 24GB mem                <==== 24GB << 8 x 6GB
>>> 2 10GBe NICs teaming    <==== 2 x 1000 MB > 400MB = 2/3 x 6 x 100MB
>>
>>
>> This configuration is weak on disk relative to CPU and very weak on disk
>> relative to network speed.  The worst problem, however, is likely to be
>> small memory.  This will likely require us to decrease the number of slots
>> by half or more making it impossible to even use the 6 disks that we have
>> and making the network even more outrageously over-provisioned.
>>
>>
>>
>>
>> On Sun, May 5, 2013 at 9:41 AM, Rahul Bhattacharjee <
>> rahul.rec.dgp@gmail.com> wrote:
>>
>>> IMHO ,64 G looks bit high for DN. 24 should be good enough for DN.
>>>
>>>
>>> On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <
>>> Patai.Sangbutsarakum@turn.com> wrote:
>>>
>>>>  2 x Quad cores Intel
>>>> 2-3 TB x 6 SATA
>>>> 64GB mem
>>>> 2 NICs teaming
>>>>
>>>>  my 2 cents
>>>>
>>>>
>>>>  On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>>>>  wrote:
>>>>
>>>>      Hi,
>>>>
>>>> I have to propose some hardware requirements in my company for a Proof
>>>> of Concept with Hadoop. I was reading Hadoop Operations and also saw
>>>> Cloudera Website. But just wanted to know from the group - what is the
>>>> requirements if I have to plan for a 5 node cluster. I dont know at this
>>>> time, the data that need to be processed at this time for the Proof of
>>>> Concept. So - can you suggest something to me?
>>>>
>>>> Regards,
>>>> Raj
>>>>
>>>>
>>>>
>>>
>>
>

Re: Hardware Selection for Hadoop

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
OK. I do not know if I understand the spindle / core thing. I will dig more
into that.

Thanks for the info.

One more thing: what's the significance of multiple NICs?

Thanks,
Rahul


On Mon, May 6, 2013 at 12:17 AM, Ted Dunning <td...@maprtech.com> wrote:

>
> Data nodes normally are also task nodes.  With 8 physical cores it isn't
> that unreasonable to have 64GB whereas 24GB really is going to pinch.
>
> Achieving highest performance requires that you match the capabilities of
> your nodes including CPU, memory, disk and networking.  The standard wisdom
> is 4-6GB of RAM per core, at least a spindle per core and 1/2 to 2/3 of
> disk bandwidth available as network bandwidth.
>
> If you look at the different configurations mentioned in this thread, you
> will see different limitations.
>
> For instance:
>
> 2 x Quad cores Intel
>> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
>> 64GB mem                <==== slightly larger than necessary
>> 2 1GBe NICs teaming     <==== 2 x 100 MB << 400MB = 2/3 x 6 x 100MB
>
>
> This configuration is mostly limited by networking bandwidth
>
> 2 x Quad cores Intel
>> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
>> 24GB mem                <==== 24GB << 8 x 6GB
>> 2 10GBe NICs teaming    <==== 2 x 1000 MB > 400MB = 2/3 x 6 x 100MB
>
>
> This configuration is weak on disk relative to CPU and very weak on disk
> relative to network speed.  The worst problem, however, is likely to be
> small memory.  This will likely require us to decrease the number of slots
> by half or more making it impossible to even use the 6 disks that we have
> and making the network even more outrageously over-provisioned.
>
>
>
>
> On Sun, May 5, 2013 at 9:41 AM, Rahul Bhattacharjee <
> rahul.rec.dgp@gmail.com> wrote:
>
>> IMHO ,64 G looks bit high for DN. 24 should be good enough for DN.
>>
>>
>> On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <
>> Patai.Sangbutsarakum@turn.com> wrote:
>>
>>>  2 x Quad cores Intel
>>> 2-3 TB x 6 SATA
>>> 64GB mem
>>> 2 NICs teaming
>>>
>>>  my 2 cents
>>>
>>>
>>>  On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>>>  wrote:
>>>
>>>      Hi,
>>>
>>> I have to propose some hardware requirements in my company for a Proof
>>> of Concept with Hadoop. I was reading Hadoop Operations and also saw
>>> Cloudera Website. But just wanted to know from the group - what is the
>>> requirements if I have to plan for a 5 node cluster. I dont know at this
>>> time, the data that need to be processed at this time for the Proof of
>>> Concept. So - can you suggest something to me?
>>>
>>> Regards,
>>> Raj
>>>
>>>
>>>
>>
>

Re: Hardware Selection for Hadoop

Posted by Michel Segel <mi...@hotmail.com>.
8 physical cores is so 2009 - 2010 :-)

Intel now offers a chip with 10 physical cores on a die.
You are better off thinking of 4-8 GB per physical core.
It depends on what you want to do, and what you think you may want to do...

It also depends on the price points of the hardware: memory, drives, CPUs (priced by clock speed...). You just need to find the right optimum between price and performance...


Sent from a remote device. Please excuse any typos...

Mike Segel

On May 5, 2013, at 1:47 PM, Ted Dunning <td...@maprtech.com> wrote:

> 
> Data nodes normally are also task nodes.  With 8 physical cores it isn't that unreasonable to have 64GB whereas 24GB really is going to pinch.
> 
> Achieving highest performance requires that you match the capabilities of your nodes including CPU, memory, disk and networking.  The standard wisdom is 4-6GB of RAM per core, at least a spindle per core and 1/2 to 2/3 of disk bandwidth available as network bandwidth.
> 
> If you look at the different configurations mentioned in this thread, you will see different limitations.
> 
> For instance:
> 
>> 2 x Quad cores Intel
>> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
>> 64GB mem                <==== slightly larger than necessary
>> 2 1GBe NICs teaming     <==== 2 x 100 MB << 400MB = 2/3 x 6 x 100MB
> 
> This configuration is mostly limited by networking bandwidth
> 
>> 2 x Quad cores Intel
>> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
>> 24GB mem                <==== 24GB << 8 x 6GB
>> 2 10GBe NICs teaming    <==== 2 x 1000 MB > 400MB = 2/3 x 6 x 100MB
>  
> This configuration is weak on disk relative to CPU and very weak on disk relative to network speed.  The worst problem, however, is likely to be small memory.  This will likely require us to decrease the number of slots by half or more making it impossible to even use the 6 disks that we have and making the network even more outrageously over-provisioned.
>  
> 
> 
> 
> On Sun, May 5, 2013 at 9:41 AM, Rahul Bhattacharjee <ra...@gmail.com> wrote:
>> IMHO ,64 G looks bit high for DN. 24 should be good enough for DN.
>> 
>> 
>> On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <Pa...@turn.com> wrote:
>>> 2 x Quad cores Intel
>>> 2-3 TB x 6 SATA
>>> 64GB mem
>>> 2 NICs teaming
>>> 
>>> my 2 cents
>>> 
>>> 
>>> On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>>>  wrote:
>>> 
>>>> Hi,
>>>>  
>>>> I have to propose some hardware requirements in my company for a Proof of Concept with Hadoop. I was reading Hadoop Operations and also saw Cloudera Website. But just wanted to know from the group - what is the requirements if I have to plan for a 5 node cluster. I dont know at this time, the data that need to be processed at this time for the Proof of Concept. So - can you suggest something to me?
>>>>  
>>>> Regards,
>>>> Raj
> 

Re: Hardware Selection for Hadoop

Posted by Ted Dunning <td...@maprtech.com>.
Data nodes normally are also task nodes.  With 8 physical cores it isn't
that unreasonable to have 64GB whereas 24GB really is going to pinch.

Achieving highest performance requires that you match the capabilities of
your nodes including CPU, memory, disk and networking.  The standard wisdom
is 4-6GB of RAM per core, at least a spindle per core and 1/2 to 2/3 of
disk bandwidth available as network bandwidth.

If you look at the different configurations mentioned in this thread, you
will see different limitations.

For instance:

2 x Quad cores Intel
> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
> 64GB mem                <==== slightly larger than necessary
> 2 1GBe NICs teaming     <==== 2 x 100 MB << 400MB = 2/3 x 6 x 100MB


This configuration is mostly limited by networking bandwidth

2 x Quad cores Intel
> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
> 24GB mem                <==== 24GB << 8 x 6GB
> 2 10GBe NICs teaming    <==== 2 x 1000 MB > 400MB = 2/3 x 6 x 100MB


This configuration is weak on disk relative to CPU and very weak on disk
relative to network speed.  The worst problem, however, is likely to be
small memory.  This will likely require us to decrease the number of slots
by half or more making it impossible to even use the 6 disks that we have
and making the network even more outrageously over-provisioned.
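
As a back-of-the-envelope version of the rule of thumb above, a rough shell
sketch (the numbers are the approximations used in this thread, not
measurements):

  cores=8        # 2 x quad core
  ram_gb=64
  disks=6
  disk_mb_s=100  # ~100 MB/s per SATA spindle
  net_mb_s=200   # 2 x 1GbE teamed, ~100 MB/s each

  echo "RAM per core:      $(( ram_gb / cores )) GB   (want 4-6 GB)"
  echo "Disks vs cores:    $disks disks for $cores cores   (want at least 1 per core)"
  echo "Aggregate disk BW: $(( disks * disk_mb_s )) MB/s"
  echo "Network target:    $(( disks * disk_mb_s * 2 / 3 )) MB/s   (1/2 to 2/3 of disk BW)"
  echo "Network actual:    $net_mb_s MB/s"

For the first configuration this prints 8 GB of RAM per core and a 400 MB/s
network target against roughly 200 MB/s of teamed 1GbE, which is the
networking limitation described above.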




On Sun, May 5, 2013 at 9:41 AM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com
> wrote:

> IMHO ,64 G looks bit high for DN. 24 should be good enough for DN.
>
>
> On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <
> Patai.Sangbutsarakum@turn.com> wrote:
>
>>  2 x Quad cores Intel
>> 2-3 TB x 6 SATA
>> 64GB mem
>> 2 NICs teaming
>>
>>  my 2 cents
>>
>>
>>  On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>>  wrote:
>>
>>      Hi,
>>
>> I have to propose some hardware requirements in my company for a Proof of
>> Concept with Hadoop. I was reading Hadoop Operations and also saw Cloudera
>> Website. But just wanted to know from the group - what is the requirements
>> if I have to plan for a 5 node cluster. I dont know at this time, the data
>> that need to be processed at this time for the Proof of Concept. So - can
>> you suggest something to me?
>>
>> Regards,
>> Raj
>>
>>
>>
>

Re: Hardware Selection for Hadoop

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
IMHO, 64 GB looks a bit high for a DN; 24 GB should be good enough for a DN.


On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <
Patai.Sangbutsarakum@turn.com> wrote:

>  2 x Quad cores Intel
> 2-3 TB x 6 SATA
> 64GB mem
> 2 NICs teaming
>
>  my 2 cents
>
>
>  On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>  wrote:
>
>      Hi,
>
> I have to propose some hardware requirements in my company for a Proof of
> Concept with Hadoop. I was reading Hadoop Operations and also saw Cloudera
> Website. But just wanted to know from the group - what is the requirements
> if I have to plan for a 5 node cluster. I dont know at this time, the data
> that need to be processed at this time for the Proof of Concept. So - can
> you suggest something to me?
>
> Regards,
> Raj
>
>
>

Re: Hardware Selection for Hadoop

Posted by Raj Hadoop <ha...@yahoo.com>.
Hi,
 
In a 5 node cluster - you mean
 
NameNode, JobTracker, Secondary NameNode all on 1 node
        64 GB RAM ( Processor - 2 x Quad cores Intel , Storage - ? )
 
DataNodes and TaskTrackers - on the 4 other machines - each with
        32 GB RAM ( Processor - 2 x Quad cores Intel , Storage - ? )
 
NICs ?
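
For reference, a layout like this is sometimes written down as the
masters/slaves host lists used by the Apache-style start scripts (a rough
sketch only; hostnames are placeholders, the masters file actually lists the
SecondaryNameNode host, and CDH packages normally manage the daemons through
service scripts rather than these files):

  # $HADOOP_CONF_DIR/masters : host running the SecondaryNameNode
  printf '%s\n' master01 > "$HADOOP_CONF_DIR/masters"
  # $HADOOP_CONF_DIR/slaves  : hosts each running a DataNode + TaskTracker
  printf '%s\n' worker01 worker02 worker03 worker04 > "$HADOOP_CONF_DIR/slaves"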
 
Also - what other details should I provide to my hardware engineer. 
 
The idea is to start with a Web Log Processing proof of concept.
 
Please advise.
 


________________________________
From: Patai Sangbutsarakum <Pa...@turn.com>
To: "user@hadoop.apache.org" <us...@hadoop.apache.org> 
Sent: Monday, April 29, 2013 2:49 PM
Subject: Re: Hardware Selection for Hadoop



2 x Quad cores Intel 
2-3 TB x 6 SATA
64GB mem
2 NICs teaming

my 2 cents



On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
 wrote:

Hi,
>
>I have to propose some hardware requirements in my company for a Proof of Concept with Hadoop. I was reading Hadoop Operations and also saw Cloudera Website. But just wanted to know from the group - what is the requirements if I have to plan for a 5 node cluster. I dont know at this time, the data that need to be processed at this time for the Proof of Concept. So - can you suggest something to me?
>
>Regards,
>Raj

Re: Hardware Selection for Hadoop

Posted by Ted Dunning <td...@maprtech.com>.
I think that having more than 6 drives is better.

More memory never hurts.  If you have too little, you may have to run with
fewer slots than optimal.

10GbE networking is good.  If not, having more than two 1GbE ports is good, at
least on distributions that can deal with them properly.


On Mon, Apr 29, 2013 at 11:49 AM, Patai Sangbutsarakum <
Patai.Sangbutsarakum@turn.com> wrote:

>  2 x Quad cores Intel
> 2-3 TB x 6 SATA
> 64GB mem
> 2 NICs teaming
>
>  my 2 cents
>
>
>  On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>  wrote:
>
>      Hi,
>
> I have to propose some hardware requirements in my company for a Proof of
> Concept with Hadoop. I was reading Hadoop Operations and also saw Cloudera
> Website. But just wanted to know from the group - what is the requirements
> if I have to plan for a 5 node cluster. I dont know at this time, the data
> that need to be processed at this time for the Proof of Concept. So - can
> you suggest something to me?
>
> Regards,
> Raj
>
>
>

Re: Hardware Selection for Hadoop

Posted by Patai Sangbutsarakum <Pa...@turn.com>.
2 x Quad cores Intel
2-3 TB x 6 SATA
64GB mem
2 NICs teaming

my 2 cents


On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>>
 wrote:

Hi,

I have to propose some hardware requirements in my company for a Proof of Concept with Hadoop. I was reading Hadoop Operations and also saw Cloudera Website. But just wanted to know from the group - what is the requirements if I have to plan for a 5 node cluster. I dont know at this time, the data that need to be processed at this time for the Proof of Concept. So - can you suggest something to me?

Regards,
Raj


Re: Hardware Selection for Hadoop

Posted by Marcos Luis Ortiz Valmaseda <ma...@gmail.com>.
Regards, Raj. Knowing the data that you want to process with Hadoop is
critical for this, at least as an approximation. I think that Hadoop
Operations is an invaluable resource for this:

- Hadoop uses RAM heavily, so the first resource to consider is giving the
nodes all the RAM you can, with a marked focus on the NameNode/JobTracker node.

- For the DataNode/TaskTracker nodes, it is very good to have fast disks, like
SSDs, but they are expensive, so weigh that too. For me, WD Barracuda drives
are awesome.

- A good network connection between the nodes. Hadoop is an RPC-based
platform, so a good network is critical for a healthy cluster.

A good start for me, for a small cluster (a hadoop-env.sh sketch follows this list):

- NN/JT: 8 to 16 GB RAM
- DN/TT: 4 to 8 GB RAM
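
As a rough illustration of how such memory budgets often get expressed (a
minimal sketch only, assuming a Hadoop 1.x-style hadoop-env.sh; the values are
illustrative, not a tuning recommendation):

  # Default heap, in MB, for daemons started by the scripts (DN/TT here).
  export HADOOP_HEAPSIZE=2000
  # Give the NameNode a larger heap than the other daemons.
  export HADOOP_NAMENODE_OPTS="-Xmx8g $HADOOP_NAMENODE_OPTS"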

Always consider using compression, to optimize the communication between
all the services in your Hadoop cluster (Snappy is my favorite).
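
For example, Snappy can be enabled for intermediate map output on a per-job
basis (a minimal sketch with MRv1 property names; the example jar path and the
input/output directories are placeholders):

  hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar wordcount \
    -D mapred.compress.map.output=true \
    -D mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec \
    /user/raj/weblogs /user/raj/weblogs-wc

The same two properties can also go into mapred-site.xml to make compression
the cluster-wide default.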

All of this advice is in the Hadoop Operations book from Eric, so it's a
must-read for every Hadoop system engineer.



2013/4/29 Raj Hadoop <ha...@yahoo.com>

>    Hi,
>
> I have to propose some hardware requirements in my company for a Proof of
> Concept with Hadoop. I was reading Hadoop Operations and also saw Cloudera
> Website. But just wanted to know from the group - what is the requirements
> if I have to plan for a 5 node cluster. I dont know at this time, the data
> that need to be processed at this time for the Proof of Concept. So - can
> you suggest something to me?
>
> Regards,
> Raj
>



-- 
Marcos Ortiz Valmaseda,
*Data-Driven Product Manager* at PDVSA
*Blog*: http://dataddict.wordpress.com/
*LinkedIn: *http://www.linkedin.com/in/marcosluis2186
*Twitter*: @marcosluis2186 <http://twitter.com/marcosluis2186>

Hardware Selection for Hadoop

Posted by Raj Hadoop <ha...@yahoo.com>.
Hi,

I have to propose some hardware requirements in my company for a Proof of Concept with Hadoop. I was reading Hadoop Operations and also saw the Cloudera website. But I just wanted to know from the group - what are the requirements if I have to plan for a 5 node cluster? I don't know, at this time, the data that needs to be processed for the Proof of Concept. So - can you suggest something to me?

Regards,
Raj

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

Posted by Ted Yu <yu...@gmail.com>.
bq. 'java  -cp /usr/lib/hbase/hbase...

Instead of hard-coding the class path, can you try specifying `hbase classpath`?
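
For example (a minimal sketch, assuming the hbase wrapper script is on the
PATH):

  # `hbase classpath` prints the same classpath that the hbase script itself uses.
  java -cp "$(hbase classpath)" org.apache.hadoop.hbase.util.VersionInfo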

Cheers

On Mon, Apr 29, 2013 at 5:52 AM, Shahab Yunus <sh...@gmail.com>wrote:

> Hello,
>
> This might be something very obvious that I am missing but this has been
> bugging me and I am unable to find what am I missing?
>
> I have hadoop and hbase installed on Linux machine. Version 2.0.0-cdh4.1.2
> and 0.92.1-cdh4.1.2 respectively. They are working and I can invoke hbase
> shell and hadoop commands.
>
> When I give the following command:
>
> 'hbase version'
>
> I get the following output which is correct and expected:
> -----------------------
> 13/04/29 07:47:42 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
> 13/04/29 07:47:42 INFO util.VersionInfo: Subversion
> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
> -r Unknown
> 13/04/29 07:47:42 INFO util.VersionInfo: Compiled by jenkins on Thu Nov  1
> 18:01:09 PDT 2012
>
> But when I I kick of the VersionInfo class manually (I do see that there
> is a main method in there), I get an Unknown result? Why is that?
> Command:
> 'java  -cp
> /usr/lib/hbase/hbase-0.92.1-cdh4.1.2-security.jar:/usr/lib/hbase/lib/commons-logging-1.1.1.jar
> org.apache.hadoop.hbase.util.VersionInfo'
>
> Output:
> -----------------------
> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
> logVersion
> INFO: HBase Unknown
> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
> logVersion
> INFO: Subversion Unknown -r Unknown
> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
> logVersion
> INFO: Compiled by Unknown on Unknown
>
> Now this is causing problems when I am trying to run my HBase client on
> this machine as the it aborts with the following error:
> -----------------------
> java.lang.RuntimeException: hbase-default.xml file seems to be for and old
> version of HBase (0.92.1-cdh4.1.2), this version is Unknown
>    at
> org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
>
> This means that the hbase-default.xml in the hbase jar is being picked up
> but the version info captured/compiled through annotations is not? How is
> it possible if 'hbase shell' (or hadoop version') works fine!
>
> Please advise. Thanks a lot. I will be very grateful.
>
> Regards,
> Shahab
>
