Posted to user@hbase.apache.org by Subash DSouza <su...@outlook.com> on 2014/04/09 03:07:08 UTC

Hadoop Summit EU HBase Meetup Review

We had Hadoop Summit Europe last week, where we held an HBase Meetup. First, Enis talked about HBase architecture, and then Lars talked about some interesting HBase use cases. Finally, we opened it up to the public, where we had a frank discussion on the uptake of HBase vs. other NoSQL DBs such as Mongo and Cassandra. This wasn't about bashing other DBs, just understanding how the spectrum of NoSQL DBs was leading to evaluation/production use of HBase. It was also partly based on the report from InfoWorld:
http://podcasts.infoworld.com/d/big-data/big-data-showdown-cassandra-vs-hbase-239592

Anyway, these were the major points we discussed (Lars and Jon Hsieh from Cloudera, and Enis and Devaraj from Hortonworks, contributed, with input from about 12 other users from the community):

Documentation - Cassandra has a better web page than HBase does. Even though HBase's documentation is complete, finding the documentation is a bit hard.

Installation - HBase is hard to install for the newbie. I think there has been some effort to make this more friendly by wrapping the master in RegionServers.

Vendor Pushes - Cassandra has DataStax, Pentaho pushes Mongo, Cloudera pushes Impala, MapR is pushing their proprietary FS, and IBM their own DBs. Even though HBase is part of the Hadoop ecosystem, there is no one vendor exclusively pushing HBase for uptake by the community, or even by the Hadoop community.

Messaging - HBase has been on the receiving end of a number of negative marketing efforts by various vendors over things that were possibly true in the past. For example, Lars mentioned that a certain vendor was incorrectly stating that HBase has an issue with SPOF, even though this hasn't been true for quite some time. Similarly, Jon mentioned that a certain slide, in which he was talking about the complexity of HBase, was taken out of context and presented as a negative for HBase.

SQL based solutions - Even though there are a number of efforts to showcase that HBase has some SQL-based interfaces available, like Phoenix, Impala & Hive (albeit with some issues), there is still a misconception that HBase is accessed purely via Java.

Security in HBase - Even though 0.98 has security, it needs to be road tested.

Some recommendations:

Push messaging out and make it more clear - Apache blogs, Hortonworks blogs, Cloudera blogs.

Documentation - David Worms, who is a consultant out of France, has volunteered to help make the website better. You may want to reach out to him - fr.linkedin.com/pub/david-worms/7/626/630

Cost Calculator - Lars made a great point about having a cost calculator to estimate the cost of various operations. This would make it more likely for bigger organizations to pick HBase, by letting them understand how those operations affect the bottom line.



Update from Andrew - 

"HBase has had strong security since 0.94 if not 0.92 - secure RPC and ACLs at the table and column family level. We had these features before Cassandra and even Accumulo. Why stuff like that gets lost is we are a bunch of engineers not marketers. The trouble with messaging is someone has to write it. Since it's a joyless job for most engineers, someone must be paid to do it."


Thanks
Subash

RE: Hadoop Summit EU HBase Meetup Review

Posted by Subash DSouza <su...@outlook.com>.
Yeah, sorry.

I formatted it nicely, but the formatting got lost while posting to the list.

Subash

-----Original Message-----
From: Vladimir Rodionov [mailto:vrodionov@carrieriq.com] 
Sent: Tuesday, April 8, 2014 7:10 PM
To: user@hbase.apache.org; user@hbase.apache.org
Subject: RE: Hadoop Summit EU HBase Meetup Review

Very interesting write up, but quite hard to read.

Configuration, installation, complexity, documentation ... these can be
improved, no doubt. Some other features (inter-DC replication and high
availability) are still behind Cassandra's, I think, and these should be
prioritized by the HBase community. I do not count MongoDB as a real
contender - it's in another league and will become a niche product for quick
mash-ups of web apps very soon. The real problem of HBase, I agree, is that
it does not have a real corporate sponsor similar to Cassandra's DataStax.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com


RE: Hadoop Summit EU HBase Meetup Review

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
Very interesting write up, but quite hard to read.

Configuration, installation, complexity, documentation ... these can be improved, no doubt. Some other features (inter-DC replication and high availability) are still behind Cassandra's, I think, and these should be prioritized by the HBase community. I do not count MongoDB as a real contender - it's in another league and will become a niche product for quick mash-ups of web apps very soon. The real problem of HBase, I agree, is that it does not have a real corporate sponsor similar to Cassandra's DataStax.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com
