Posted to user@kudu.apache.org by Jp Gupta <ne...@gmail.com> on 2018/02/02 05:45:26 UTC

Using Kudu to Handle Huge amount of Data

Hi,
As existing HBase users, we handle close to 20 TB of data every day.

While we are contemplating moving to Kudu to take advantage of the new
technology, I have yet to hear of a real industry use case where Kudu is
being used to handle a huge amount of data.

Looking forward to your input on any organisation using Kudu where data
volumes of more than 10 TB are ingested every day.

BR,
JP

Re: Using Kudu to Handle Huge amount of Data

Posted by Todd Lipcon <to...@cloudera.com>.
Hi JP,

Answers inline...

On Thu, Feb 1, 2018 at 9:45 PM, Jp Gupta <ne...@gmail.com> wrote:

> Hi,
> As existing HBase users, we handle close to 20 TB of data every day.
>

What does "handle" mean in this case? Are you inserting 20 TB of new data
each day, so that your total dataset grows by that amount? How much data do
you retain? How many nodes are in your cluster? (I would guess many
hundreds?)


>
> While we are contemplating moving to Kudu to take advantage of the new
> technology, I have yet to hear of a real industry use case where Kudu is
> being used to handle a huge amount of data.
>

If you are seeing Kudu as an "improved HBase", that isn't really accurate.
Of course there are some things we can do better than HBase, but there are
also some things HBase can do better than Kudu.

As for Kudu data sizes, I am aware of some organizations storing several
hundred TB in a Kudu cluster, but I have not yet heard of a use case with
1 PB+. If you are looking to run at that scale, you may hit some issues, but
we stand ready to help you overcome them. I don't see any
fundamental problems that would prevent it, and I have run some basic smoke
tests of Kudu on ~800 nodes before.


>
> Looking forward to your input on any organisation using Kudu where data
> volumes of more than 10 TB are ingested every day.
>

Hope some other users can chime in.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera