Posted to common-user@hadoop.apache.org by oualid ait wafli <ou...@gmail.com> on 2013/03/20 17:07:04 UTC

HBase or Cassandra

Hi,

Which is better, HBase or Cassandra?
What are the criteria for comparing these tools (HBase and Cassandra)?

Thanks

Re: HBase or Cassandra

Posted by Mohammad Tariq <do...@gmail.com>.
Harsh has a point, and you should consider it. Go for a database only if
you really need random, real-time reads and writes.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Thu, Mar 21, 2013 at 3:29 PM, Nitin Pawar <ni...@gmail.com> wrote:

> Oozie is a workflow scheduling and processing engine.
>
> so suppose you have similar kind of incoming data and you want to do a
> bunch of data processing steps on this data as and when it arrives, oozie
> will give you the framework for same
>
>
> On Thu, Mar 21, 2013 at 3:27 PM, oualid ait wafli <
> oualid.aitwafli@gmail.com> wrote:
>
>> Thanks Mohammad,
>> but how can I use Oozie !
>>
>>
>> 2013/3/21 Mohammad Tariq <do...@gmail.com>
>>
>>> Hello there,
>>>
>>>   For your use case, Hbase seems to be a better choice. And you workflow
>>> looks good to me.
>>>
>>> Just one suggestion(in case you find it useful). Since, you are going to
>>> do a lot of operations,
>>> you might find it useful to schedule the jobs using Oozie.
>>>
>>> Warm Regards,
>>> Tariq
>>> https://mtariq.jux.com/
>>> cloudfront.blogspot.com
>>>
>>>
>>> On Thu, Mar 21, 2013 at 2:27 PM, oualid ait wafli <
>>> oualid.aitwafli@gmail.com> wrote:
>>>
>>>> I have the CDR files (call details record) as my data and I want read
>>>> from those files the data using Pig.
>>>>
>>>> firstly, I will import the data from sources using Flume, then use Pig
>>>> as an ETL and as a tool to run MapReduce jobs into HDFS. so now I want
>>>> store my data but I have to do a benchmark between HBase and Cassandra.
>>>>
>>>>  My questions:
>>>> - How do you find my idea to analyze, process my data ? Am I in the
>>>> best way ?
>>>> - which one is the best HBase or Cassandra ?
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>> 2013/3/20 Ted Yu <yu...@gmail.com>
>>>>
>>>>> Can you give us more information about your use case ?
>>>>> e.g. approximate ratio between write vs. read load, amount of log, etc.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Wed, Mar 20, 2013 at 9:22 AM, oualid ait wafli <
>>>>> oualid.aitwafli@gmail.com> wrote:
>>>>>
>>>>>> Yes I have a data source which contains log files, I want to analyze
>>>>>> those files and store them
>>>>>> any idea ?
>>>>>> thanks
>>>>>>
>>>>>>
>>>>>> 2013/3/20 Ted Yu <yu...@gmail.com>
>>>>>>
>>>>>>> The answer to second question would be subjective.
>>>>>>>
>>>>>>> Do you have specific use case in mind ?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 20, 2013 at 9:07 AM, oualid ait wafli <
>>>>>>> oualid.aitwafli@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Which is the best HBase or Cassandra ?
>>>>>>>> Which are the criteria to compare those tools( HBase and Cassandra)
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
> Nitin Pawar
>

Re: HBase or Cassandra

Posted by Nitin Pawar <ni...@gmail.com>.
Oozie is a workflow scheduling and processing engine.

Suppose you receive the same kind of incoming data each time and want to
run a series of processing steps on it as and when it arrives. Oozie
gives you the framework for that.
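
As a rough illustration of what "a series of processing steps" means for a
workflow engine, here is a minimal Python sketch. It does not use Oozie or
its workflow definition format; the step names (ingest, clean, aggregate,
store) and the run_workflow helper are assumptions made up for this example.
It only shows the idea of declaring steps and their dependencies and letting
a scheduler work out the run order. It needs Python 3.9+ for graphlib.

```python
from graphlib import TopologicalSorter


def run_workflow(steps, depends_on):
    """Run each step after all of the steps it depends on.

    steps:      step name -> callable taking a shared log list
    depends_on: step name -> set of step names that must run first
    """
    order = list(TopologicalSorter(depends_on).static_order())
    log = []
    for name in order:
        steps[name](log)
    return order, log


# Hypothetical pipeline: every step just records that it ran.
step_names = ["ingest", "clean", "aggregate", "store"]
steps = {n: (lambda log, n=n: log.append(f"ran {n}")) for n in step_names}

# A linear chain: ingest -> clean -> aggregate -> store.
depends_on = {
    "clean": {"ingest"},
    "aggregate": {"clean"},
    "store": {"aggregate"},
}

order, log = run_workflow(steps, depends_on)
print(order)
```

A real engine adds what this sketch leaves out: triggering on data arrival
or on a schedule, retries, and running each step as a cluster job.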


On Thu, Mar 21, 2013 at 3:27 PM, oualid ait wafli <oualid.aitwafli@gmail.com> wrote:

> Thanks Mohammad,
> but how can I use Oozie !
>
>
> 2013/3/21 Mohammad Tariq <do...@gmail.com>
>
>> Hello there,
>>
>>   For your use case, Hbase seems to be a better choice. And you workflow
>> looks good to me.
>>
>> Just one suggestion(in case you find it useful). Since, you are going to
>> do a lot of operations,
>> you might find it useful to schedule the jobs using Oozie.
>>
>> Warm Regards,
>> Tariq
>> https://mtariq.jux.com/
>> cloudfront.blogspot.com
>>
>>
>> On Thu, Mar 21, 2013 at 2:27 PM, oualid ait wafli <
>> oualid.aitwafli@gmail.com> wrote:
>>
>>> I have the CDR files (call details record) as my data and I want read
>>> from those files the data using Pig.
>>>
>>> firstly, I will import the data from sources using Flume, then use Pig
>>> as an ETL and as a tool to run MapReduce jobs into HDFS. so now I want
>>> store my data but I have to do a benchmark between HBase and Cassandra.
>>>
>>>  My questions:
>>> - How do you find my idea to analyze, process my data ? Am I in the best
>>> way ?
>>> - which one is the best HBase or Cassandra ?
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>> 2013/3/20 Ted Yu <yu...@gmail.com>
>>>
>>>> Can you give us more information about your use case ?
>>>> e.g. approximate ratio between write vs. read load, amount of log, etc.
>>>>
>>>> Cheers
>>>>
>>>> On Wed, Mar 20, 2013 at 9:22 AM, oualid ait wafli <
>>>> oualid.aitwafli@gmail.com> wrote:
>>>>
>>>>> Yes I have a data source which contains log files, I want to analyze
>>>>> those files and store them
>>>>> any idea ?
>>>>> thanks
>>>>>
>>>>>
>>>>> 2013/3/20 Ted Yu <yu...@gmail.com>
>>>>>
>>>>>> The answer to second question would be subjective.
>>>>>>
>>>>>> Do you have specific use case in mind ?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 20, 2013 at 9:07 AM, oualid ait wafli <
>>>>>> oualid.aitwafli@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Which is the best HBase or Cassandra ?
>>>>>>> Which are the criteria to compare those tools( HBase and Cassandra)
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


-- 
Nitin Pawar

Re: HBase or Cassandra

Posted by oualid ait wafli <ou...@gmail.com>.
Thanks Mohammad,
but how can I use Oozie?


2013/3/21 Mohammad Tariq <do...@gmail.com>

> Hello there,
>
>   For your use case, Hbase seems to be a better choice. And you workflow
> looks good to me.
>
> Just one suggestion(in case you find it useful). Since, you are going to
> do a lot of operations,
> you might find it useful to schedule the jobs using Oozie.
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Thu, Mar 21, 2013 at 2:27 PM, oualid ait wafli <
> oualid.aitwafli@gmail.com> wrote:
>
>> I have the CDR files (call details record) as my data and I want read
>> from those files the data using Pig.
>>
>> firstly, I will import the data from sources using Flume, then use Pig as
>> an ETL and as a tool to run MapReduce jobs into HDFS. so now I want store
>> my data but I have to do a benchmark between HBase and Cassandra.
>>
>>  My questions:
>> - How do you find my idea to analyze, process my data ? Am I in the best
>> way ?
>> - which one is the best HBase or Cassandra ?
>>
>>
>> Thanks
>>
>>
>>
>>
>> 2013/3/20 Ted Yu <yu...@gmail.com>
>>
>>> Can you give us more information about your use case ?
>>> e.g. approximate ratio between write vs. read load, amount of log, etc.
>>>
>>> Cheers
>>>
>>> On Wed, Mar 20, 2013 at 9:22 AM, oualid ait wafli <
>>> oualid.aitwafli@gmail.com> wrote:
>>>
>>>> Yes I have a data source which contains log files, I want to analyze
>>>> those files and store them
>>>> any idea ?
>>>> thanks
>>>>
>>>>
>>>> 2013/3/20 Ted Yu <yu...@gmail.com>
>>>>
>>>>> The answer to second question would be subjective.
>>>>>
>>>>> Do you have specific use case in mind ?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> On Wed, Mar 20, 2013 at 9:07 AM, oualid ait wafli <
>>>>> oualid.aitwafli@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Which is the best HBase or Cassandra ?
>>>>>> Which are the criteria to compare those tools( HBase and Cassandra)
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: HBase or Cassandra

Posted by Mohammad Tariq <do...@gmail.com>.
Hello there,

For your use case, HBase seems to be the better choice, and your workflow
looks good to me.

Just one suggestion, in case you find it useful: since you are going to
run a lot of operations, you might want to schedule the jobs using Oozie.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Thu, Mar 21, 2013 at 2:27 PM, oualid ait wafli <oualid.aitwafli@gmail.com> wrote:

> I have the CDR files (call details record) as my data and I want read from
> those files the data using Pig.
>
> firstly, I will import the data from sources using Flume, then use Pig as
> an ETL and as a tool to run MapReduce jobs into HDFS. so now I want store
> my data but I have to do a benchmark between HBase and Cassandra.
>
>  My questions:
> - How do you find my idea to analyze, process my data ? Am I in the best
> way ?
> - which one is the best HBase or Cassandra ?
>
>
> Thanks
>
>
>
>
> 2013/3/20 Ted Yu <yu...@gmail.com>
>
>> Can you give us more information about your use case ?
>> e.g. approximate ratio between write vs. read load, amount of log, etc.
>>
>> Cheers
>>
>> On Wed, Mar 20, 2013 at 9:22 AM, oualid ait wafli <
>> oualid.aitwafli@gmail.com> wrote:
>>
>>> Yes I have a data source which contains log files, I want to analyze
>>> those files and store them
>>> any idea ?
>>> thanks
>>>
>>>
>>> 2013/3/20 Ted Yu <yu...@gmail.com>
>>>
>>>> The answer to second question would be subjective.
>>>>
>>>> Do you have specific use case in mind ?
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Wed, Mar 20, 2013 at 9:07 AM, oualid ait wafli <
>>>> oualid.aitwafli@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Which is the best HBase or Cassandra ?
>>>>> Which are the criteria to compare those tools( HBase and Cassandra)
>>>>>
>>>>> Thanks
>>>>>
>>>>
>>>>
>>>
>>
>

Re: HBase or Cassandra

Posted by Harsh J <ha...@cloudera.com>.
If your use case is merely to process these files in batch, then why
use HBase and/or Cassandra? What you've described seems to already
address the need.

I do not know too much about Cassandra, so I will refrain from commenting on
it; however, the following may apply to it as well. HBase is useful for
more "realtime" lookup needs, for example if you need to do random
reads and writes in realtime, such as looking up a specific customer's
records for a specific date without querying the whole dataset, or
editing a customer's balance without rewriting the whole dataset.
The operational requirements may also include deletes of specific
records and support for reading and writing unstructured data. HBase is not
something used for processing data, just for storing and retrieving it
and optionally managing the storage/retrieval at a per-record level.
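
As a rough illustration of that per-record access pattern, here is a minimal
Python sketch. It is an in-memory stand-in for a store that keeps rows sorted
by key, not the HBase client API. The class SortedKeyStore, the helper
row_key, the "customer|date|sequence" key layout, and the sample records are
all assumptions made up for this example.

```python
import bisect


class SortedKeyStore:
    """In-memory stand-in for a store that keeps rows sorted by key."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> record

    def put(self, key, record):
        # Insert a new row or overwrite an existing one.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = record

    def get(self, key):
        return self._rows.get(key)

    def delete(self, key):
        if key in self._rows:
            del self._rows[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def scan_prefix(self, prefix):
        """Return (key, record) pairs whose key starts with prefix."""
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            matches.append((key, self._rows[key]))
        return matches


def row_key(customer_id, date, seq):
    # Hypothetical key layout: customer, then date, then a sequence number.
    return f"{customer_id}|{date}|{seq:04d}"


store = SortedKeyStore()
store.put(row_key("cust42", "2013-03-20", 1), {"callee": "555-0101", "seconds": 61})
store.put(row_key("cust42", "2013-03-21", 1), {"callee": "555-0199", "seconds": 12})
store.put(row_key("cust77", "2013-03-20", 1), {"callee": "555-0101", "seconds": 300})

# Look up one customer's records for one date without reading other rows.
hits = store.scan_prefix("cust42|2013-03-20|")

# Rewrite a single record, then delete another, leaving the rest untouched.
store.put(row_key("cust42", "2013-03-20", 1), {"callee": "555-0101", "seconds": 75})
store.delete(row_key("cust77", "2013-03-20", 1))
```

The point of the sketch is only the shape of the operations: a lookup, an
update and a delete each touch a handful of rows located by key, which is the
case where a store such as HBase would be worth considering.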

If your needs do not involve random reads/writes and your operation is
batch oriented, then neither Cassandra nor HBase would give you the
speed of MapReduce jobs running their logic directly on raw HDFS input files.
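
For contrast, a batch-oriented job reads every input record once, in order,
and never looks anything up by key. The Python sketch below shows that shape
on a few sample lines. The CDR field layout (caller, callee, start time,
duration in seconds), the sample data, and the function names parse_cdr and
total_seconds_per_caller are assumptions for illustration only; a real job
would be written in Pig or MapReduce against the files in HDFS.

```python
from collections import defaultdict

# Hypothetical CDR layout: caller,callee,start_time,duration_seconds
SAMPLE_LINES = [
    "555-0100,555-0101,2013-03-20T09:07:00,61",
    "555-0100,555-0199,2013-03-20T09:22:00,12",
    "555-0177,555-0101,2013-03-21T14:27:00,300",
    "malformed line",
]


def parse_cdr(line):
    """Return (caller, duration_seconds), or None if the line does not parse."""
    fields = line.strip().split(",")
    if len(fields) != 4:
        return None
    try:
        return fields[0], int(fields[3])
    except ValueError:
        return None


def total_seconds_per_caller(lines):
    """Make one sequential pass over all records and sum call time per caller."""
    totals = defaultdict(int)
    for line in lines:
        parsed = parse_cdr(line)
        if parsed is None:
            continue  # skip bad records instead of failing the whole run
        caller, seconds = parsed
        totals[caller] += seconds
    return dict(totals)


totals = total_seconds_per_caller(SAMPLE_LINES)
```

Nothing here needs a per-record index, so plain files are enough for this
kind of workload.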

On Thu, Mar 21, 2013 at 2:27 PM, oualid ait wafli
<ou...@gmail.com> wrote:
> I have the CDR files (call details record) as my data and I want read from
> those files the data using Pig.
>
> firstly, I will import the data from sources using Flume, then use Pig as an
> ETL and as a tool to run MapReduce jobs into HDFS. so now I want store my
> data but I have to do a benchmark between HBase and Cassandra.
>
>  My questions:
> - How do you find my idea to analyze, process my data ? Am I in the best way
> ?
> - which one is the best HBase or Cassandra ?
>
>
> Thanks
>
>
>
>
> 2013/3/20 Ted Yu <yu...@gmail.com>
>>
>> Can you give us more information about your use case ?
>> e.g. approximate ratio between write vs. read load, amount of log, etc.
>>
>> Cheers
>>
>> On Wed, Mar 20, 2013 at 9:22 AM, oualid ait wafli
>> <ou...@gmail.com> wrote:
>>>
>>> Yes I have a data source which contains log files, I want to analyze
>>> those files and store them
>>> any idea ?
>>> thanks
>>>
>>>
>>> 2013/3/20 Ted Yu <yu...@gmail.com>
>>>>
>>>> The answer to second question would be subjective.
>>>>
>>>> Do you have specific use case in mind ?
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Wed, Mar 20, 2013 at 9:07 AM, oualid ait wafli
>>>> <ou...@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Which is the best HBase or Cassandra ?
>>>>> Which are the criteria to compare those tools( HBase and Cassandra)
>>>>>
>>>>> Thanks
>>>>
>>>>
>>>
>>
>



--
Harsh J

Re: HBase or Cassandra

Posted by oualid ait wafli <ou...@gmail.com>.
My data consists of CDR (call detail record) files, and I want to read the
data from those files using Pig.

First, I will import the data from the sources using Flume, then use Pig as
an ETL tool and to run MapReduce jobs on HDFS. Now I want to store
my data, but first I have to benchmark HBase against Cassandra.

My questions:
- What do you think of my approach to analyzing and processing my data? Is it
a good one?
- Which one is better, HBase or Cassandra?


Thanks




2013/3/20 Ted Yu <yu...@gmail.com>

> Can you give us more information about your use case ?
> e.g. approximate ratio between write vs. read load, amount of log, etc.
>
> Cheers
>
> On Wed, Mar 20, 2013 at 9:22 AM, oualid ait wafli <
> oualid.aitwafli@gmail.com> wrote:
>
>> Yes I have a data source which contains log files, I want to analyze
>> those files and store them
>> any idea ?
>> thanks
>>
>>
>> 2013/3/20 Ted Yu <yu...@gmail.com>
>>
>>> The answer to second question would be subjective.
>>>
>>> Do you have specific use case in mind ?
>>>
>>> Thanks
>>>
>>>
>>> On Wed, Mar 20, 2013 at 9:07 AM, oualid ait wafli <
>>> oualid.aitwafli@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Which is the best HBase or Cassandra ?
>>>> Which are the criteria to compare those tools( HBase and Cassandra)
>>>>
>>>> Thanks
>>>>
>>>
>>>
>>
>

Re: HBase or Cassandra

Posted by Ted Yu <yu...@gmail.com>.
Can you give us more information about your use case?
e.g. the approximate ratio of write to read load, the amount of log data, etc.

Cheers

On Wed, Mar 20, 2013 at 9:22 AM, oualid ait wafli <oualid.aitwafli@gmail.com> wrote:

> Yes I have a data source which contains log files, I want to analyze those
> files and store them
> any idea ?
> thanks
>
>
> 2013/3/20 Ted Yu <yu...@gmail.com>
>
>> The answer to second question would be subjective.
>>
>> Do you have specific use case in mind ?
>>
>> Thanks
>>
>>
>> On Wed, Mar 20, 2013 at 9:07 AM, oualid ait wafli <
>> oualid.aitwafli@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Which is the best HBase or Cassandra ?
>>> Which are the criteria to compare those tools( HBase and Cassandra)
>>>
>>> Thanks
>>>
>>
>>
>

Re: HBase or Cassandra

Posted by oualid ait wafli <ou...@gmail.com>.
Yes, I have a data source that contains log files. I want to analyze those
files and store them.
Any ideas?
Thanks


2013/3/20 Ted Yu <yu...@gmail.com>

> The answer to second question would be subjective.
>
> Do you have specific use case in mind ?
>
> Thanks
>
>
> On Wed, Mar 20, 2013 at 9:07 AM, oualid ait wafli <
> oualid.aitwafli@gmail.com> wrote:
>
>> Hi,
>>
>> Which is the best HBase or Cassandra ?
>> Which are the criteria to compare those tools( HBase and Cassandra)
>>
>> Thanks
>>
>
>

Re: HBase or Cassandra

Posted by Ted Yu <yu...@gmail.com>.
The answer to the second question would be subjective.

Do you have a specific use case in mind?

Thanks

On Wed, Mar 20, 2013 at 9:07 AM, oualid ait wafli <oualid.aitwafli@gmail.com> wrote:

> Hi,
>
> Which is the best HBase or Cassandra ?
> Which are the criteria to compare those tools( HBase and Cassandra)
>
> Thanks
>
