Posted to user@hadoop.apache.org by Alec Taylor <al...@gmail.com> on 2015/01/03 08:44:59 UTC

HDFS-based database for Big and Small data?

Want to replace MongoDB with an HDFS-based database in my architecture.

Note that this is a new system, not a rewrite of an old one.

Are there any open-source "fast" read/write databases built on HDFS
with a model similar to a document store, that can hold my regular
business logic and enable an object model in Python? (E.g.: via Data
Mapper or Active Record patterns)

Thanks for all suggestions

PS: I am using the "Big data" definition from Cloudera, i.e.: any data
which expands beyond one machine

Re: HDFS-based database for Big and Small data?

Posted by Ted Yu <yu...@gmail.com>.
If you choose to store your data in HBase, see the following for a project
that supports accessing HBase through Python:

http://search-hadoop.com/m/DHED4syFSo1/hbase+HappyBase&subj=+announce+HappyBase+0+8+a+developer+friendly+Python+library+for+HBase

Cheers

On Fri, Jan 2, 2015 at 11:44 PM, Alec Taylor <al...@gmail.com> wrote:

> Want to replace MongoDB with an HDFS-based database in my architecture.
>
> Note that this is a new system, not a rewrite of an old one.
>
> Are there any open-source "fast" read/write database built on HDFS
> with a model similar to a document-store, that can hold my regular
> business logic and enables an object model in Python? (E.g.: via Data
> Mapper or Active Record patterns)
>
> Thanks for all suggestions
>
> PS: I am using the "Big data" definition from Cloudera, i.e.: any data
> which expands more than one machine
>
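
For readers wondering how the Data Mapper pattern from the question could sit on top of an HBase client such as HappyBase, here is a minimal sketch, not a definitive implementation. It relies only on the `put(row, data)` and `row(row)` methods that a HappyBase `Table` exposes. The `User` class, the `info` column family, the `UserMapper` class and the `InMemoryTable` stand-in are all hypothetical names introduced for illustration, so the example runs without a cluster.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Plain domain object; it knows nothing about storage."""
    user_id: str
    name: str
    email: str


class InMemoryTable:
    """Hypothetical stand-in offering the two table methods used below:
    put(row, data) and row(row)."""

    def __init__(self):
        self._rows = {}

    def put(self, row, data):
        # Merge the given cells into the row, similar to an HBase put.
        self._rows.setdefault(row, {}).update(data)

    def row(self, row):
        # Return a copy of the row's cells, or an empty dict if absent.
        return dict(self._rows.get(row, {}))


class UserMapper:
    """Data Mapper: translates between User objects and byte-valued cells."""

    FAMILY = b"info"  # assumed column family name

    def __init__(self, table):
        self._table = table

    def save(self, user):
        self._table.put(
            user.user_id.encode("utf-8"),
            {
                self.FAMILY + b":name": user.name.encode("utf-8"),
                self.FAMILY + b":email": user.email.encode("utf-8"),
            },
        )

    def find(self, user_id):
        cells = self._table.row(user_id.encode("utf-8"))
        if not cells:
            return None
        return User(
            user_id=user_id,
            name=cells[self.FAMILY + b":name"].decode("utf-8"),
            email=cells[self.FAMILY + b":email"].decode("utf-8"),
        )


# With a real cluster, the table would instead come from something like:
#   import happybase
#   table = happybase.Connection("thrift-host").table("users")
mapper = UserMapper(InMemoryTable())
mapper.save(User("u1", "Alec", "alec@example.org"))
found = mapper.find("u1")
```

Keeping the mapper separate from `User` means the domain object stays unaware of row keys and column families, which is the main point of Data Mapper as opposed to Active Record.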

Re: HDFS-based database for Big and Small data?

Posted by Ted Yu <yu...@gmail.com>.
Can you give a few examples of why HBase is hard to use?

Disclaimer: I work on HBase

Cheers



> On Jan 3, 2015, at 12:28 AM, "bit1129@163.com" <bi...@163.com> wrote:
> 
> 
> HBase, a NoSQL database in the Hadoop family, is what you describe: it is built upon HDFS. But is there really a strong reason to move from MongoDB to HBase? Is your data really that huge?
> 
> As far as I can tell, HBase is extremely hard to use and hard to maintain as well. If I stood in your position, I would choose MongoDB without hesitation.
> 
> bit1129@163.com
>  
> From: Alec Taylor
> Date: 2015-01-03 15:44
> To: user
> Subject: HDFS-based database for Big and Small data?
> Want to replace MongoDB with an HDFS-based database in my architecture.
>  
> Note that this is a new system, not a rewrite of an old one.
>  
> Are there any open-source "fast" read/write database built on HDFS
> with a model similar to a document-store, that can hold my regular
> business logic and enables an object model in Python? (E.g.: via Data
> Mapper or Active Record patterns)
>  
> Thanks for all suggestions
>  
> PS: I am using the "Big data" definition from Cloudera, i.e.: any data
> which expands more than one machine

Re: HDFS-based database for Big and Small data?

Posted by "bit1129@163.com" <bi...@163.com>.
HBase, a NoSQL database in the Hadoop family, is what you describe: it is built upon HDFS. But is there really a strong reason to move from MongoDB to HBase? Is your data really that huge?

As far as I can tell, HBase is extremely hard to use and hard to maintain as well. If I stood in your position, I would choose MongoDB without hesitation.



bit1129@163.com
 
From: Alec Taylor
Date: 2015-01-03 15:44
To: user
Subject: HDFS-based database for Big and Small data?
Want to replace MongoDB with an HDFS-based database in my architecture.
 
Note that this is a new system, not a rewrite of an old one.
 
Are there any open-source "fast" read/write database built on HDFS
with a model similar to a document-store, that can hold my regular
business logic and enables an object model in Python? (E.g.: via Data
Mapper or Active Record patterns)
 
Thanks for all suggestions
 
PS: I am using the "Big data" definition from Cloudera, i.e.: any data
which expands more than one machine

Re: HDFS-based database for Big and Small data?

Posted by Alec Taylor <al...@gmail.com>.
Thanks Jay, Phoenix looks interesting.

As for a document store, the main features I want from it are:
- Multiple indexes, but not inefficiently set up (e.g.: I could use
Redis for this, but it would be very inefficient)
- Expandable schema (oh, looks like Phoenix supports this:
http://phoenix.apache.org/dynamic_columns.html)
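
To make the "expandable schema" point concrete: as I understand the Phoenix dynamic columns page linked above, a column that is not part of the table definition is declared inline, with its type, in the statement that uses it. The sketch below only builds such statements as strings. The helper names `dynamic_upsert` and `dynamic_select` and the `EventLog` table with its columns are assumptions for illustration, and identifiers are interpolated unescaped, so they must come from trusted code rather than user input.

```python
def dynamic_upsert(table, static_values, dynamic_values):
    """Build a parameterised UPSERT that declares dynamic columns inline.

    static_values:  {column_name: value} for columns in the table definition.
    dynamic_values: {column_name: (sql_type, value)} for columns that are
                    declared only in this statement.
    """
    columns = list(static_values)
    params = list(static_values.values())
    for name, (sql_type, value) in dynamic_values.items():
        # A dynamic column carries its type next to its name.
        columns.append(f"{name} {sql_type}")
        params.append(value)
    placeholders = ", ".join("?" for _ in params)
    sql = f"UPSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return sql, params


def dynamic_select(table, select_columns, dynamic_types, where):
    """Build a SELECT that re-declares the dynamic columns after the table name."""
    declared = ", ".join(f"{name} {sql_type}" for name, sql_type in dynamic_types.items())
    return f"SELECT {', '.join(select_columns)} FROM {table}({declared}) WHERE {where}"


sql, params = dynamic_upsert(
    "EventLog",
    {"eventId": 1, "eventType": "abc"},
    {"usedMemory": ("BIGINT", 512)},
)
query = dynamic_select(
    "EventLog", ["eventId", "usedMemory"], {"usedMemory": "BIGINT"}, "eventId = 1"
)
```

The resulting strings would be passed to whatever Phoenix client is in use; the `?` placeholders follow the JDBC convention.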

Unsure about Solr, will investigate further. AFAIK it's an
ElasticSearch equivalent (well, the other way around!). I have
experience with that, and as long as everything is stored in HDFS the
right way, it'll be reliable.

<keeps thread open to facilitate further discussion>

On Sun, Jan 4, 2015 at 3:15 AM, Wilm Schumacher
<wi...@gmail.com> wrote:
> On 03.01.2015 at 16:50, Jay Vyas wrote:
>> 1) Phoenix can be used on top of HBase for richer querying semantics. That combo might be good for complex workloads.
>>
>> 2) SolrCloud also might fit the bill here?
>>
>> Solr can be backed by any Hadoop-compatible FS, including HDFS; it is resilient by that mechanism and offers sophisticated indexing and searching options.
>>
>> Although the querying is limited...
> I agree of course. But perhaps the additional layer of complexity isn't
> necessary. Depends on the requirements.
>
> Best wishes,
>
> Wilm

Re: HDFS-based database for Big and Small data?

Posted by Wilm Schumacher <wi...@gmail.com>.
On 03.01.2015 at 16:50, Jay Vyas wrote:
> 1) Phoenix can be used on top of HBase for richer querying semantics. That combo might be good for complex workloads.
>
> 2) SolrCloud also might fit the bill here?
>
> Solr can be backed by any Hadoop-compatible FS, including HDFS; it is resilient by that mechanism and offers sophisticated indexing and searching options.
>
> Although the querying is limited...
I agree of course. But perhaps the additional layer of complexity isn't
necessary. Depends on the requirements.

Best wishes,

Wilm

Re: HDFS-based database for Big and Small data?

Posted by Jay Vyas <ja...@gmail.com>.
1) Phoenix can be used on top of HBase for richer querying semantics. That combo might be good for complex workloads.

2) SolrCloud also might fit the bill here?

Solr can be backed by any Hadoop-compatible FS, including HDFS; it is resilient by that mechanism and offers sophisticated indexing and searching options.

Although the querying is limited...



> On Jan 3, 2015, at 9:39 AM, Wilm Schumacher <wi...@gmail.com> wrote:
> 
>> On 03.01.2015 at 08:44, Alec Taylor wrote:
>> Want to replace MongoDB with an HDFS-based database in my architecture.
>> 
>> Note that this is a new system, not a rewrite of an old one.
>> 
>> Are there any open-source "fast" read/write database built on HDFS
> Yeah. As Ted wrote: HBase.
> 
>> with a model similar to a document-store,
> Well, then PERHAPS HBase isn't the right choice. What exactly do you
> need from the definition of a "doc-store"? If you e.g. rely heavily on ad
> hoc queries or secondary indexes, then HBase could mean some
> additional work for you.
> 
>> that can hold my regular
>> business logic and enables an object model in Python? (E.g.: via Data
>> Mapper or Active Record patterns)
> In addition to Ted's link, you could also use Thrift, if this gives you
> enough control. Depends on your requirements.
> 
> Best wishes,
> 
> Wilm

Re: HDFS-based database for Big and Small data?

Posted by Wilm Schumacher <wi...@gmail.com>.
On 03.01.2015 at 08:44, Alec Taylor wrote:
> Want to replace MongoDB with an HDFS-based database in my architecture.
>
> Note that this is a new system, not a rewrite of an old one.
>
> Are there any open-source "fast" read/write database built on HDFS
Yeah. As Ted wrote: HBase.

> with a model similar to a document-store,
Well, then PERHAPS HBase isn't the right choice. What exactly do you
need from the definition of a "doc-store"? If you e.g. rely heavily on ad
hoc queries or secondary indexes, then HBase could mean some
additional work for you.

> that can hold my regular
> business logic and enables an object model in Python? (E.g.: via Data
> Mapper or Active Record patterns)
In addition to Ted's link, you could also use Thrift, if this gives you
enough control. Depends on your requirements.

Best wishes,

Wilm
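
To illustrate the kind of "additional work" a secondary index can mean in a store that only looks rows up by key, here is a rough sketch under stated assumptions. `SortedTable` is a hypothetical in-memory stand-in for a key-sorted table (it is not an HBase API), and the `users` / `users_by_email` layout with a `\x00` separator is just one possible design: the application writes an extra index row on every save and removes the stale one itself.

```python
import bisect


class SortedTable:
    """Hypothetical stand-in for a key-sorted table: rows can only be
    fetched by exact key or scanned by key prefix."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def delete(self, key):
        if key in self._rows:
            del self._rows[key]
            self._keys.remove(key)

    def get(self, key):
        return self._rows.get(key)

    def scan_prefix(self, prefix):
        # Jump to the first key >= prefix, then walk while the prefix matches.
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._rows[key]


users = SortedTable()           # primary data, keyed by user id
users_by_email = SortedTable()  # hand-maintained index, keyed by email + SEP + user id
SEP = "\x00"


def save_user(user_id, email, name):
    old = users.get(user_id)
    if old is not None and old["email"] != email:
        # The application, not the store, must drop the stale index row.
        users_by_email.delete(old["email"] + SEP + user_id)
    users.put(user_id, {"email": email, "name": name})
    users_by_email.put(email + SEP + user_id, "")


def find_by_email(email):
    # Scan the index by prefix, then fetch each matching primary row.
    return [
        users.get(key.split(SEP, 1)[1])
        for key, _ in users_by_email.scan_prefix(email + SEP)
    ]


save_user("u1", "a@example.org", "Alec")
save_user("u2", "b@example.org", "Bea")
save_user("u1", "c@example.org", "Alec")  # email change: the index must follow
```

Note that the two writes in `save_user` are not atomic in this sketch; keeping the index consistent under failures is part of the extra work, and is one reason a layer that manages indexes for you can be attractive.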
