You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hbase.apache.org by Alec Taylor <al...@gmail.com> on 2015/01/20 12:40:10 UTC

Low-latency queries, HBase exclusively or should I go, e.g.: MongoDB?

I am architecting a platform incorporating: recommender systems,
information retrieval (ML), sequence mining, and Natural Language
Processing.

Additionally I have the generic CRUD and authentication components,
with everything exposed RESTfully.

For the storage layer(s), there are a few options which immediately
present themselves:

Generic CRUD layer (high speed needed here, though I suppose I could use Redis…)

- Hadoop with HBase, perhaps with Phoenix for an elastic loose-schema
SQL layer atop
- Apache Spark (perhaps piping to HDFS)… ¿maybe?
- MongoDB (or a similar document-store), a graph-database, or even
something like Postgres

Analytics layer (to enable Big Data / Data-intensive computing features)

- Apache Spark
- Hadoop with MapReduce and/or utilising some other Apache /
non-Apache project with integration
- Disco (from Nokia)

________________________________

Should I prefer one layer—e.g.: on HDFS—over multiple disparite
layers? - The advantage here is obvious, but I am certain there are
disadvantages. (and yes, I know there are various ways; automated and
manual; to push data from non HDFS-backed stores to HDFS)

Also, as a bonus answer, which stack would you recommend for this
user-network I'm building?