You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by Alec Taylor <al...@gmail.com> on 2015/01/20 12:40:10 UTC
Low-latency queries, HBase exclusively or should I go, e.g.: MongoDB?
I am architecting a platform incorporating: recommender systems,
information retrieval (ML), sequence mining, and Natural Language
Processing.
Additionally I have the generic CRUD and authentication components,
with everything exposed RESTfully.
For the storage layer(s), there are a few options which immediately
present themselves:
Generic CRUD layer (high speed needed here, though I suppose I could use Redis…)
- Hadoop with HBase, perhaps with Phoenix for an elastic loose-schema
SQL layer atop
- Apache Spark (perhaps piping to HDFS)… ¿maybe?
- MongoDB (or a similar document-store), a graph-database, or even
something like Postgres
Analytics layer (to enable Big Data / Data-intensive computing features)
- Apache Spark
- Hadoop with MapReduce and/or utilising some other Apache /
non-Apache project with integration
- Disco (from Nokia)
________________________________
Should I prefer one layer—e.g.: on HDFS—over multiple disparite
layers? - The advantage here is obvious, but I am certain there are
disadvantages. (and yes, I know there are various ways; automated and
manual; to push data from non HDFS-backed stores to HDFS)
Also, as a bonus answer, which stack would you recommend for this
user-network I'm building?