Posted to user@spark.apache.org by Ajay <aj...@gmail.com> on 2014/12/05 08:25:03 UTC

Clarifications on Spark

Hello,

I work for an eCommerce company. Currently we are looking at building a Data
warehouse platform as described below:

DW as a Service
    |
REST API
    |
SQL on NoSQL (Drill/Pig/Hive/Spark SQL)
    |
NoSQL databases (one or more; possibly an RDBMS directly too)
    | (Bulk load)
MySQL database

I would like a few clarifications on Apache Spark, as follows:

1) Can we use Spark for SQL on NoSQL, or do we need to combine it with
Pig/Hive or another tool for any reason?
2) Can Spark SQL be used as a query interface for business intelligence,
analytics, and reporting?
3) Does Spark support only Hadoop and HBase? We may use
Cassandra/MongoDB/Couchbase as well.
4) Does Spark support RDBMSs too? Can we have a single interface to pull
data from multiple data sources?
5) Any recommendations (not limited to Spark) for the specific
requirement described above?

Thanks
Ajay

Note: I have posted a similar message on the Drill user list as well, as I am
not sure which one best fits our use case.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Clarifications-on-Spark-tp20440.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

Re: Clarifications on Spark

Posted by Paolo Platter <pa...@agilelab.it>.

Hi,

1) Yes, you can. Spark supports many file formats on HDFS/S3, and it also supports Cassandra and JDBC sources in general.

2) Yes. Spark has a JDBC Thrift server to which you can attach BI tools. I suggest paying attention to your query response time requirements.

3) No, you can go with Cassandra. If you are looking at MongoDB, you should give the Stratio platform a try.

4) Yes. Using JdbcRDD you can leverage an RDBMS too.

5) I suggest using Spark as a computation engine: build your pre-aggregated views and persist them in a data store like Cassandra. Then attach the BI tools directly to the aggregated views.
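
To make point 5 concrete, here is a minimal sketch of the pre-aggregation pattern. It is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for both the raw source (MySQL or a NoSQL store in the original question) and the serving store (Cassandra in the suggestion), and a plain SQL GROUP BY stands in for the Spark job. The table names, columns, sample rows, and the helper functions build_daily_sales and revenue_for are all hypothetical.

```python
import sqlite3

# Stand-in for the raw order data (hypothetical schema and rows).
raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE orders (day TEXT, category TEXT, amount REAL)")
raw.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2014-12-01", "books", 12.5),
        ("2014-12-01", "books", 7.5),
        ("2014-12-01", "toys", 30.0),
        ("2014-12-02", "books", 10.0),
    ],
)


def build_daily_sales(conn):
    """Batch step: aggregate raw orders into one row per (day, category).

    In the setup discussed in this thread, this is the part Spark would run.
    """
    return conn.execute(
        "SELECT day, category, COUNT(*), SUM(amount) "
        "FROM orders GROUP BY day, category ORDER BY day, category"
    ).fetchall()


# Stand-in for the serving store that holds the pre-aggregated view.
serving = sqlite3.connect(":memory:")
serving.execute(
    "CREATE TABLE daily_sales ("
    "day TEXT, category TEXT, order_count INTEGER, revenue REAL, "
    "PRIMARY KEY (day, category))"
)
serving.executemany(
    "INSERT OR REPLACE INTO daily_sales VALUES (?, ?, ?, ?)",
    build_daily_sales(raw),
)
serving.commit()


def revenue_for(conn, day, category):
    """Query path for a BI tool: a key lookup on the aggregated view."""
    row = conn.execute(
        "SELECT revenue FROM daily_sales WHERE day = ? AND category = ?",
        (day, category),
    ).fetchone()
    return row[0] if row else None


print(revenue_for(serving, "2014-12-01", "books"))  # prints 20.0
```

The point of the split is that the reporting query never scans the raw orders; it only reads rows that the batch step has already computed, which is what keeps response times predictable for BI tools.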

Paolo
