Posted to user@spark.apache.org by Alaa Ali <co...@gmail.com> on 2014/11/23 21:37:06 UTC

Creating a front-end for output from Spark/PySpark

Hello. Okay, so I'm working on a project to run analytic processing using
Spark or PySpark. Right now, I connect to the shell and execute my
commands. The very first part of my commands is to create an SQL JDBC
connection and cursor to pull from Apache Phoenix, do some processing on
the returned data, and spit out some output. I want to create a web GUI
tool of sorts where I can play around with which SQL query is executed
for my analysis.
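
Roughly, what I do now looks like the sketch below. This is only an
illustration: I'm assuming jaydebeapi as the Python JDBC bridge, and the
ZooKeeper host, jar path, and table name are placeholders, not my real setup.

    # Sketch of the current driver-side flow. Assumptions: jaydebeapi as the
    # Python JDBC bridge; host, jar path and table name are placeholders.
    import jaydebeapi
    from pyspark import SparkContext

    sc = SparkContext(appName="phoenix-analysis")

    conn = jaydebeapi.connect(
        "org.apache.phoenix.jdbc.PhoenixDriver",   # Phoenix JDBC driver class
        "jdbc:phoenix:zk-host",                    # jdbc:phoenix:<zookeeper quorum>
        [],                                        # no credentials in this sketch
        "/path/to/phoenix-client.jar")             # client jar on the driver
    cursor = conn.cursor()
    cursor.execute("SELECT col1, col2 FROM my_table")
    rows = cursor.fetchall()

    # Hand the pulled rows to Spark for the actual processing.
    rdd = sc.parallelize(rows)
    print(rdd.count())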

I know that I can write my whole Spark program, use spark-submit, and
have it accept an argument that is the SQL query I want to execute, but
this means that every time I submit, an SQL connection is created, the
query is run, the processing is done, the output is printed, the program
closes, and the SQL connection closes, and then the whole thing repeats
if I want to run another query right away. That will probably be very
slow. Is there a way I can keep the SQL connection "working" in the
backend, for example, so that all I have to do is supply a query from my
GUI tool, which then takes it, runs it, and displays the output? I just
want the big picture and a broad overview of how I would go about doing
this and what additional technology to use, and I'll dig up the rest.
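
To make that one-shot pattern concrete, the spark-submit version I have in
mind is roughly the following; run_analysis stands in for the Phoenix pull
and processing above, and the script name is made up.

    # submit_query.py -- the one-shot pattern described above (sketch only).
    # Every run pays for a fresh SparkContext and a fresh JDBC connection.
    import sys
    from pyspark import SparkContext

    def run_analysis(sc, sql):
        # placeholder for the Phoenix pull + processing shown earlier
        pass

    if __name__ == "__main__":
        sql = sys.argv[1]                  # the SQL query passed on the command line
        sc = SparkContext(appName="one-shot-query")
        run_analysis(sc, sql)
        sc.stop()

    # Usage: spark-submit submit_query.py "SELECT col1, col2 FROM my_table"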

Regards,
Alaa Ali

Re: Creating a front-end for output from Spark/PySpark

Posted by Alex Kamil <al...@gmail.com>.
Alaa,

One option is to use Spark as a cache, importing the subset of data from
HBase/Phoenix that fits in memory and using JdbcRDD to fetch more data on
a cache miss. The front end can be created with PySpark and Flask, either
as a REST API translating JSON requests into the Spark SQL dialect, or
simply by allowing the user to post Spark SQL queries directly.
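
A minimal sketch of that Flask front end, assuming one long-lived
SparkContext/SQLContext per backend process; the /query route, the "sql"
JSON field, and the temp table name are illustrative, and the JdbcRDD
cache-miss path is omitted to keep it short.

    # Sketch: Flask REST front end over a long-lived PySpark SQLContext.
    # Route name, JSON shape and table registration are illustrative.
    from flask import Flask, request, jsonify
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    app = Flask(__name__)
    sc = SparkContext(appName="query-backend")   # created once, reused by all requests
    sqlContext = SQLContext(sc)

    # At startup, load the cached subset (e.g. a DataFrame pulled from
    # HBase/Phoenix) and expose it to SQL:
    #   df.registerTempTable("my_table")

    @app.route("/query", methods=["POST"])
    def query():
        sql = request.get_json()["sql"]          # raw Spark SQL from the GUI
        rows = sqlContext.sql(sql).collect()     # runs on the warm context
        return jsonify(rows=[list(r) for r in rows])

    if __name__ == "__main__":
        app.run(port=5000)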


RE: Creating a front-end for output from Spark/PySpark

Posted by Mohammed Guller <mo...@glassbeam.com>.
Two options that I can think of:

1) Use the Spark SQL Thrift/JDBC server (see the sketch after this list).

2) Develop a web app using a framework such as Play and expose a set of
REST APIs for sending queries. In your web app backend, initialize the
Spark SQL context only once, when the app starts, and then reuse that
context for every query sent through your REST API.
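
For option 1 the moving parts are roughly: start the bundled Thrift server,
then point any HiveServer2-compatible client at it. A sketch, assuming the
PyHive client and the default host/port; option 2 would end up looking much
like the Flask sketch earlier in this thread.

    # Option 1 (sketch): query the Spark SQL Thrift/JDBC server from Python.
    # Start the server first, e.g.:  ./sbin/start-thriftserver.sh
    # PyHive is one possible HiveServer2 client; host/port are the defaults.
    from pyhive import hive

    conn = hive.connect(host="localhost", port=10000)
    cursor = conn.cursor()
    cursor.execute("SELECT col1, col2 FROM my_table")   # query sent from the GUI
    for row in cursor.fetchall():
        print(row)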

Mohammed
