Posted to user@spark.apache.org by Affan Syed <as...@an10.io> on 2018/10/25 07:28:43 UTC

Having access to spark results

Spark users,
We would really like input on how the results of a Spark query can be made
accessible to a web application. Given how widely Spark is used in industry,
I would have thought this topic would have plenty of answers/tutorials, but
I didn't find anything.

Here are a few options that come to mind:

1) Spark results are saved in another DB (perhaps a traditional one), and a
query request returns the new table name for access through a paginated
query. That seems doable, although a bit convoluted, as we need to handle
the completion of the query.

2) Spark results are pumped into a messaging queue, from which a
socket-server-like connection is made.
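A minimal sketch of option 1, with sqlite3 standing in for the "traditional" results DB and a made-up table name (`spark_results_job42`): the Spark job would materialize the result (e.g. with a JDBC write), and the web tier then serves it page by page with LIMIT/OFFSET.

```python
import sqlite3

# Stand-in for the table a Spark job would have written (e.g. via
# df.write.jdbc(...)); the table name "spark_results_job42" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spark_results_job42 (id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO spark_results_job42 VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1000)],
)

def get_page(table, page, page_size=100):
    """Serve one page of a materialized Spark result to the web app."""
    offset = page * page_size
    cur = conn.execute(
        f"SELECT id, value FROM {table} ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return cur.fetchall()

page0 = get_page("spark_results_job42", 0)
page3 = get_page("spark_results_job42", 3)
```

The web app only ever holds one page in memory per request, regardless of how large the full result is.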

What confuses me is that other connectors to Spark, like those for Tableau,
using something like JDBC, should have access to all the data (not just the
top 500 rows that we typically get via Livy or other REST interfaces to
Spark). How do those connectors get all the data through a single connection?
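To my understanding, JDBC connectors don't pull the whole result in one shot: they stream it through a single open connection in batches controlled by a fetch size. A rough DB-API analogue using `fetchmany`, with sqlite3 standing in for a Spark Thrift Server endpoint:

```python
import sqlite3

# DB-API analogue of what a JDBC connector does with a fetch size:
# pull the ENTIRE result set through one connection, batch by batch,
# never holding more than one batch in memory. sqlite3 stands in for
# a Spark Thrift Server JDBC endpoint here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER)")
conn.executemany("INSERT INTO results VALUES (?)", [(i,) for i in range(2500)])

def stream_all(query, batch_size=500):
    """Yield every row of the result set, one batch at a time."""
    cur = conn.execute(query)
    while True:
        batch = cur.fetchmany(batch_size)  # analogous to JDBC setFetchSize
        if not batch:
            break
        yield from batch

total = sum(1 for _ in stream_all("SELECT id FROM results"))
```

The same cursor-with-fetch-size pattern is what lets a BI tool see all rows over one connection without a REST-style row cap.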


Can someone with expertise help bring clarity?

Thank you.

Affan

Re: [External Sender] Having access to spark results

Posted by Affan Syed <as...@an10.io>.
Femi,
We have a solution that needs to be both on-prem and also in the cloud.

Not sure how that impacts anything; what we want is to run an analytical
query on a large dataset (ours is over Cassandra) -- so batch in that
sense, but think on-demand -- and then have the result be entirely
available (not just the first x rows) for a web application to access.

Web applications work over a REST API, so while the query can be submitted
through something like Livy or the Thrift Server, the concern is how we
get the final result back in a useful form.

I could think of two ways of doing that.

A global temp table would work, but that comes back to my first point ---
it seems a bit involved. My question was: has someone already solved this
problem and run through all the steps?
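The "completion" step the first option needs can be sketched end to end: a background worker (standing in here for the Spark job) materializes the result table and then flips a status row that the web tier polls. All names (`job42`, `job_status`) are made up for illustration, and sqlite3 again stands in for the results DB.

```python
import sqlite3
import threading
import time

# The web tier needs to know WHEN the result table is ready. One common
# pattern: the job writes the result, then records DONE in a status table.
conn = sqlite3.connect(":memory:", check_same_thread=False)
lock = threading.Lock()
with lock:
    conn.execute("CREATE TABLE job_status (job_id TEXT, state TEXT)")
    conn.execute("INSERT INTO job_status VALUES ('job42', 'RUNNING')")

def run_query():
    """Stand-in for the long-running Spark query."""
    with lock:
        conn.execute("CREATE TABLE job42_result (id INTEGER)")
        conn.executemany("INSERT INTO job42_result VALUES (?)",
                         [(i,) for i in range(10)])
        conn.execute(
            "UPDATE job_status SET state = 'DONE' WHERE job_id = 'job42'")

worker = threading.Thread(target=run_query)
worker.start()

# Web tier: poll until the job reports DONE, then read the full result.
while True:
    with lock:
        (state,) = conn.execute(
            "SELECT state FROM job_status WHERE job_id = 'job42'").fetchone()
    if state == "DONE":
        break
    time.sleep(0.01)
worker.join()
with lock:
    rows = conn.execute("SELECT id FROM job42_result ORDER BY id").fetchall()
```

In a real deployment the poll would be a REST endpoint (or replaced by a callback/queue notification), but the state machine is the same.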


- Affan


On Thu, Oct 25, 2018 at 12:39 PM Femi Anthony <
olufemi.anthony@capitalone.com> wrote:

> What sort of environment are you running Spark on - in the cloud, on
> premise ? Is its a real-time or batch oriented application?
> Please provide more details.
> Femi
>
> On Thu, Oct 25, 2018 at 3:29 AM Affan Syed <as...@an10.io> wrote:
>
>> Spark users,
>> We really would want to get an input here about how the results from a
>> Spark Query will be accessible to a web-application. Given Spark is a well
>> used in the industry I would have thought that this part would have lots of
>> answers/tutorials about it, but I didnt find anything.
>>
>> Here are a few options that come to mind
>>
>> 1) Spark results are saved in another DB ( perhaps a traditional one) and
>> a request for query returns the new table name for access through a
>> paginated query. That seems doable, although a bit convoluted as we need to
>> handle the completion of the query.
>>
>> 2) Spark results are pumped into a messaging queue from which a socket
>> server like connection is made.
>>
>> What confuses me is that other connectors to spark, like those for
>> Tableau, using something like JDBC should have all the data (not the top
>> 500 that we typically can get via Livy or other REST interfaces to Spark).
>> How do those connectors get all the data through a single connection?
>>
>>
>> Can someone with expertise help in bringing clarity.
>>
>> Thank you.
>>
>> Affan
>>
>
> ------------------------------
>
> The information contained in this e-mail is confidential and/or
> proprietary to Capital One and/or its affiliates and may only be used
> solely in performance of work or services for Capital One. The information
> transmitted herewith is intended only for use by the individual or entity
> to which it is addressed. If the reader of this message is not the intended
> recipient, you are hereby notified that any review, retransmission,
> dissemination, distribution, copying or other use of, or taking of any
> action in reliance upon this information is strictly prohibited. If you
> have received this communication in error, please contact the sender and
> delete the material from your computer.
>

Re: [External Sender] Having access to spark results

Posted by Femi Anthony <ol...@capitalone.com>.
What sort of environment are you running Spark on - in the cloud, on
premises? Is it a real-time or batch-oriented application?
Please provide more details.
Femi

On Thu, Oct 25, 2018 at 3:29 AM Affan Syed <as...@an10.io> wrote:

> Spark users,
> We really would want to get an input here about how the results from a
> Spark Query will be accessible to a web-application. Given Spark is a well
> used in the industry I would have thought that this part would have lots of
> answers/tutorials about it, but I didnt find anything.
>
> Here are a few options that come to mind
>
> 1) Spark results are saved in another DB ( perhaps a traditional one) and
> a request for query returns the new table name for access through a
> paginated query. That seems doable, although a bit convoluted as we need to
> handle the completion of the query.
>
> 2) Spark results are pumped into a messaging queue from which a socket
> server like connection is made.
>
> What confuses me is that other connectors to spark, like those for
> Tableau, using something like JDBC should have all the data (not the top
> 500 that we typically can get via Livy or other REST interfaces to Spark).
> How do those connectors get all the data through a single connection?
>
>
> Can someone with expertise help in bringing clarity.
>
> Thank you.
>
> Affan
>

Fwd: Having access to spark results

Posted by onmstester onmstester <on...@zoho.com.INVALID>.
What about using cache(), or saving as a global temp table, for subsequent
access?

Sent using Zoho Mail

============ Forwarded message ============
From: Affan Syed <as...@an10.io>
To: "spark users" <us...@spark.apache.org>
Date: Thu, 25 Oct 2018 10:58:43 +0330
Subject: Having access to spark results
============ Forwarded message ============

Spark users,
We really would want to get an input here about how the results from a
Spark Query will be accessible to a web-application. Given Spark is a well
used in the industry I would have thought that this part would have lots of
answers/tutorials about it, but I didnt find anything.

Here are a few options that come to mind

1) Spark results are saved in another DB ( perhaps a traditional one) and a
request for query returns the new table name for access through a paginated
query. That seems doable, although a bit convoluted as we need to handle
the completion of the query.

2) Spark results are pumped into a messaging queue from which a socket
server like connection is made.

What confuses me is that other connectors to spark, like those for Tableau,
using something like JDBC should have all the data (not the top 500 that we
typically can get via Livy or other REST interfaces to Spark). How do those
connectors get all the data through a single connection?

Can someone with expertise help in bringing clarity.

Thank you.

Affan