Posted to user@livy.apache.org by "Decker, Seth Andrew" <sa...@sandia.gov> on 2018/05/16 14:16:56 UTC

new user to livy/spark, basic use case questions

Hello,

I'm new to Livy and Spark and have a question about how to properly use it.

I want to use Spark both for interactive scripting and for passing it data/parameters to run predefined algorithms/applications. I'm looking at using Livy to interface with Spark RESTfully, but I'm not sure whether I can handle things the way I want (or whether that's the intended way to use them). I can pass a Python script into Spark through Livy, which is great.

Is the intended way to get results to save them to HDFS/a database/some other data store and then read them back in? I noticed that with the Java/Python clients you can create your job through Livy and get the results back in the Livy HTTP response, which seems much simpler, but I have some trouble/concerns with that path.

My first issue is that I don't necessarily need the client to know the job. I'd rather the job be stored in HDFS on the Hadoop cluster and have Livy just tell Spark to run it with the given parameters/input. Is this doable with the client, or should I just stick with the HTTP API?

If the previous is possible, is it also possible to run a Python script in Spark, called via the Java client? From my sleuthing in the GitHub repo it looks like you upload/run/submit JARs from Java and .py files from Python, and I would probably have use cases for running both (such as TensorFlow .py scripts, or custom Java code). Is there a way to run both from the same client?

Thanks,
Seth Decker


Re: new user to livy/spark, basic use case questions

Posted by "Harsch, Tim" <Ti...@Teradata.com>.
Results for any given script, or script segments, sent to Livy are retrievable in a subsequent call to the statements endpoint. The results are available as long as the session and the server are still alive. If you want results to be more permanent, have the script write them to a data source; you can then access them at any time in the future. It's just a design consideration.
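For example, here is a minimal sketch against the Livy REST API using Python's requests library (the host/port, session settings, and code snippet are placeholders, not part of this thread):

    import json, requests

    LIVY = "http://livy-host:8998"  # hypothetical Livy server URL
    headers = {"Content-Type": "application/json"}

    # Create an interactive PySpark session.
    session = requests.post(LIVY + "/sessions",
                            data=json.dumps({"kind": "pyspark"}),
                            headers=headers).json()
    session_url = "%s/sessions/%d" % (LIVY, session["id"])

    # Once the session is idle, submit code as a statement.
    stmt = requests.post(session_url + "/statements",
                         data=json.dumps({"code": "1 + 1"}),
                         headers=headers).json()

    # Results stay retrievable from the statements endpoint while the
    # session is alive; poll until state == "available".
    result = requests.get("%s/statements/%d" % (session_url, stmt["id"]),
                          headers=headers).json()
    print(result["output"])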


The Livy Programmatic API is useful for batch jobs; examples are available on the web site. The REST API has more features, from what I've gathered: with it you can run either batch jobs or scripts.
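To run a job that already lives in HDFS, the REST batches endpoint takes a file reference plus arguments, so the client never needs the job code itself (a sketch; the paths and arguments below are hypothetical):

    import json, requests

    LIVY = "http://livy-host:8998"  # hypothetical Livy server URL
    batch = {
        "file": "hdfs:///jobs/my_algorithm.py",  # or a .jar plus "className"
        "args": ["--input", "hdfs:///data/in", "--output", "hdfs:///data/out"],
    }
    resp = requests.post(LIVY + "/batches",
                         data=json.dumps(batch),
                         headers={"Content-Type": "application/json"}).json()

    # Check progress later with GET /batches/<id>.
    print(resp["id"], resp["state"])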


The REST API also supports a 'shared' session type. In a 'shared' session (not yet documented) you can run a Scala script followed by Python or R within the same session.
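In a shared session each statement can carry its own language kind, so Scala and Python can interleave in one session (a sketch; this assumes Livy 0.5+, where the "shared" session kind and the per-statement "kind" field were introduced):

    import json, requests

    LIVY = "http://livy-host:8998"  # hypothetical Livy server URL
    headers = {"Content-Type": "application/json"}

    s = requests.post(LIVY + "/sessions",
                      data=json.dumps({"kind": "shared"}),
                      headers=headers).json()
    stmts = "%s/sessions/%d/statements" % (LIVY, s["id"])

    # A Scala statement, then a Python statement, in the same session.
    requests.post(stmts, data=json.dumps(
        {"kind": "spark", "code": "val x = 1\nx + 1"}), headers=headers)
    requests.post(stmts, data=json.dumps(
        {"kind": "pyspark", "code": "print(2 + 2)"}), headers=headers)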
