Posted to user@livy.apache.org by Suraj Sharma <su...@gmail.com> on 2019/08/20 07:32:18 UTC

Need help in understanding Livy w.r.t my use case

Hi,

I have a use case in which:
- I have a single file having 10M records in it.
- I want to run multiple jobs against this file.

Right now, with plain Apache Spark, I need to reload this 10M-record file
for every job I submit.

I was hoping Apache Livy could help here. Below is my current understanding
of Livy; please correct me where I am wrong.

1) If I create a job (i.e., a Spark context) called, say, "load file into
memory" and use Livy to run it, then Spark will read this file and keep
the resulting RDD cached at the Spark context level, which in the end will
reside in memory on Livy's side.

1.1) My Spark job can then finish, and the file stays cached at the Spark
context level. This will consume some memory on Livy's side.

1.2) At this point, my task slots are all free to pick up other jobs, and
no Spark executors are running.
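
To make point 1 concrete, here is a rough sketch of the REST calls I have
in mind, based on the Livy REST API (POST /sessions to create an
interactive session, POST /sessions/{id}/statements to run code in it).
The server URL, memory setting, and file path are placeholders I made up:

```python
import json

# Assumed Livy server endpoint (placeholder).
LIVY_URL = "http://localhost:8998"

# POST {LIVY_URL}/sessions -- starts a long-lived Spark application
# (and its SparkContext) that outlives any single job submitted to it.
create_session = {"kind": "pyspark", "executorMemory": "4g"}

# POST {LIVY_URL}/sessions/{id}/statements -- load and cache the file once
# inside that session. The path '/data/records_10m.txt' is hypothetical.
cache_statement = {
    "code": (
        "records = spark.read.text('/data/records_10m.txt')\n"
        "records.cache()\n"
        "records.count()  # action to materialize the cache"
    )
}

print(json.dumps(create_session))
print(json.dumps(cache_statement))
```

Is this roughly the flow you would expect for the "load file" step?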

2) Later, I can create more jobs and submit them through Livy, this time
targeting the context created by the job above. That way, these jobs can
reuse the file cached in that context.
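
And for point 2, a sketch of what I imagine a follow-up job would look
like: a further statement POSTed against the same session id, so it sees
the cached data. The session id, variable name, and filter are placeholders:

```python
# session_id would come from the earlier POST /sessions response
# (placeholder value here).
session_id = 0

# POST {LIVY_URL}/sessions/{session_id}/statements -- this statement runs
# in the same SparkContext, so 'records' (cached earlier) is still defined.
job_statement = {
    "code": "records.filter(records.value.contains('ERROR')).count()"
}
```

So each new job is just another statement in the existing session, with no
re-read of the 10M-record file. Is that correct?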


Please help me understand this better.

-- 

Warm Regards,
Suraj Sharma
Phone: +91-9741370819
Skype: surajsharma121