Posted to user@spark.apache.org by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/09/14 11:35:13 UTC
Spark Interview questions
Hi,
As a learner, I would appreciate it if you could forward me typical Spark interview questions for Spark/Scala junior roles.
I would be very obliged.
Re: Spark Interview questions
Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,
Doh, Mich, it's way too much to ask for "typical Spark interview
questions for Spark/Scala junior roles". There are plenty of such
questions and I don't think there's a way to have them all noted down.
Spark supports 5 languages, offers 4 modules + Core, and presents
itself differently to developers, admins and performance g33ks. Add the 3
supported cluster managers and you can see why I'm staying far from such
questions. Too much to handle.
Pass.
p.s. The longer I work with Spark, the more I'm overwhelmed by how complex
it is. So many sections with FIXMEs/TODOs in my Spark notes...
Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
On Wed, Sep 14, 2016 at 4:09 PM, Mich Talebzadeh
<mi...@gmail.com> wrote:
> Hi Ashok,
>
> I am sure we all have some war stories, some of which I recall:
>
> What is meant by RDD, DataFrame and Dataset?
> What is meant by "All transformations in Spark are lazy"?
> What are the two types of operations supported by RDD?
> What is meant by Spark running under a certain mode?
> Explain the difference between Spark running in Standalone mode and YARN
> cluster mode.
> What is the difference between Spark running in YARN client mode and YARN
> cluster mode?
> What is the difference between persist and cache?
> If you cache a DataFrame, what does it do and where does the memory consumed
> come from? Can you name a place where you can see its measurements?
> What is meant by DAG? A broad outline.
> What is shuffling in Spark? How can you minimise its impact?
> How would you specify your Spark hardware in a medium-size set-up, say an
> 8-node cluster?
> How could one minimise the network latency between Spark and the underlying
> storage (assuming HDFS here)?
> How can you parallelize your JDBC connection to a database, say any RDBMS?
> How does it work?
> What is the use case for Spark Thrift Server?
> How would you typically read and process a tab-separated file in Spark?
> If you get an OOM message in Spark, how would you go about diagnosing the
> problem?
> What is meant by spark-submit? How would you use it?
> What is a Spark driver? If you run Spark in local mode, how many executors
> can you start?
> What is meant by Spark Streaming? What is a use case example?
> In Spark Streaming, what parameters are important?
> What are the typical analytic functions in Spark SQL?
> What is the difference between RANK and DENSE_RANK?
>
> I am sure there are many other questions that one can think of. For example,
> someone like Jacek Laskowski can provide more programming questions, as he is
> a professional Spark trainer :)
>
> HTH
Re: Spark Interview questions
Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi Ashok,
I am sure we all have some war stories, some of which I recall:
1. What is meant by RDD, DataFrame and Dataset?
2. What is meant by "All transformations in Spark are lazy"?
3. What are the two types of operations supported by RDD?
4. What is meant by Spark running under a certain mode?
5. Explain the difference between Spark running in Standalone mode and
YARN cluster mode.
6. What is the difference between Spark running in YARN client mode and
YARN cluster mode?
7. What is the difference between persist and cache?
8. If you cache a DataFrame, what does it do and where does the memory
consumed come from? Can you name a place where you can see its measurements?
9. What is meant by DAG? A broad outline.
10. What is shuffling in Spark? How can you minimise its impact?
11. How would you specify your Spark hardware in a medium-size set-up,
say an 8-node cluster?
12. How could one minimise the network latency between Spark and the
underlying storage (assuming HDFS here)?
13. How can you parallelize your JDBC connection to a database, say any
RDBMS? How does it work?
14. What is the use case for Spark Thrift Server?
15. How would you typically read and process a tab-separated file in
Spark?
16. If you get an OOM message in Spark, how would you go about
diagnosing the problem?
17. What is meant by spark-submit? How would you use it?
18. What is a Spark driver? If you run Spark in local mode, how many
executors can you start?
19. What is meant by Spark Streaming? What is a use case example?
20. In Spark Streaming, what parameters are important?
21. What are the typical analytic functions in Spark SQL?
22. What is the difference between RANK and DENSE_RANK?
I am sure there are many other questions that one can think of. For example,
someone like Jacek Laskowski can provide more programming questions, as he
is a professional Spark trainer :)
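For anyone working through the last question in the list, here is a rough sketch of how RANK and DENSE_RANK treat ties. It is plain Python rather than Spark code, so it only imitates the semantics of the two SQL window functions over a descending ordering. The function names and the salary figures are made up for illustration.

```python
def rank(values):
    """Imitate SQL RANK() over ORDER BY value DESC.

    Tied values share a rank, and the rank after a tie skips ahead.
    """
    ordered = sorted(values, reverse=True)
    # A value's rank is 1 plus the number of strictly larger values.
    return [(v, 1 + sum(1 for other in ordered if other > v)) for v in ordered]


def dense_rank(values):
    """Imitate SQL DENSE_RANK() over ORDER BY value DESC.

    Tied values share a rank, and the next rank follows with no gap.
    """
    distinct = sorted(set(values), reverse=True)
    # Number the distinct values 1, 2, 3, ... in descending order.
    position = {v: i + 1 for i, v in enumerate(distinct)}
    return [(v, position[v]) for v in sorted(values, reverse=True)]


# Hypothetical salaries with one tie at 200.
salaries = [300, 200, 200, 100]
print(rank(salaries))        # [(300, 1), (200, 2), (200, 2), (100, 4)]
print(dense_rank(salaries))  # [(300, 1), (200, 2), (200, 2), (100, 3)]
```

The only difference shows up after the tie: RANK jumps from 2 to 4, while DENSE_RANK continues with 3.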
HTH
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.