
Posted to users@zeppelin.apache.org by "Kevin (Sangwoo) Kim" <ke...@apache.org> on 2016/05/18 06:18:49 UTC

Small tips when running Zeppelin on EMR

Hi Zeppelin users,

I presented a demo of "Spark+Zeppelin on AWS EMR" at AWS Summit Seoul
yesterday. Sadly, the slides are written in Korean, so they're hard to
share, but I'd like to share the essentials.

1. Running Zeppelin on EMR is super easy. (The EMR team did a really good
job. It takes only a few clicks, and the cluster took 8 minutes to launch.)

2. You can launch EMR with spot instances, which will save you money.

3. You can provide configurations when you launch an EMR cluster. For
example, if you want to save your notebooks on S3, the proper configuration
is as follows:

[
  {
    "Classification": "zeppelin-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "ZEPPELIN_NOTEBOOK_STORAGE": "org.apache.zeppelin.notebook.repo.S3NotebookRepo",
          "ZEPPELIN_NOTEBOOK_S3_BUCKET": "BUCKET_NAME",
          "ZEPPELIN_NOTEBOOK_S3_USER": "SOME_USER_NAME"
        },
        "Configurations": []
      }
    ]
  }
]

4. You need to set an appropriate spark.executor.memory value in the
Zeppelin Spark interpreter settings.

5. You can increase or decrease the cluster size on the cluster details page.

6. Don't forget to terminate the cluster when you're done with your job :)
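For tip 3, the same configuration could also be generated by a script
instead of typed by hand, which avoids copy-paste problems such as curly
quotes that make the JSON invalid. Below is a minimal Python sketch, not
part of the original setup: the function name zeppelin_s3_configuration and
the bucket/user values are hypothetical placeholders. It prints the JSON,
which you could save to a file (e.g. zeppelin-s3.json) and pass to
"aws emr create-cluster --configurations file://zeppelin-s3.json" along with
your usual cluster options.

```python
import json


# Hypothetical helper: builds the "zeppelin-env" configuration from tip 3.
def zeppelin_s3_configuration(bucket, user):
    """Return an EMR configuration list that stores Zeppelin notebooks on S3."""
    return [
        {
            "Classification": "zeppelin-env",
            "Properties": {},
            "Configurations": [
                {
                    # Properties under "export" are written by EMR as
                    # exported environment variables for Zeppelin.
                    "Classification": "export",
                    "Properties": {
                        "ZEPPELIN_NOTEBOOK_STORAGE":
                            "org.apache.zeppelin.notebook.repo.S3NotebookRepo",
                        "ZEPPELIN_NOTEBOOK_S3_BUCKET": bucket,
                        "ZEPPELIN_NOTEBOOK_S3_USER": user,
                    },
                    "Configurations": [],
                }
            ],
        }
    ]


if __name__ == "__main__":
    # Placeholder values; replace them with your own bucket and user name.
    config = zeppelin_s3_configuration("BUCKET_NAME", "SOME_USER_NAME")
    # json.dumps always emits straight ASCII quotes, so the output is valid JSON.
    print(json.dumps(config, indent=2))
```

Generating the file this way keeps the bucket and user name as the only
values you edit, so the nested classification structure stays intact.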

That's all!


If you have more tips, please add them to this mail thread. Thanks!

- Kevin

Re: Small tips when running Zeppelin on EMR

Posted by Alexander Bezzubov <bz...@apache.org>.

Thank you for sharing, Kevin!

Great tips, especially the one on how to set up S3 storage on EMR.

--
Alex

On Wed, May 18, 2016 at 6:04 PM, Kevin (Sangwoo) Kim <ke...@apache.org>
wrote:

> Hi Ahyoung,
>
> I just added #6 while writing this mail, after I realized I had left the
> cluster running after the presentation. (Haha)
>
> I'm attaching the slides.
> (Sorry to non-Korean readers, but most of the slides are screenshots, so
> I hope they help!)
>
> - Kevin
>
>
>
> On Wed, May 18, 2016 at 4:55 PM, Hyung Sung Shim <hs...@nflabs.com> wrote:
>
>> Thank you for sharing this great information!
>>
>>
>> 2016-05-18 16:49 GMT+09:00 Ahyoung Ryu <ah...@gmail.com>:
>>
>>> Hi Kevin,
>>>
>>> Thanks for sharing. It's really helpful, not only to me but also to
>>> many others.
>>> I think *6. Don't forget to terminate the cluster when you're done with
>>> your job* is the most important thing :)
>>> Is there any way I can see your slides? If so, I would really
>>> appreciate it.
>>>
>>> Best regards,
>>> Ahyoung
>>>
>>

Re: Small tips when running Zeppelin on EMR

Posted by "Kevin (Sangwoo) Kim" <ke...@apache.org>.

Hi Ahyoung,

I just added #6 while writing this mail, after I realized I had left the
cluster running after the presentation. (Haha)

I'm attaching the slides.
(Sorry to non-Korean readers, but most of the slides are screenshots, so I
hope they help!)

- Kevin



On Wed, May 18, 2016 at 4:55 PM, Hyung Sung Shim <hs...@nflabs.com> wrote:

> Thank you for sharing this great information!
>
>
> 2016-05-18 16:49 GMT+09:00 Ahyoung Ryu <ah...@gmail.com>:
>
>> Hi Kevin,
>>
>> Thanks for sharing. It's really helpful, not only to me but also to
>> many others.
>> I think *6. Don't forget to terminate the cluster when you're done with
>> your job* is the most important thing :)
>> Is there any way I can see your slides? If so, I would really
>> appreciate it.
>>
>> Best regards,
>> Ahyoung
>>
>

Re: Small tips when running Zeppelin on EMR

Posted by Hyung Sung Shim <hs...@nflabs.com>.

Thank you for sharing this great information!


2016-05-18 16:49 GMT+09:00 Ahyoung Ryu <ah...@gmail.com>:

> Hi Kevin,
>
> Thanks for sharing. It's really helpful, not only to me but also to many
> others.
> I think *6. Don't forget to terminate the cluster when you're done with
> your job* is the most important thing :)
> Is there any way I can see your slides? If so, I would really
> appreciate it.
>
> Best regards,
> Ahyoung
>

Re: Small tips when running Zeppelin on EMR

Posted by Ahyoung Ryu <ah...@gmail.com>.

Hi Kevin,

Thanks for sharing. It's really helpful, not only to me but also to many
others.
I think *6. Don't forget to terminate the cluster when you're done with your
job* is the most important thing :)
Is there any way I can see your slides? If so, I would really appreciate it.

Best regards,
Ahyoung
