Posted to users@zeppelin.apache.org by Austin Heyne <ah...@ccri.com> on 2018/01/12 18:35:10 UTC

Issue with multiple users running Spark

Hi everyone,

I'm currently running Zeppelin on a Spark master node using the
AWS-provided Zeppelin install. I'm trying to get the notebook set up so
multiple devs can use it (and the Spark cluster) concurrently. I have
the Spark interpreter set to instantiate 'Per Note' in 'isolated'
processes. I also have 'spark.dynamicAllocation.enabled' set to 'true'
so the multiple Spark contexts can share the cluster.

The problem I'm seeing is that when the second Spark context tries to
instantiate, Hive starts throwing errors because the Derby database has
already been booted (by the other context). The full stack trace is
available here [1]. How do I go about working around this? Is there a
way to have it use another database, or is this a limitation?

Thanks for any help!

[1] https://gist.github.com/aheyne/8d84eaedefb997f248b6e88c1b9e1e34

-- 
Austin L. Heyne

Re: Issue with multiple users running Spark

Posted by Michael Segel <ms...@hotmail.com>.

Do you even need a Hive context if you’re never going to connect to Hive on Hadoop?

If you are running Hadoop…
If you’re not running Tez, you might as well skip Hive and just run Spark SQL.
Of course, if your users are hooked on Hue… you may have an issue.

Sorry, this may be slightly off topic…

-Mike

On Jan 17, 2018, at 11:38 AM, Felix Cheung <fe...@hotmail.com> wrote:

Should we have some doc on this? I think this could be a common problem.


Re: Issue with multiple users running Spark

Posted by Felix Cheung <fe...@hotmail.com>.

Should we have some doc on this? I think this could be a common problem.

________________________________
From: Austin Heyne <ah...@ccri.com>
Sent: Monday, January 15, 2018 6:59:55 AM
To: users@zeppelin.apache.org
Subject: Re: Issue with multiple users running Spark

Thanks Jeff and Michael for the help. We're seeing good success just disabling 'zeppelin.spark.useHiveContext'.

-Austin


Re: Issue with multiple users running Spark

Posted by Austin Heyne <ah...@ccri.com>.

Thanks Jeff and Michael for the help. We're seeing good success just 
disabling 'zeppelin.spark.useHiveContext'.
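
For reference, the change amounts to a single property in the Spark
interpreter settings. A minimal sketch follows, modelling the interpreter
properties as a plain Python dict. The dict and the helper name
disable_hive_context are assumptions for illustration only; in practice the
value is edited on Zeppelin's Interpreter page. The two property names are
the ones mentioned in this thread.

```python
def disable_hive_context(interpreter_properties):
    """Return a copy of the properties with the Hive context switched off."""
    updated = dict(interpreter_properties)  # leave the caller's dict untouched
    updated["zeppelin.spark.useHiveContext"] = "false"
    return updated


# Hypothetical starting state; values are strings, as in the settings form.
before = {
    "spark.dynamicAllocation.enabled": "true",
    "zeppelin.spark.useHiveContext": "true",
}
after = disable_hive_context(before)
print(after["zeppelin.spark.useHiveContext"])
```

The interpreter has to be restarted after the change so that new Spark
contexts pick it up.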

-Austin


On 01/12/2018 07:56 PM, Jeff Zhang wrote:
>
> There are two options for you:
>
> 1. Disable HiveContext in Spark by setting
> zeppelin.spark.useHiveContext to false in the Spark interpreter settings.
> 2. Connect to a Hive metastore service instead of a single Derby instance.
> You can configure that in your hive-site.xml.

-- 
Austin L. Heyne

Re: Issue with multiple users running Spark

Posted by Jeff Zhang <zj...@gmail.com>.

There are two options for you:

1. Disable HiveContext in Spark by setting zeppelin.spark.useHiveContext
to false in the Spark interpreter settings.
2. Connect to a Hive metastore service instead of a single Derby instance.
You can configure that in your hive-site.xml.
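
For option 2, here is a rough sketch of what the hive-site.xml entry might
look like, generated with a small Python helper. hive.metastore.uris is the
standard Hive property for pointing clients at a remote metastore service.
The host name metastore-host and port 9083 (the usual default) are
placeholders, not values from this thread, and hive_site_fragment is a
hypothetical helper name.

```python
from xml.sax.saxutils import escape


def hive_site_fragment(properties):
    """Render a dict of Hive properties as hive-site.xml text."""
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


# Placeholder host and port: point this at your own metastore service.
remote_metastore = {"hive.metastore.uris": "thrift://metastore-host:9083"}
print(hive_site_fragment(remote_metastore))
```

With a shared metastore service, each isolated interpreter process talks to
the service over Thrift instead of booting its own embedded Derby database.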



On Sat, Jan 13, 2018 at 2:40 AM, Michael Segel <ms...@hotmail.com> wrote:

> Hi,
>
> Quick response… unless you tell Derby to set up as a networked service
> (this is going back to SilverCloud days), it’s a single-user instance. So it
> won’t work.
> If you were using MySQL or something… you would have better luck…
>
>
> I think if you go back into Derby’s docs and see how to start it as a
> networked server (multi-user), you could try it.
> Most people don’t do this because not many people know Derby, and I don’t
> know how well that portion of the code has been maintained over the years.
>
>
> HTH
>
> -Mike

Re: Issue with multiple users running Spark

Posted by Michael Segel <ms...@hotmail.com>.

Hi, 

Quick response… unless you tell Derby to set up as a networked service (this is going back to SilverCloud days), it’s a single-user instance. So it won’t work.
If you were using MySQL or something… you would have better luck…

I think if you go back into Derby’s docs and see how to start it as a networked server (multi-user), you could try it.
Most people don’t do this because not many people know Derby, and I don’t know how well that portion of the code has been maintained over the years.
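
For anyone trying the networked-server route, the Hive side would roughly
need its JDBC settings pointed at the server rather than at an embedded
database. Below is a sketch of those two settings, untested against the AWS
install. It assumes a Derby network server listening on its default port
1527 on a hypothetical host derby-host, with a database named metastore_db.
The driver class name is the network client driver from Derby releases of
that era, and derby_client_properties is a made-up helper name.

```python
def derby_client_properties(host, port=1527, database="metastore_db"):
    """Hive metastore JDBC settings for a Derby network server.

    host, port and database are placeholders; 1527 is Derby's default
    network-server port.
    """
    return {
        # Network (client) URL instead of an embedded jdbc:derby: URL
        "javax.jdo.option.ConnectionURL": (
            f"jdbc:derby://{host}:{port}/{database};create=true"
        ),
        # Network client driver instead of the embedded driver
        "javax.jdo.option.ConnectionDriverName": "org.apache.derby.jdbc.ClientDriver",
    }


for name, value in derby_client_properties("derby-host").items():
    print(f"{name} = {value}")
```

These would go into hive-site.xml, and the Derby client jar would need to be
on the classpath of each Spark interpreter process.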

HTH

-Mike
