Posted to users@zeppelin.apache.org by Lior Chaga <li...@taboola.com> on 2021/10/17 09:04:54 UTC
Error in pyspark
Hi,
I'm puzzled. I have Zeppelin 0.10 with Spark 3.1.2.
The driver has Anaconda with Python 3.6.5 installed.
When I run a pyspark paragraph, I get some weird behavior: the paragraph runs
successfully on the first attempt, but then fails on every subsequent attempt
(until the interpreter is restarted).
Error is:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-4fe403202586> in <module>()
----> 1 sc.setJobGroup('zeppelin|admin|2GAWSQT96|paragraph_1629208863735_110330802', 'Started by: admin')

AttributeError: 'SparkSession' object has no attribute 'setJobGroup'
Fail to setJobGroup
For instance, this paragraph:
%spark.pyspark
import pandas as pd
from pyspark.sql.types import StringType
import numpy as np

def np_sqrt(v):
    # Despite the name, this returns numpy's package path, not a square root
    return np.__path__

spark.udf.register("np_sqrt", np_sqrt, StringType())

# createOrReplaceTempView returns None, so df is None here
df = spark.range(10).createOrReplaceTempView("d")
spark.sql("select np_sqrt(id) as arr from d").show(truncate=False)
BTW, I can run different pyspark paragraphs, and each one succeeds on its first
attempt. Once I re-run any paragraph, every pyspark paragraph fails.
Any idea what may cause it?
Thanks,
Lior
Re: Error in pyspark
Posted by Lior Chaga <li...@taboola.com>.
I have a slightly patched version with relocations. But that doesn't
explain it, as the first run succeeds. I'm pretty sure it's some Python package;
I'll try a lion-in-the-desert search to find it.
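
A "lion in the desert" hunt is essentially a bisection: keep halving the set of suspects and follow whichever half still reproduces the failure. The sketch below is only an illustration under two assumptions: exactly one package is responsible, and there is some way to test a subset of packages. The `fails_with` callback and the `pkg-*` names are hypothetical stand-ins, not anything from the actual environment.

```python
from typing import Callable, List, Sequence


def find_culprit(candidates: Sequence[str],
                 fails_with: Callable[[List[str]], bool]) -> str:
    """Bisect `candidates` down to the one item that triggers the failure.

    Assumes exactly one culprit, and that `fails_with(subset)` returns True
    whenever that culprit is part of `subset`.
    """
    remaining = list(candidates)
    if not remaining:
        raise ValueError("no candidates to search")
    while len(remaining) > 1:
        half = remaining[: len(remaining) // 2]
        # Follow whichever half still reproduces the failure.
        if fails_with(half):
            remaining = half
        else:
            remaining = remaining[len(half):]
    return remaining[0]


# Hypothetical usage: in practice `fails_with` would install only the given
# packages into a scratch environment and re-run the paragraph twice.
suspects = ["pkg-a", "pkg-b", "pkg-c", "pkg-d", "pkg-e"]
print(find_culprit(suspects, lambda subset: "pkg-d" in subset))
```

With n suspects this needs roughly log2(n) test runs instead of n, which matters when each run means restarting the interpreter.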
On Sun, Oct 17, 2021, 14:09 Jeff Zhang <zj...@gmail.com> wrote:
> It looks like sc in your environment is a SparkSession, but it should be a
> SparkContext. Do you use the official Apache Spark distribution?
Re: Error in pyspark
Posted by Jeff Zhang <zj...@gmail.com>.
It looks like sc in your environment is a SparkSession, but it should be a
SparkContext. Do you use the official Apache Spark distribution?
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-4fe403202586> in <module>()
----> 1 sc.setJobGroup('zeppelin|admin|2GAWSQT96|paragraph_1629208863735_110330802', 'Started by: admin')

AttributeError: 'SparkSession' object has no attribute 'setJobGroup'
Fail to setJobGroup
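
One way to confirm this diagnosis from inside a paragraph is to print what `sc` is actually bound to. The helper below is a minimal sketch, not part of Zeppelin: `describe_binding` is a hypothetical name, and it assumes that `globals()` in a paragraph is the namespace where Zeppelin keeps `sc` and `spark`. It relies only on plain Python introspection.

```python
def describe_binding(namespace, name="sc"):
    """Report the type bound to `name` and whether it offers setJobGroup."""
    if name not in namespace:
        return {"name": name, "bound": False, "type": None,
                "has_setJobGroup": False}
    obj = namespace[name]
    cls = type(obj)
    return {
        "name": name,
        "bound": True,
        # Fully qualified class name, e.g. "<module>.<ClassName>"
        "type": f"{cls.__module__}.{cls.__qualname__}",
        "has_setJobGroup": hasattr(obj, "setJobGroup"),
    }


# In a pyspark paragraph, run this on the first (successful) attempt:
print(describe_binding(globals(), "sc"))
print(describe_binding(globals(), "spark"))
```

In a healthy session one would expect `sc` to report a type ending in `SparkContext` with `has_setJobGroup` set to True; a type ending in `SparkSession` would match the error above.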
Lior Chaga <li...@taboola.com> wrote on Sun, Oct 17, 2021 at 7:07 PM:
> I figured, but any idea how I can troubleshoot it?
> Probably some python package installed...
--
Best Regards
Jeff Zhang
Re: Error in pyspark
Posted by Lior Chaga <li...@taboola.com>.
I figured, but any idea how I can troubleshoot it?
It's probably some installed Python package...
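
Since the first run succeeds and later runs fail, one possible troubleshooting step is to record what `sc` and `spark` point to before and after the paragraph body on that first run, and see whether anything rebinds them. This is only a sketch: `snapshot` and `changed` are hypothetical helper names, and it assumes `globals()` in a paragraph is the namespace Zeppelin uses for these variables.

```python
def snapshot(namespace, names=("sc", "spark")):
    """Record the type name and identity of each watched variable."""
    result = {}
    for name in names:
        if name in namespace:
            obj = namespace[name]
            result[name] = (type(obj).__name__, id(obj))
        else:
            result[name] = None  # name is not bound at all
    return result


def changed(before, after):
    """Return the watched names whose binding differs between snapshots."""
    return sorted(name for name in after if before.get(name) != after[name])


# At the top of the paragraph:
before = snapshot(globals())

# ... paragraph body goes here ...

# At the bottom of the paragraph:
print("rebound during this paragraph:", changed(before, snapshot(globals())))
```

If `sc` shows up in the list, the paragraph body (or something it imports) is overwriting it; if the list stays empty, the rebinding happens elsewhere, for example in code the interpreter runs between paragraphs.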
On Sun, Oct 17, 2021, 14:02 Jeff Zhang <zj...@gmail.com> wrote:
> Hi Lior,
>
> I ran it many times, but could not reproduce it.
Re: Error in pyspark
Posted by Jeff Zhang <zj...@gmail.com>.
Hi Lior,
I ran it many times, but could not reproduce it.
[image: image.png]
--
Best Regards
Jeff Zhang