Posted to users@zeppelin.apache.org by Lior Chaga <li...@taboola.com> on 2021/10/17 09:04:54 UTC

Error in pyspark

Hi,
I'm puzzled - I have zeppelin 0.10 with spark 3.1.2.
The driver has Anaconda with python 3.6.5 installed.

When I run a pyspark paragraph, I get some weird behavior. The paragraph runs
successfully on the first attempt, but then fails on every successive attempt
(until the interpreter is restarted).

Error is:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-4fe403202586> in <module>()
----> 1 sc.setJobGroup('zeppelin|admin|2GAWSQT96|paragraph_1629208863735_110330802', 'Started by: admin')

AttributeError: 'SparkSession' object has no attribute 'setJobGroup'
Fail to setJobGroup



For instance, this paragraph:

%spark.pyspark

import pandas as pd
from pyspark.sql.types import StringType
import numpy as np
def np_sqrt(v):
    return np.__path__

spark.udf.register("np_sqrt", np_sqrt, StringType())

df = spark.range(10).createOrReplaceTempView("d")
spark.sql("select np_sqrt(id) as arr from d").show(truncate=False)


BTW, I can run different pyspark paragraphs, and each succeeds on its first
attempt. Once I re-run any paragraph, every pyspark paragraph fails.

Any idea what may cause it?
Thanks,
Lior

Re: Error in pyspark

Posted by Lior Chaga <li...@taboola.com>.
I have a slightly patched version with relocations. But that doesn't
explain it, as the first run succeeds. I'm pretty sure it's some Python
package; I will try a "lion in the desert" (bisection) search to find it.
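
The "lion in the desert" search mentioned above is a bisection: halve the list of suspects, test one half, and keep whichever half still reproduces the failure. A minimal sketch follows, assuming exactly one package is responsible (the thread does not confirm this). The package names are hypothetical, and `reproduces` stands in for the manual step of installing only that subset in a clean environment, restarting the interpreter, and running a paragraph twice.

```python
def find_culprit(candidates, reproduces):
    """Bisect candidates down to the one item that triggers the failure.

    Assumes exactly one culprit, and that reproduces(subset) returns True
    whenever the culprit is contained in subset.
    """
    remaining = list(candidates)
    if not remaining:
        raise ValueError("no candidates to search")
    while len(remaining) > 1:
        half = remaining[: len(remaining) // 2]
        # Keep whichever half still reproduces the failure.
        remaining = half if reproduces(half) else remaining[len(half):]
    return remaining[0]


# Hypothetical package list and a fake check, for illustration only.
packages = ["numpy", "pandas", "pyarrow", "some_pkg", "ipython"]
culprit = find_culprit(packages, lambda subset: "some_pkg" in subset)
print(culprit)  # some_pkg
```

Each round halves the suspects, so about log2(n) manual checks are needed instead of n.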


Re: Error in pyspark

Posted by Jeff Zhang <zj...@gmail.com>.
It looks like sc in your environment is a SparkSession, but it should be a
SparkContext. Are you using the official Apache Spark distribution?

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-4fe403202586> in <module>()
----> 1 sc.setJobGroup('zeppelin|admin|2GAWSQT96|paragraph_1629208863735_110330802', 'Started by: admin')

AttributeError: 'SparkSession' object has no attribute 'setJobGroup'
Fail to setJobGroup


-- 
Best Regards

Jeff Zhang
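
One way to check the diagnosis above (that `sc` is bound to a SparkSession instead of a SparkContext) is to print what `sc` and `spark` actually are before and after the first run of a paragraph. A minimal sketch follows. `describe_bindings` is a hypothetical helper, not part of Zeppelin or Spark, and the two `Fake...` classes are stand-ins so that the example runs without Spark. In a real `%spark.pyspark` paragraph you would call `describe_bindings(globals())` instead.

```python
def describe_bindings(namespace, names=("sc", "spark")):
    """Map each name to (type name, whether it exposes setJobGroup)."""
    report = {}
    for name in names:
        obj = namespace.get(name)
        report[name] = (type(obj).__name__, hasattr(obj, "setJobGroup"))
    return report


# Stand-in classes (assumptions for illustration, not the real Spark types).
class FakeSparkContext:
    def setJobGroup(self, group_id, description):
        pass


class FakeSparkSession:
    pass


healthy = describe_bindings({"sc": FakeSparkContext(), "spark": FakeSparkSession()})
broken = describe_bindings({"sc": FakeSparkSession(), "spark": FakeSparkSession()})
print(healthy["sc"])  # ('FakeSparkContext', True)
print(broken["sc"])   # ('FakeSparkSession', False)
```

If the report changes between the first and second run, something executed in between rebound `sc`. Assuming `spark` is still a valid SparkSession at that point, `sc = spark.sparkContext` may restore the binding as a temporary workaround, though it does not explain the cause.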

Re: Error in pyspark

Posted by Lior Chaga <li...@taboola.com>.
I figured, but any idea how I can troubleshoot it?
Probably some python package installed...


Re: Error in pyspark

Posted by Jeff Zhang <zj...@gmail.com>.
Hi Lior,

I ran it many times, but cannot reproduce it.

[image: image.png]



-- 
Best Regards

Jeff Zhang