Posted to user@spark.apache.org by Bhaarat Sharma <bh...@gmail.com> on 2016/07/30 05:24:37 UTC

PySpark 1.6.1: 'builtin_function_or_method' object has no attribute '__code__' in Pickles

I am using PySpark 1.6.1. In my Python program I'm using ctypes and trying
to load the liblept library from the liblept.so.4.0.2 file on my system.

While trying to load the library via cdll.LoadLibrary("liblept.so.4.0.2") I
get this error: 'builtin_function_or_method' object has no attribute
'__code__'

Here are my files:

test.py

from ctypes import *

class FooBar:
        def __init__(self, options=None, **kwargs):
                if options is not None:
                        self.options = options

        def read_image_from_bytes(self, bytes):
                return "img"

        def text_from_image(self, img):
                self.leptonica = cdll.LoadLibrary("liblept.so.4.0.2")
                return "test from foobar"


spark.py

from pyspark import SparkContext
import test
import numpy as np
sc = SparkContext("local", "test")
foo = test.FooBar()

def file_bytes(rawdata):
        return np.asarray(bytearray(rawdata),dtype=np.uint8)

def do_some_with_bytes(bytes):
        # FooBar defines text_from_image; there is no do_something_on_image
        return foo.text_from_image(foo.read_image_from_bytes(bytes))

images = sc.binaryFiles("/myimages/*.jpg")
image_to_text = lambda rawdata: do_some_with_bytes(file_bytes(rawdata))
print images.values().map(image_to_text).take(1)  # this raises the error


What is the way to load this library?
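
The thread does not pin down what triggers the error. One general pattern for keeping ctypes handles out of anything Spark has to serialize is to store only the library name on the object and load the handle on first use inside each worker process. The sketch below assumes that pattern; `get_library`, `_HANDLES` and the `loader` parameter are hypothetical names introduced for illustration, not part of the original program.

```python
import ctypes

_HANDLES = {}  # per-process cache: library name -> loaded ctypes handle


def get_library(name, loader=ctypes.cdll.LoadLibrary):
    """Return a ctypes handle for `name`, loading it at most once per process.

    `loader` exists only so the function can be tried without the real
    library; by default it is ctypes.cdll.LoadLibrary.
    """
    if name not in _HANDLES:
        _HANDLES[name] = loader(name)
    return _HANDLES[name]


class FooBar(object):
    def __init__(self, lib_name="liblept.so.4.0.2"):
        # Store only the name (a plain string), never the handle itself.
        self.lib_name = lib_name

    def text_from_image(self, img):
        leptonica = get_library(self.lib_name)  # loaded inside the worker
        # ... calls into leptonica would go here ...
        return "test from foobar"
```

Because the instance holds nothing but a string, it stays cheap to copy, and each worker process loads the shared library at most once.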

RE: PySpark 1.6.1: 'builtin_function_or_method' object has no attribute '__code__' in Pickles

Posted by Joaquin Alzola <Jo...@lebara.com>.
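An example of passing a dependency on the spark-submit command line (here a Maven package via --packages; Python files go in --py-files the same way):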
bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 spark_v3.py


From: Bhaarat Sharma [mailto:bhaarat.s@gmail.com]
Sent: 30 July 2016 06:38
To: ayan guha <gu...@gmail.com>
Cc: user <us...@spark.apache.org>
Subject: Re: PySpark 1.6.1: 'builtin_function_or_method' object has no attribute '__code__' in Pickles

I'm very new to Spark. I'm running it on a single CentOS 7 box. How would I add test.py to spark-submit? Pointers to any resources would be great. Thanks for your help.


Re: PySpark 1.6.1: 'builtin_function_or_method' object has no attribute '__code__' in Pickles

Posted by ayan guha <gu...@gmail.com>.
Hi

Glad that your problem is resolved. spark-submit is the recommended way to
submit an application (the pyspark script calls spark-submit internally).

Yes, the process is the same on a single node as on multiple nodes. However,
I would suggest using one of the cluster modes instead of local mode. On a
single node, you can start a standalone master.

I would suggest going through the deployment section of the Spark
documentation.
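
The Spark "Submitting Applications" page recommends packaging multiple Python files into a .zip (or .egg) for --py-files. A minimal sketch of building such an archive with the standard library follows; the helper name `build_deps_zip`, the archive name and the flat directory layout are assumptions for illustration.

```python
import os
import zipfile


def build_deps_zip(src_dir, zip_path):
    """Pack every .py file directly under src_dir into zip_path.

    Files are stored at the archive root so that `import <module>` works
    once the zip is on sys.path. The names used here are illustrative.
    """
    names = sorted(n for n in os.listdir(src_dir) if n.endswith(".py"))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            zf.write(os.path.join(src_dir, name), arcname=name)
    return names
```

The resulting archive can then be passed as, for example, `spark-submit --py-files deps.zip spark.py`. Python can import modules straight from a zip on its path, so a quick local check is to put the archive on sys.path and import one of the packed modules.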

Best
Ayan

On Sat, Jul 30, 2016 at 4:32 PM, Bhaarat Sharma <bh...@gmail.com> wrote:

> That worked perfectly, thank you! I had to make a few modifications to my
> script, but nothing major.
>
> What is the difference between running this via "pyspark myscript.py" vs.
> "spark-submit myscript.py --py-files dependency.py"? Is it that the
> dependency is available on all executors with the latter?
>
> Additionally, I'm currently running Spark on a single box. If I had a
> 10-node cluster, would the process be the same? On a 10-node cluster, would
> the processing of my job be split across the nodes? I should also add that
> the bulk of the processing work is being done in dependency.py.
>
> I would appreciate any resources relevant to these questions.
>
> Thanks again.
>
>
>
> On Sat, Jul 30, 2016 at 1:42 AM, Bhaarat Sharma <bh...@gmail.com>
> wrote:
>
>> Great, let me give that a shot.
>>
>> On Sat, Jul 30, 2016 at 1:40 AM, ayan guha <gu...@gmail.com> wrote:
>>
>>> http://spark.apache.org/docs/latest/submitting-applications.html
>>>
>>> For Python, you can use the --py-files argument of spark-submit to add
>>> .py, .zip or .egg files to be distributed with your application. If you
>>> depend on multiple Python files we recommend packaging them into a .zip
>>>  or .egg.


-- 
Best Regards,
Ayan Guha

Re: PySpark 1.6.1: 'builtin_function_or_method' object has no attribute '__code__' in Pickles

Posted by Bhaarat Sharma <bh...@gmail.com>.
I'm very new to Spark. I'm running it on a single CentOS 7 box. How would I
add test.py to spark-submit? Pointers to any resources would be great.
Thanks for your help.

On Sat, Jul 30, 2016 at 1:28 AM, ayan guha <gu...@gmail.com> wrote:

> I think you need to add test.py in spark submit so that it gets shipped to
> all executors

Re: PySpark 1.6.1: 'builtin_function_or_method' object has no attribute '__code__' in Pickles

Posted by ayan guha <gu...@gmail.com>.
I think you need to add test.py to spark-submit so that it gets shipped to
all executors.
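
A minimal sketch of doing the same thing from inside the driver script rather than on the command line. `SparkContext.addPyFile` is part of PySpark; the helper name `ship_dependencies` and the file names are assumptions for illustration only. The helper takes the context as a parameter, so it can be tried without a cluster.

```python
import os


def ship_dependencies(sc, paths):
    """Register local Python dependencies with a SparkContext.

    `sc` is expected to be a pyspark.SparkContext. Each path may be a
    .py, .zip or .egg file; Spark distributes it to the executors.
    `ship_dependencies` is a hypothetical helper, not part of PySpark.
    """
    shipped = []
    for path in paths:
        if not os.path.isfile(path):
            raise IOError("dependency not found: %s" % path)
        sc.addPyFile(path)  # real PySpark API
        shipped.append(os.path.abspath(path))
    return shipped


# Usage in the driver (assumes test.py sits next to spark.py):
#   sc = SparkContext("local", "test")
#   ship_dependencies(sc, ["test.py"])
#   import test  # import after the file has been shipped
```

On the command line, spark-submit expects options such as --py-files before the application script, as in `spark-submit --py-files test.py spark.py`; anything placed after the script is passed to the application as its own arguments.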



-- 
Best Regards,
Ayan Guha