Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/25 09:08:28 UTC

[GitHub] [arrow] xubinlaile opened a new issue #8270: object of type 'init64' is not JSON serializable

xubinlaile opened a new issue #8270:
URL: https://github.com/apache/arrow/issues/8270


   
   I call `pa.RecordBatch.from_pandas(df)` and get an error, as shown in the
   picture:
   
   
   
   ![20200925170402](https://user-images.githubusercontent.com/43530705/94248742-75158580-ff51-11ea-8675-dbda2de01413.jpg)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] xubinlaile commented on issue #8270: object of type 'init64' is not JSON serializable

Posted by GitBox <gi...@apache.org>.
xubinlaile commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699011960


   > Thanks for the report!
   > 
   > Can you provide a copy-pastable example so we can reproduce the issue locally and investigate? See [here](https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) for info on what's needed from our side.
   
   I copied parts of the example and ran them on my machine. The code is here: arrow/python/examples/plasma/sorting/
   The answer in
   https://stackoverflow.com/questions/50916422/python-typeerror-object-of-type-int64-is-not-json-serializable:
   
   import json
   import numpy as np
   
   class NpEncoder(json.JSONEncoder):
       def default(self, obj):
           if isinstance(obj, np.integer):
               return int(obj)
           elif isinstance(obj, np.floating):
               return float(obj)
           elif isinstance(obj, np.ndarray):
               return obj.tolist()
           else:
               return super(NpEncoder, self).default(obj)
   
   # Your code ...
   json.dumps(data, cls=NpEncoder)
   
   may work.
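
A possible variation on the encoder above, offered only as a sketch: instead of subclassing `json.JSONEncoder`, the same conversion can be passed to `json.dumps` through its `default` argument. This version checks against the standard-library `numbers` ABCs, relying on the fact that numpy registers its integer and floating scalar types with `numbers.Integral` and `numbers.Real`. The names `to_builtin`, `StandInInt64`, and `metadata` are hypothetical and used only for illustration; `StandInInt64` stands in for a numpy integer so the snippet runs without numpy installed. This is not the change that was made in pyarrow.

```python
import json
import numbers


def to_builtin(obj):
    """Fallback for json.dumps: convert number-like objects to built-ins."""
    if isinstance(obj, numbers.Integral):
        return int(obj)
    if isinstance(obj, numbers.Real):
        return float(obj)
    # Anything else is still reported as unserializable.
    raise TypeError(
        f"Object of type {type(obj).__name__} is not JSON serializable"
    )


class StandInInt64:
    """Hypothetical stand-in for a numpy integer scalar (not a numpy type)."""

    def __init__(self, value):
        self._value = value

    def __int__(self):
        return self._value


# Register the stand-in as an integer type, mirroring what numpy does
# for its own integer scalars.
numbers.Integral.register(StandInInt64)

# Illustrative data only; the values echo num_rows and num_cols in the thread.
metadata = {"rows": StandInInt64(200000), "cols": StandInInt64(2)}

# json.dumps calls to_builtin for every object it cannot encode itself.
encoded = json.dumps(metadata, default=to_builtin)
print(encoded)
```

With real numpy values the call would look the same, for example `json.dumps(data, default=to_builtin)`.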





[GitHub] [arrow] arw2019 commented on issue #8270: object of type 'init64' is not JSON serializable

Posted by GitBox <gi...@apache.org>.
arw2019 commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699010092


   > may this help:
   > https://stackoverflow.com/questions/50916422/python-typeerror-object-of-type-int64-is-not-json-serializable
   
   Maybe I didn't read it carefully but that does not look like a problem with Pandas.





[GitHub] [arrow] dianaclarke commented on issue #8270: object of type 'init64' is not JSON serializable

Posted by GitBox <gi...@apache.org>.
dianaclarke commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-705453481


   @xubinlaile A fix for this is in master and will be included in the Arrow 2.0.0 release which is being cut tomorrow (Oct 9, 2020).
   
   https://github.com/apache/arrow/commit/b2842ab2eb0d7a7a633049a5591e1eaa254d4446
   
   Thanks for taking the time to report this bug! ❤️ 








[GitHub] [arrow] xubinlaile commented on issue #8270: object of type 'init64' is not JSON serializable

Posted by GitBox <gi...@apache.org>.
xubinlaile commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699021063


   > > may this help:
   > > https://stackoverflow.com/questions/50916422/python-typeerror-object-of-type-int64-is-not-json-serializable
   > 
   > Maybe I didn't read it carefully but that does not look like a problem with Arrow.
   
   With this answer, I modified pandas_compat.py, and it works.





[GitHub] [arrow] xubinlaile commented on issue #8270: object of type 'init64' is not JSON serializable

Posted by GitBox <gi...@apache.org>.
xubinlaile commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699003981


   
   Maybe this helps:
   
   
   import json
   import numpy as np
   
   class NpEncoder(json.JSONEncoder):
       def default(self, obj):
           if isinstance(obj, np.integer):
               return int(obj)
           elif isinstance(obj, np.floating):
               return float(obj)
           elif isinstance(obj, np.ndarray):
               return obj.tolist()
           else:
               return super(NpEncoder, self).default(obj)
   
   json.dumps(data, cls=NpEncoder)
   





[GitHub] [arrow] arw2019 commented on issue #8270: object of type 'init64' is not JSON serializable

Posted by GitBox <gi...@apache.org>.
arw2019 commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699006533


   Thanks for the report!
   
   Can you provide a copy-pastable example so we can reproduce the issue locally and investigate? See [here](https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) for info on what's needed from our side.





[GitHub] [arrow] xubinlaile removed a comment on issue #8270: object of type 'init64' is not JSON serializable

Posted by GitBox <gi...@apache.org>.
xubinlaile removed a comment on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699019657







[GitHub] [arrow] wesm commented on issue #8270: object of type 'init64' is not JSON serializable

Posted by GitBox <gi...@apache.org>.
wesm commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-701551859


   I was able to reproduce the issue
   
   https://issues.apache.org/jira/browse/ARROW-10147





[GitHub] [arrow] wesm commented on issue #8270: object of type 'init64' is not JSON serializable

Posted by GitBox <gi...@apache.org>.
wesm commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-700278601


   This seems like it could be a bug in pyarrow. If we could get a minimal reproduction of the issue (i.e. an example pandas.DataFrame that triggers the error), that would be helpful in opening a Jira issue.





[GitHub] [arrow] arw2019 edited a comment on issue #8270: object of type 'init64' is not JSON serializable

Posted by GitBox <gi...@apache.org>.
arw2019 edited a comment on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699010092


   > may this help:
   > https://stackoverflow.com/questions/50916422/python-typeerror-object-of-type-int64-is-not-json-serializable
   
   Maybe I didn't read it carefully but that does not look like a problem with Arrow.





[GitHub] [arrow] xubinlaile removed a comment on issue #8270: object of type 'init64' is not JSON serializable

Posted by GitBox <gi...@apache.org>.
xubinlaile removed a comment on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699009005







[GitHub] [arrow] xubinlaile commented on issue #8270: object of type 'init64' is not JSON serializable

Posted by GitBox <gi...@apache.org>.
xubinlaile commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699019657


   > > may this help:
   > > https://stackoverflow.com/questions/50916422/python-typeerror-object-of-type-int64-is-not-json-serializable
   > 
   > Maybe I didn't read it carefully but that does not look like a problem with Pandas.
   
   The code is as follows. With the Stack Overflow answers, I modified pandas_compat.py, and it works.
   
   from multiprocessing import Pool
   import numpy as np
   import pandas as pd
   import pyarrow as pa
   import pyarrow.plasma as plasma
   import subprocess
   import time
   
   
   client = None
   object_store_size = 2 * 10 ** 9  # 2 GB
   num_cores = 8
   num_rows = 200000
   num_cols = 2
   column_names = [str(i) for i in range(num_cols)]
   column_to_sort = column_names[0]
   
   
   # Connect to clients
   def connect():
       global client
       client = plasma.connect('/tmp/store')
       np.random.seed(int(time.time() * 10e7) % 10000000)
   
   
   def put_df(df):
       record_batch = pa.RecordBatch.from_pandas(df)
   
       # Get size of record batch and schema
       mock_sink = pa.MockOutputStream()
       stream_writer = pa.RecordBatchStreamWriter(mock_sink, record_batch.schema)
       stream_writer.write_batch(record_batch)
       data_size = mock_sink.size()
   
       # Generate an ID and allocate a buffer in the object store for the
       # serialized DataFrame
       object_id = plasma.ObjectID(np.random.bytes(20))
       buf = client.create(object_id, data_size)
   
       # Write the serialized DataFrame to the object store
       sink = pa.FixedSizeBufferWriter(buf)
       stream_writer = pa.RecordBatchStreamWriter(sink, record_batch.schema)
       stream_writer.write_batch(record_batch)
   
       # Seal the object
       client.seal(object_id)
   
       return object_id
   
   
   def get_dfs(object_ids):
       """Retrieve dataframes from the object store given their object IDs."""
       buffers = client.get_buffers(object_ids)
       return [pa.RecordBatchStreamReader(buf).read_next_batch().to_pandas()
               for buf in buffers]
   
   
   
   
   if __name__ == '__main__':
       # Start the plasma store.
       p = subprocess.Popen(['plasma_store',
                             '-s', '/tmp/store',
                             '-m', str(object_store_size)])
   
       # Connect to the plasma store.
       connect()
   
       # Connect the processes in the pool.
       pool = Pool(initializer=connect, initargs=(), processes=num_cores)
   
       # Create a DataFrame from a numpy array.
       df = pd.DataFrame(np.random.randn(num_rows, num_cols),
                         columns=column_names)
   
       partition_ids = [put_df(partition) for partition
                        in np.split(df, num_cores)]
   
       print(partition_ids)
       df = get_dfs(partition_ids)
       print(df)
       # Kill the object store.
       p.kill()
   
   
   
   
   
   
   
   
   





[GitHub] [arrow] wesm closed issue #8270: object of type 'init64' is not JSON serializable

Posted by GitBox <gi...@apache.org>.
wesm closed issue #8270:
URL: https://github.com/apache/arrow/issues/8270


   





[GitHub] [arrow] xubinlaile commented on issue #8270: object of type 'init64' is not JSON serializable

Posted by GitBox <gi...@apache.org>.
xubinlaile commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699009005


   Maybe this helps:
   https://stackoverflow.com/questions/50916422/python-typeerror-object-of-type-int64-is-not-json-serializable





[GitHub] [arrow] xubinlaile removed a comment on issue #8270: object of type 'init64' is not JSON serializable

Posted by GitBox <gi...@apache.org>.
xubinlaile removed a comment on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699003981


   
   

