Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/25 09:08:28 UTC
[GitHub] [arrow] xubinlaile opened a new issue #8270: object of type 'init64' is not JSON serializable
xubinlaile opened a new issue #8270:
URL: https://github.com/apache/arrow/issues/8270
I called pa.RecordBatch.from_pandas(df) and got an error, as the picture shows:
![20200925170402](https://user-images.githubusercontent.com/43530705/94248742-75158580-ff51-11ea-8675-dbda2de01413.jpg)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] xubinlaile commented on issue #8270: object of type 'init64' is not JSON serializable
Posted by GitBox <gi...@apache.org>.
xubinlaile commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699011960
> Thanks for the report!
>
> Can you provide a copy-pastable example so we can reproduce the issue locally and investigate? See [here](https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) for info on what's needed from our side.
I copied parts of the example and ran them on my machine. The code is here: arrow/python/examples/plasma/sorting/
The answers in
https://stackoverflow.com/questions/50916422/python-typeerror-object-of-type-int64-is-not-json-serializable
may work:
import json
import numpy as np

class NpEncoder(json.JSONEncoder):
    def default(self, obj):
        # Convert NumPy scalars and arrays to built-in Python types
        if isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        else:
            return super(NpEncoder, self).default(obj)

# Your code ....
json.dumps(data, cls=NpEncoder)
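As a hedged aside on the workaround above: the same idea can be sketched without subclassing json.JSONEncoder, by passing a fallback function through the `default=` parameter of json.dumps. The sketch relies on NumPy registering its integer and floating scalar types with the standard `numbers.Integral` and `numbers.Real` abstract base classes, so the function never needs to import NumPy. The name `to_builtin` is hypothetical, and `fractions.Fraction` is used only as a standard-library stand-in for a numeric type that json cannot serialize by default. This is not the change that was made in pyarrow.

```python
import json
import numbers
from fractions import Fraction


def to_builtin(obj):
    """Fallback for json.dumps: convert number-like objects to built-in types.

    json.dumps only calls this for objects it cannot serialize itself.
    """
    if isinstance(obj, numbers.Integral):
        return int(obj)
    if isinstance(obj, numbers.Real):
        return float(obj)
    if hasattr(obj, "tolist"):  # assumption: array-like objects such as numpy.ndarray
        return obj.tolist()
    raise TypeError(
        f"Object of type {type(obj).__name__} is not JSON serializable"
    )


# Fraction stands in for a non-built-in numeric type (illustrative only).
encoded = json.dumps({"ratio": Fraction(1, 2)}, default=to_builtin)
print(encoded)  # {"ratio": 0.5}
```

Anything the function does not recognise still raises TypeError, so unrelated serialization problems are not silently hidden.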
----------------------------------------------------------------
[GitHub] [arrow] arw2019 commented on issue #8270: object of type 'init64' is not JSON serializable
Posted by GitBox <gi...@apache.org>.
arw2019 commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699010092
> may this help:
> https://stackoverflow.com/questions/50916422/python-typeerror-object-of-type-int64-is-not-json-serializable
Maybe I didn't read it carefully but that does not look like a problem with Pandas.
----------------------------------------------------------------
[GitHub] [arrow] dianaclarke commented on issue #8270: object of type 'init64' is not JSON serializable
Posted by GitBox <gi...@apache.org>.
dianaclarke commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-705453481
@xubinlaile A fix for this is in master and will be included in the Arrow 2.0.0 release which is being cut tomorrow (Oct 9, 2020).
https://github.com/apache/arrow/commit/b2842ab2eb0d7a7a633049a5591e1eaa254d4446
Thanks for taking the time to report this bug! ❤️
----------------------------------------------------------------
[GitHub] [arrow] xubinlaile commented on issue #8270: object of type 'init64' is not JSON serializable
Posted by GitBox <gi...@apache.org>.
xubinlaile commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699021063
> > may this help:
> > https://stackoverflow.com/questions/50916422/python-typeerror-object-of-type-int64-is-not-json-serializable
>
> Maybe I didn't read it carefully but that does not look like a problem with Arrow.
With this answer, I modified pandas_compat.py, and it works.
----------------------------------------------------------------
[GitHub] [arrow] xubinlaile commented on issue #8270: object of type 'init64' is not JSON serializable
Posted by GitBox <gi...@apache.org>.
xubinlaile commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699003981
Maybe this helps:
import json
import numpy as np

class NpEncoder(json.JSONEncoder):
    def default(self, obj):
        # Convert NumPy scalars and arrays to built-in Python types
        if isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        else:
            return super(NpEncoder, self).default(obj)

json.dumps(data, cls=NpEncoder)
----------------------------------------------------------------
[GitHub] [arrow] arw2019 commented on issue #8270: object of type 'init64' is not JSON serializable
Posted by GitBox <gi...@apache.org>.
arw2019 commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699006533
Thanks for the report!
Can you provide a copy-pastable example so we can reproduce the issue locally and investigate? See [here](https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) for info on what's needed from our side.
----------------------------------------------------------------
[GitHub] [arrow] xubinlaile removed a comment on issue #8270: object of type 'init64' is not JSON serializable
Posted by GitBox <gi...@apache.org>.
xubinlaile removed a comment on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699019657
----------------------------------------------------------------
[GitHub] [arrow] wesm commented on issue #8270: object of type 'init64' is not JSON serializable
Posted by GitBox <gi...@apache.org>.
wesm commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-701551859
I was able to reproduce the issue
https://issues.apache.org/jira/browse/ARROW-10147
----------------------------------------------------------------
[GitHub] [arrow] wesm commented on issue #8270: object of type 'init64' is not JSON serializable
Posted by GitBox <gi...@apache.org>.
wesm commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-700278601
This seems like it could be a bug in pyarrow. If we could get a minimal reproduction of the issue (i.e. an example pandas.DataFrame that triggers the error), that would be helpful in opening a Jira issue.
----------------------------------------------------------------
[GitHub] [arrow] arw2019 edited a comment on issue #8270: object of type 'init64' is not JSON serializable
Posted by GitBox <gi...@apache.org>.
arw2019 edited a comment on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699010092
> may this help:
> https://stackoverflow.com/questions/50916422/python-typeerror-object-of-type-int64-is-not-json-serializable
Maybe I didn't read it carefully but that does not look like a problem with Arrow.
----------------------------------------------------------------
[GitHub] [arrow] xubinlaile removed a comment on issue #8270: object of type 'init64' is not JSON serializable
Posted by GitBox <gi...@apache.org>.
xubinlaile removed a comment on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699009005
----------------------------------------------------------------
[GitHub] [arrow] xubinlaile commented on issue #8270: object of type 'init64' is not JSON serializable
Posted by GitBox <gi...@apache.org>.
xubinlaile commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699019657
> > may this help:
> > https://stackoverflow.com/questions/50916422/python-typeerror-object-of-type-int64-is-not-json-serializable
>
> Maybe I didn't read it carefully but that does not look like a problem with Pandas.
The code is as follows. With the Stack Overflow answers, I modified pandas_compat.py, and it works.
from multiprocessing import Pool
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.plasma as plasma
import subprocess
import time

client = None
object_store_size = 2 * 10 ** 9  # 2 GB
num_cores = 8
num_rows = 200000
num_cols = 2
column_names = [str(i) for i in range(num_cols)]
column_to_sort = column_names[0]


# Connect to clients
def connect():
    global client
    client = plasma.connect('/tmp/store')
    np.random.seed(int(time.time() * 10e7) % 10000000)


def put_df(df):
    record_batch = pa.RecordBatch.from_pandas(df)

    # Get size of record batch and schema
    mock_sink = pa.MockOutputStream()
    stream_writer = pa.RecordBatchStreamWriter(mock_sink, record_batch.schema)
    stream_writer.write_batch(record_batch)
    data_size = mock_sink.size()

    # Generate an ID and allocate a buffer in the object store for the
    # serialized DataFrame
    object_id = plasma.ObjectID(np.random.bytes(20))
    buf = client.create(object_id, data_size)

    # Write the serialized DataFrame to the object store
    sink = pa.FixedSizeBufferWriter(buf)
    stream_writer = pa.RecordBatchStreamWriter(sink, record_batch.schema)
    stream_writer.write_batch(record_batch)

    # Seal the object
    client.seal(object_id)

    return object_id


def get_dfs(object_ids):
    """Retrieve dataframes from the object store given their object IDs."""
    buffers = client.get_buffers(object_ids)
    return [pa.RecordBatchStreamReader(buf).read_next_batch().to_pandas()
            for buf in buffers]


if __name__ == '__main__':
    # Start the plasma store.
    p = subprocess.Popen(['plasma_store',
                          '-s', '/tmp/store',
                          '-m', str(object_store_size)])

    # Connect to the plasma store.
    connect()

    # Connect the processes in the pool.
    pool = Pool(initializer=connect, initargs=(), processes=num_cores)

    # Create a DataFrame from a numpy array.
    df = pd.DataFrame(np.random.randn(num_rows, num_cols),
                      columns=column_names)

    partition_ids = [put_df(partition) for partition
                     in np.split(df, num_cores)]
    print(partition_ids)

    df = get_dfs(partition_ids)
    print(df)

    # Kill the object store.
    p.kill()
----------------------------------------------------------------
[GitHub] [arrow] wesm closed issue #8270: object of type 'init64' is not JSON serializable
Posted by GitBox <gi...@apache.org>.
wesm closed issue #8270:
URL: https://github.com/apache/arrow/issues/8270
----------------------------------------------------------------
[GitHub] [arrow] xubinlaile commented on issue #8270: object of type 'init64' is not JSON serializable
Posted by GitBox <gi...@apache.org>.
xubinlaile commented on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699009005
Maybe this helps:
https://stackoverflow.com/questions/50916422/python-typeerror-object-of-type-int64-is-not-json-serializable
----------------------------------------------------------------
[GitHub] [arrow] xubinlaile removed a comment on issue #8270: object of type 'init64' is not JSON serializable
Posted by GitBox <gi...@apache.org>.
xubinlaile removed a comment on issue #8270:
URL: https://github.com/apache/arrow/issues/8270#issuecomment-699003981
----------------------------------------------------------------