Posted to dev@arrow.apache.org by "Lee, David" <Da...@blackrock.com> on 2019/11/06 18:22:09 UTC

Saving Binary Arrow memory objects as blobs in Cassandra

Is there any way to save Arrow memory as a blob? I tried using Feather and Parquet, but neither one supports writing complex nested structures yet.

I tried with the following test file.

test.jsonl:
{"a": 1, "b": "abc", "c": [1, 2], "d": {"e": true, "f": "1991-02-03"}, "g": [{"h": 1, "i": "a"}, {"h": 2, "i": "b"}]}
{"a": 2, "b": "xyz", "c": [3, 4], "d": {"e": false, "f": "2010-01-15"}, "g": [{"h": 3, "i": "c"}, {"h": 2, "i": "d"}]}

code:
import pyarrow.json as json
arrow_mem = json.read_json("test.jsonl")

What I'm trying out:

Storing Arrow data as blobs in Cassandra for fast retrieval by primary key.
Using Solr to index the Arrow data blobs so they can be retrieved from Cassandra by primary key.

This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/compliance/email-disclaimers for further information.  Please refer to http://www.blackrock.com/corporate/compliance/privacy-policy for more information about BlackRock’s Privacy Policy.
For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/about-us/contacts-locations.

© 2019 BlackRock, Inc. All rights reserved.

Re: Saving Binary Arrow memory objects as blobs in Cassandra

Posted by Wes McKinney <we...@gmail.com>.
I suggest you use the IPC protocol:

http://arrow.apache.org/docs/python/ipc.html

This protocol will be considered stable starting with the 1.0.0
release, but I would guess (without making any guarantees) that blobs
written with 0.15.1 will be readable in 1.0.0 and beyond.
