Posted to user@arrow.apache.org by James Duong via user <us...@arrow.apache.org> on 2022/12/07 16:19:37 UTC

Python IPC: JSON output files

Hi,

Is there a way to write JSON files from Arrow Tables? I see other formats
such as CSV and Parquet supported for both reading and writing, but JSON
only for reading.

-- 

*James Duong*
Lead Software Developer
Bit Quill Technologies Inc.
Direct: +1.604.562.6082 | jamesd@bitquilltech.com
https://www.bitquilltech.com

This email message is for the sole use of the intended recipient(s) and may
contain confidential and privileged information.  Any unauthorized review,
use, disclosure, or distribution is prohibited.  If you are not the
intended recipient, please contact the sender by reply email and destroy
all copies of the original message.  Thank you.

Re: Python IPC: JSON output files

Posted by "James Duong (CW)" <jd...@dremio.com>.
Thanks for the suggestions.

The data is coming from Flight. We found the process of using read_pandas()
and then converting the data to JSON to be an area we might be able to
improve on.
Getting the data into an Arrow Table (calling read_all() on the
FlightStreamReader) took about as long as read_pandas().

On Wed, Dec 7, 2022 at 3:36 PM Jacob Quinn <qu...@gmail.com> wrote:

> The Julia implementation (Arrow.jl) has the ability to output an
> Arrow.Table as a JSON array of objects or object of arrays via the
> JSONTables.jl package, like:
>
> using Arrow, JSONTables
>
> julia> t = Arrow.Table("tbl.arrow")
> Arrow.Table with 4 rows, 1 columns, and schema:
>  :a  String
>
> julia> arraytable(t)
> "[{\"a\":null},{\"a\":\"a\"},{\"a\":\"b\"},{\"a\":\"a\"}]"
>
> julia> objecttable(t)
> "{\"a\":[null,\"a\",\"b\",\"a\"]}"
>
>
> You can also provide an IO argument as the 1st argument to
> arraytable/objecttable to write the results out to a file directly.
>
> In case that helps at all!
>
> -Jacob
>

Re: Python IPC: JSON output files

Posted by Jacob Quinn <qu...@gmail.com>.
The Julia implementation (Arrow.jl) has the ability to output an
Arrow.Table as a JSON array of objects or object of arrays via the
JSONTables.jl package, like:

using Arrow, JSONTables

julia> t = Arrow.Table("tbl.arrow")
Arrow.Table with 4 rows, 1 columns, and schema:
 :a  String

julia> arraytable(t)
"[{\"a\":null},{\"a\":\"a\"},{\"a\":\"b\"},{\"a\":\"a\"}]"

julia> objecttable(t)
"{\"a\":[null,\"a\",\"b\",\"a\"]}"


You can also provide an IO argument as the 1st argument to
arraytable/objecttable to write the results out to a file directly.
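
Since the original question was about Python, here is a minimal sketch of the same two layouts on the pyarrow side. It assumes the table exposes to_pylist() and to_pydict(), as pyarrow.Table does; the helper names array_of_objects and object_of_arrays are hypothetical and are not part of pyarrow or JSONTables.jl. Column values that json cannot serialize natively (timestamps, decimals) would need extra handling that is not shown here.

```python
import json


def array_of_objects(table):
    """Return a JSON array with one object per row (row-oriented layout).

    `table` is assumed to provide to_pylist(), as pyarrow.Table does.
    """
    return json.dumps(table.to_pylist(), separators=(",", ":"))


def object_of_arrays(table):
    """Return a JSON object with one array per column (column-oriented layout).

    `table` is assumed to provide to_pydict(), as pyarrow.Table does.
    """
    return json.dumps(table.to_pydict(), separators=(",", ":"))


# Hypothetical usage with pyarrow installed:
#   import pyarrow as pa
#   t = pa.table({"a": [None, "a", "b", "a"]})
#   array_of_objects(t)
#   object_of_arrays(t)
```

Both helpers build the whole JSON string in memory, so they suit small tables better than large Flight result sets.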

In case that helps at all!

-Jacob


RE: Python IPC: JSON output files

Posted by "Lee, David" <Da...@blackrock.com>.
I’ve done it before by calling pa.Table.to_pylist() and then iterating through the resulting list of rows, calling json.dumps() on each row and writing the output to a file.

This produces a JSON Lines (jsonl) file, with one JSON object per line.
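
The approach described above could look roughly like the following sketch. The function name write_jsonl is hypothetical, and passing default=str to json.dumps() is an assumption added here so that values json cannot encode natively (for example timestamps or decimals) are written as strings instead of raising an error; the original message does not say how such values were handled.

```python
import json


def write_jsonl(table, sink):
    """Write one JSON object per table row to a text file-like `sink`.

    `table` is assumed to provide to_pylist(), as pyarrow.Table does.
    Returns the number of rows written.
    """
    rows = table.to_pylist()  # list of dicts, one dict per row
    for row in rows:
        # default=str is an assumption: it stringifies values that the
        # json module cannot serialize natively.
        sink.write(json.dumps(row, default=str))
        sink.write("\n")
    return len(rows)


# Hypothetical usage with pyarrow installed:
#   import pyarrow as pa
#   t = pa.table({"a": [None, "a", "b", "a"]})
#   with open("out.jsonl", "w", encoding="utf-8") as f:
#       write_jsonl(t, f)
```

Note that to_pylist() materializes every row as Python objects first, so memory use grows with the table size.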


From: David Li <li...@apache.org>
Sent: Wednesday, December 7, 2022 8:59 AM
To: dl <us...@arrow.apache.org>
Subject: Re: Python IPC: JSON output files


Assuming you're looking at C++, it's unfortunately not implemented: https://issues.apache.org/jira/browse/ARROW-5033


Re: Python IPC: JSON output files

Posted by David Li <li...@apache.org>.
Assuming you're looking at C++, it's unfortunately not implemented: https://issues.apache.org/jira/browse/ARROW-5033
