Posted to user@arrow.apache.org by gates ma <ga...@gmail.com> on 2021/08/26 02:59:28 UTC

[python] - csv streaming into gcs

hi folks,

Looking to use the CSV read stream to write to GCS. Is it possible to
use the pyarrow CSV stream to write to a GCS bucket?

Thanks,
MG.

Re: [python] - csv streaming into gcs

Posted by Micah Kornfield <em...@gmail.com>.
Just a note: some types are not supported for writing to CSV (only those
that can currently be cast to string via compute kernels are supported).

Re: [python] - csv streaming into gcs

Posted by Leonhard Gruenschloss <le...@gruenschloss.org>.
Note that GCS also has an S3 compatibility layer:
https://cloud.google.com/storage/docs/migrating
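
A minimal sketch of what that could look like from pyarrow, assuming HMAC
interoperability keys have been created for the bucket and that the
endpoint is storage.googleapis.com. The function name gcs_via_s3_api is
made up for illustration, and I have not verified this against a real
bucket.

```python
def gcs_via_s3_api(access_key, secret_key):
    """Return a pyarrow S3FileSystem pointed at the GCS S3-compatible API.

    Assumptions: access_key / secret_key are GCS HMAC keys, and the
    interoperability endpoint is storage.googleapis.com.
    """
    # Imported lazily because some pyarrow builds omit S3 support.
    from pyarrow.fs import S3FileSystem

    return S3FileSystem(
        access_key=access_key,
        secret_key=secret_key,
        endpoint_override="storage.googleapis.com",
    )
```

Paths on the returned filesystem would then take the usual
"bucket/path/to/file.csv" form.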

Re: [python] - csv streaming into gcs

Posted by Weston Pace <we...@gmail.com>.
First you will need a filesystem that can read from and write to GCS.
There is no native GCS filesystem yet (see [1]), so you will need to use
fsspec to wrap an fsspec-compatible GCS filesystem.  There is an example
of how to do this at [2].

To stream CSV data you can either create a dataset with the CSV file
format (see [3] to learn about datasets) or pair an incremental CSV
reader created with open_csv [4] with an incremental CSV writer,
CSVWriter [5].  More general CSV reading/writing information can be
found at [6].

[1] https://issues.apache.org/jira/browse/ARROW-1231
[2] https://arrow.apache.org/docs/python/filesystems.html#using-fsspec-compatible-filesystems
[3] https://arrow.apache.org/docs/python/dataset.html#tabular-datasets
[4] https://arrow.apache.org/docs/python/generated/pyarrow.csv.open_csv.html#pyarrow.csv.open_csv
[5] https://arrow.apache.org/docs/python/generated/pyarrow.csv.CSVWriter.html#pyarrow.csv.CSVWriter
[6] https://arrow.apache.org/docs/python/csv.html
