Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/27 14:03:00 UTC

[GitHub] [arrow-cookbook] amol- opened a new pull request #50: Update CSV recipe to use pyarrow.csv instead of pandas

amol- opened a new pull request #50:
URL: https://github.com/apache/arrow-cookbook/pull/50


   Addresses https://github.com/apache/arrow-cookbook/pull/49


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-cookbook] westonpace merged pull request #50: Update CSV recipe to use pyarrow.csv instead of pandas

Posted by GitBox <gi...@apache.org>.

westonpace merged pull request #50:
URL: https://github.com/apache/arrow-cookbook/pull/50


   



[GitHub] [arrow-cookbook] amol- commented on pull request #50: Update CSV recipe to use pyarrow.csv instead of pandas

Posted by GitBox <gi...@apache.org>.

amol- commented on pull request #50:
URL: https://github.com/apache/arrow-cookbook/pull/50#issuecomment-908215190


   @westonpace @jorisvandenbossche I merged your suggestions; I think this is now ready to go.



[GitHub] [arrow-cookbook] westonpace commented on a change in pull request #50: Update CSV recipe to use pyarrow.csv instead of pandas

Posted by GitBox <gi...@apache.org>.

westonpace commented on a change in pull request #50:
URL: https://github.com/apache/arrow-cookbook/pull/50#discussion_r697731958



##########
File path: python/source/io.rst
##########
@@ -180,13 +177,36 @@ Writing CSV files
 =================
 
 It is currently possible to write an Arrow :class:`pyarrow.Table` to
-CSV by going through pandas. Arrow doesn't currently provide an optimized
-code path for writing to CSV.
+a CSV file using the :func:`pyarrow.csv.write_csv` function
 
 .. testcode::
 
+    arr = pa.array(range(100))
     table = pa.Table.from_arrays([arr], names=["col1"])
-    table.to_pandas().to_csv("table.csv", index=False)
+    
+    import pyarrow.csv
+    pa.csv.write_csv(table, "table.csv",
+                     write_options=pa.csv.WriteOptions(include_header=True))

Review comment:
       There has been some movement in https://github.com/apache/arrow-cookbook/pull/2 toward avoiding `testsetup` (which is hidden) in favor of fully standalone `testcode` blocks (at the risk of duplication).

##########
File path: python/source/io.rst
##########
@@ -180,13 +177,36 @@ Writing CSV files
 =================
 
 It is currently possible to write an Arrow :class:`pyarrow.Table` to
-CSV by going through pandas. Arrow doesn't currently provide an optimized
-code path for writing to CSV.
+a CSV file using the :func:`pyarrow.csv.write_csv` function
 
 .. testcode::
 
+    arr = pa.array(range(100))
     table = pa.Table.from_arrays([arr], names=["col1"])
-    table.to_pandas().to_csv("table.csv", index=False)
+    
+    import pyarrow.csv
+    pa.csv.write_csv(table, "table.csv",
+                     write_options=pa.csv.WriteOptions(include_header=True))
+
+Writing CSV files incrementally
+===============================
+
+If you need to append to write data to a CSV file incrementally
+as you generate or retrieve the data and you don't want to keep
+in memory the whole table to write it at once, it's possible to use
+:class:`pyarrow.csv.CSVWriter` to write data incrementally
+
+.. testcode::
+
+    schema = pa.schema([("col1", pa.int32())])
+    with pa.csv.CSVWriter("table.csv", schema=schema) as writer:
+        for chunk in range(10):
+            datachunk = range(chunk*10, (chunk+1)*10)
+            table = pa.Table.from_arrays([pa.array(datachunk)], schema=schema)
+            writer.write(table)
+
+Apart tables, it's equally possible to write :class:`pyarrow.RecordBatch`
+just passing them as you would for tables.

Review comment:
       ```suggestion
   It's equally possible to write :class:`pyarrow.RecordBatch`
   by passing them as you would for tables.
   ```

##########
File path: python/source/io.rst
##########
@@ -180,13 +177,36 @@ Writing CSV files
 =================
 
 It is currently possible to write an Arrow :class:`pyarrow.Table` to
-CSV by going through pandas. Arrow doesn't currently provide an optimized
-code path for writing to CSV.
+a CSV file using the :func:`pyarrow.csv.write_csv` function
 
 .. testcode::
 
+    arr = pa.array(range(100))
     table = pa.Table.from_arrays([arr], names=["col1"])
-    table.to_pandas().to_csv("table.csv", index=False)
+    
+    import pyarrow.csv
+    pa.csv.write_csv(table, "table.csv",
+                     write_options=pa.csv.WriteOptions(include_header=True))
+
+Writing CSV files incrementally
+===============================
+
+If you need to append to write data to a CSV file incrementally

Review comment:
       ```suggestion
   If you need to write data to a CSV file incrementally
   ```





[GitHub] [arrow-cookbook] jorisvandenbossche commented on a change in pull request #50: Update CSV recipe to use pyarrow.csv instead of pandas

Posted by GitBox <gi...@apache.org>.

jorisvandenbossche commented on a change in pull request #50:
URL: https://github.com/apache/arrow-cookbook/pull/50#discussion_r698297153



##########
File path: python/source/io.rst
##########
@@ -180,13 +177,36 @@ Writing CSV files
 =================
 
 It is currently possible to write an Arrow :class:`pyarrow.Table` to

Review comment:
       ```suggestion
   It is possible to write an Arrow :class:`pyarrow.Table` to
   ```





[GitHub] [arrow-cookbook] amol- commented on a change in pull request #50: Update CSV recipe to use pyarrow.csv instead of pandas

Posted by GitBox <gi...@apache.org>.

amol- commented on a change in pull request #50:
URL: https://github.com/apache/arrow-cookbook/pull/50#discussion_r698302432



##########
File path: python/source/io.rst
##########
@@ -180,13 +177,36 @@ Writing CSV files
 =================
 
 It is currently possible to write an Arrow :class:`pyarrow.Table` to
-CSV by going through pandas. Arrow doesn't currently provide an optimized
-code path for writing to CSV.
+a CSV file using the :func:`pyarrow.csv.write_csv` function
 
 .. testcode::
 
+    arr = pa.array(range(100))
     table = pa.Table.from_arrays([arr], names=["col1"])
-    table.to_pandas().to_csv("table.csv", index=False)
+    
+    import pyarrow.csv
+    pa.csv.write_csv(table, "table.csv",
+                     write_options=pa.csv.WriteOptions(include_header=True))

Review comment:
       I saw that, which is why I moved `arr = pa.array(range(100))` into the test code, so that it's more explicit.



