Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/11 23:34:14 UTC

[GitHub] [arrow] wjones127 commented on a change in pull request #12112: ARROW-15183: [Python][Docs] Add Missing Dataset Write Options

wjones127 commented on a change in pull request #12112:
URL: https://github.com/apache/arrow/pull/12112#discussion_r782597375



##########
File path: docs/source/python/dataset.rst
##########
@@ -699,3 +699,46 @@ Parquet files:
 
     # also clean-up custom base directory used in some examples
     shutil.rmtree(str(base), ignore_errors=True)
+
+
+Configuring files open during a write

Review comment:
       I think these sections would actually be best placed next to the "Partitioning performance considerations" section at https://github.com/apache/arrow/blame/master/docs/source/python/dataset.rst#L578
   
   

##########
File path: docs/source/python/dataset.rst
##########
@@ -699,3 +699,46 @@ Parquet files:
 
     # also clean-up custom base directory used in some examples
     shutil.rmtree(str(base), ignore_errors=True)
+
+
+Configuring files open during a write
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When writing data to the disk, there are a few parameters that can be 
+important to optimize the writes, i.e number of rows per file and
+number of files open during write. 
+
+The number of files opened at during the write time can be set as follows;
+
+.. ipython:: python
+
+    ds.write_dataset(data=table, base_dir="data_dir", max_open_files=max_open_files)
+
+The maximum number of rows per file can be set as follows;
+
+.. ipython:: pythoin
+    ds.write_dataset(record_batch, "data_dir", format="parquet",

Review comment:
       I think a blank line between the directive and the code is required.
   
   ```suggestion
   .. ipython:: python
   
       ds.write_dataset(record_batch, "data_dir", format="parquet",
   ```



