Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/08/26 20:40:26 UTC

[GitHub] [beam] robertwb opened a new issue, #22923: [Feature Request]: Allow customization of filename and sharding for dataframe IOs.

robertwb opened a new issue, #22923:
URL: https://github.com/apache/beam/issues/22923

   ### What would you like to happen?
   
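   Dataframe IOs (for example `to_csv`) should allow customizing output filenames and sharding. Other sinks, such as TextIO and FileSink, already allow this customization.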
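
To make the request concrete: text sinks in Beam accept a shard name template such as `-SSSSS-of-NNNNN`, in which the runs of `S` and `N` stand for the zero-padded shard number and shard count, and an empty template yields a single unsharded file name. The sketch below only illustrates that expansion in plain Python. It is not Beam's implementation, and the helper name `expand_shard_template` and its parameters are hypothetical.

```python
import re


def expand_shard_template(prefix, template, shard_num, num_shards, suffix=""):
    """Expand runs of 'S' and 'N' in `template` into zero-padded numbers.

    Hypothetical helper for illustration only; it is not part of Beam.
    """
    def pad(value):
        # Pad to the width of the matched run, e.g. 'SSSSS' -> 5 digits.
        return lambda match: str(value).zfill(len(match.group(0)))

    name = re.sub(r"S+", pad(shard_num), template)
    name = re.sub(r"N+", pad(num_shards), name)
    return prefix + name + suffix


# First of two shards, using the '-SSSSS-of-NNNNN' pattern.
print(expand_shard_template("/tmp/out", "-SSSSS-of-NNNNN", 0, 2, ".csv"))
# An empty template leaves only the prefix and suffix.
print(expand_shard_template("/tmp/out", "", 0, 1, ".csv"))
```

With this kind of control a user could, for instance, request a single output file with no shard decoration in its name, which is the case raised in the linked Stack Overflow question.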
   
   ### Issue Priority
   
   Priority: 2
   
   ### Issue Component
   
   Component: dsl-dataframe


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] robertwb commented on issue #22923: [Feature Request]: Allow customization of filename and sharding for dataframe IOs.

Posted by GitBox <gi...@apache.org>.

robertwb commented on issue #22923:
URL: https://github.com/apache/beam/issues/22923#issuecomment-1228940709

   Context: https://stackoverflow.com/questions/73498119/apache-beam-dataframe-write-csv-to-gcs-without-shard-name-template



Re: [I] [Feature Request]: Allow customization of filename and sharding for dataframe IOs. [beam]

Posted by "jzxu (via GitHub)" <gi...@apache.org>.

jzxu commented on issue #22923:
URL: https://github.com/apache/beam/issues/22923#issuecomment-1833308759

   Hi, I noticed that despite https://github.com/apache/beam/pull/22925 being merged, DeferredDataFrame.to_csv() still doesn't respect the num_shards argument. Minimal test case:
   
   ```
   from typing import NamedTuple
   import apache_beam as beam
   from apache_beam.dataframe import convert
   
   class Row(NamedTuple):
     x: int
   
   with beam.Pipeline('DirectRunner') as p:
     c = (p | beam.Create([Row(x=i) for i in range(1000000)]))
     df = convert.to_dataframe(c)
     df.to_csv('/tmp/apache_beam_test.csv', index=False, num_shards=2)
   ```
   
   Running this with apache_beam 2.50.0 results in a single shard being written.
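
One way to confirm the shard count after such a run is to list the files that were actually written. The following is a minimal sketch using only the standard library. It assumes the output shards are sibling files whose names start with the path passed to `to_csv()`; that naming is an assumption here, and the helper name `list_output_files` is hypothetical.

```python
import glob


def list_output_files(path_prefix):
    """Return the sorted files whose names start with `path_prefix`.

    Hypothetical helper; assumes shards are written next to each other
    with the given path as a common prefix.
    """
    # glob.escape guards against special characters in the prefix itself.
    return sorted(glob.glob(glob.escape(path_prefix) + "*"))


if __name__ == "__main__":
    files = list_output_files("/tmp/apache_beam_test.csv")
    print(len(files), "file(s) written:")
    for name in files:
        print(" ", name)
```

If `num_shards=2` were honored, two files would be expected in the listing. Clear out files left over from earlier runs first, since they share the prefix and would inflate the count.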

