Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/08/26 20:40:26 UTC
[GitHub] [beam] robertwb opened a new issue, #22923: [Feature Request]: Allow customization of filename and sharding for dataframe IOs.
robertwb opened a new issue, #22923:
URL: https://github.com/apache/beam/issues/22923
### What would you like to happen?
Other sinks, such as TextIO and FileSink, allow this customization.
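For illustration, here is a rough sketch of how a shard name template in the style of `-SSSSS-of-NNNNN` (the default used by Beam's file-based sinks, as far as I know) could expand into a filename. This is not Beam's implementation; `expand_shard_template` is a hypothetical helper written only to show the idea, and the `out` prefix and `.csv` suffix are made up.

```python
import re


def expand_shard_template(template: str, shard: int, num_shards: int) -> str:
    """Fill runs of 'S' with the shard index and runs of 'N' with the shard count.

    Each run is zero-padded to its own length. Hypothetical helper, not a Beam API.
    """
    def fill(match: re.Match) -> str:
        run = match.group(0)
        value = shard if run[0] == 'S' else num_shards
        return str(value).zfill(len(run))

    return re.sub(r'S+|N+', fill, template)


# Prefix + expanded template + suffix gives the final shard name.
print('out' + expand_shard_template('-SSSSS-of-NNNNN', 0, 2) + '.csv')
```

Under this sketch, an empty template leaves the name without any shard suffix, which is the kind of control the dataframe IOs currently lack.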
### Issue Priority
Priority: 2
### Issue Component
Component: dsl-dataframe
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] robertwb commented on issue #22923: [Feature Request]: Allow customization of filename and sharding for dataframe IOs.
Posted by GitBox <gi...@apache.org>.
robertwb commented on issue #22923:
URL: https://github.com/apache/beam/issues/22923#issuecomment-1228940709
Context: https://stackoverflow.com/questions/73498119/apache-beam-dataframe-write-csv-to-gcs-without-shard-name-template
Re: [I] [Feature Request]: Allow customization of filename and sharding for dataframe IOs. [beam]
Posted by "jzxu (via GitHub)" <gi...@apache.org>.
jzxu commented on issue #22923:
URL: https://github.com/apache/beam/issues/22923#issuecomment-1833308759
Hi, I noticed that despite https://github.com/apache/beam/pull/22925 being merged, DeferredDataFrame.to_csv() still doesn't respect the num_shards argument. Minimal test case:
```
from typing import NamedTuple

import apache_beam as beam
from apache_beam.dataframe import convert


class Row(NamedTuple):
    x: int


with beam.Pipeline('DirectRunner') as p:
    # One million single-column rows, converted to a deferred dataframe.
    c = p | beam.Create([Row(x=i) for i in range(1000000)])
    df = convert.to_dataframe(c)
    # Two output shards are requested here.
    df.to_csv('/tmp/apache_beam_test.csv', index=False, num_shards=2)
```
Running this with apache_beam 2.50.0 results in a single shard being written.
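A quick way to check the shard count after the pipeline finishes, using only the standard library. This is a minimal sketch that assumes every output file name begins with the path passed to `to_csv()`; `count_shards` is a hypothetical helper name, not part of Beam.

```python
import glob


def count_shards(output_prefix: str) -> int:
    """Count files whose names start with the given output prefix.

    Assumes the sink writes all shards as '<output_prefix><something>'.
    """
    # glob.escape keeps any special characters in the prefix literal.
    return len(glob.glob(glob.escape(output_prefix) + '*'))


print(count_shards('/tmp/apache_beam_test.csv'))
```

If `num_shards=2` were respected, I would expect this to print 2 after running the test case above from a clean output directory.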