Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/13 10:11:16 UTC

[GitHub] [arrow] HaykManukyanAvetiky opened a new issue #12413: Pyarrow write dataset ignores delimiter

HaykManukyanAvetiky opened a new issue #12413:
URL: https://github.com/apache/arrow/issues/12413


   Hi all,
   I tried to report this possible bug in Jira but could not, so I am writing here.
   I have a dataset, and when I try to write it as a TSV (tab-separated) file, pyarrow writes comma-separated CSV anyway.
   Here is my code:
   ```python
   import pyarrow.csv as csv
   import pyarrow.dataset as ds

   # `table` is an existing pyarrow.Table that has a 'month' column.
   ds.write_dataset(data=table, base_dir='adapter/tsv/',
                    basename_template='my-unique-name-{i}.tsv',
                    format=ds.CsvFileFormat(parse_options=csv.ParseOptions(delimiter="\t")),
                    partitioning=['month'],
                    existing_data_behavior='overwrite_or_ignore')
   ```
   Here is what I get:
   ```csv 
   "day","year"
   26,1958
   11,1912
   26,1942
   ```
   Here is what I expect:
   
   ```tsv
   day     year
   26      1958
   11      1912
   26      1942
   ```
   It feels like pyarrow is ignoring the format or the delimiter.
   Thanks in advance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] HaykManukyanAvetiky commented on issue #12413: Pyarrow write dataset ignores delimiter

Posted by GitBox <gi...@apache.org>.
HaykManukyanAvetiky commented on issue #12413:
URL: https://github.com/apache/arrow/issues/12413#issuecomment-1039915015


   ok thanks





[GitHub] [arrow] HaykManukyanAvetiky closed issue #12413: Pyarrow write dataset ignores delimiter

Posted by GitBox <gi...@apache.org>.
HaykManukyanAvetiky closed issue #12413:
URL: https://github.com/apache/arrow/issues/12413


   





[GitHub] [arrow] lidavidm commented on issue #12413: Pyarrow write dataset ignores delimiter

Posted by GitBox <gi...@apache.org>.
lidavidm commented on issue #12413:
URL: https://github.com/apache/arrow/issues/12413#issuecomment-1038302606


   Unfortunately ParseOptions only applies to reading data; [WriteOptions](https://github.com/apache/arrow/blob/6b7c7a2702466f7c3c9c1f9dd41bc42458cff398/cpp/src/arrow/csv/options.h#L187) controls writing data. We don't currently support changing the delimiter when writing.
   
   Please see https://issues.apache.org/jira/browse/ARROW-15672. 
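
Until the writer supports a custom delimiter, one possible workaround is to produce the tab-separated output outside of pyarrow's CSV writer. The following is a minimal sketch using only Python's standard `csv` module; it is not part of the Arrow API. The helper name `write_tsv` is hypothetical, and the sketch assumes the data is available as a mapping of column names to value lists, which is the shape `Table.to_pydict()` returns. It does not handle partitioning.

```python
import csv
import io


def write_tsv(columns, out):
    """Write a column-oriented mapping (name -> list of values) as tab-separated text.

    `write_tsv` is a hypothetical helper for illustration, not a pyarrow function.
    """
    names = list(columns)
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(names)
    # zip(*...) turns the per-column lists into per-row tuples.
    writer.writerows(zip(*(columns[name] for name in names)))


# With pyarrow, this mapping would come from table.to_pydict(); a literal is
# used here (an assumption) so the sketch runs without pyarrow installed.
columns = {"day": [26, 11, 26], "year": [1958, 1912, 1942]}

buffer = io.StringIO()
write_tsv(columns, buffer)
tsv_text = buffer.getvalue()
print(tsv_text)
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` as `out`.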




