Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/21 04:58:01 UTC
[GitHub] [arrow] nipnipj opened a new issue, #13667: csv to parquet
nipnipj opened a new issue, #13667:
URL: https://github.com/apache/arrow/issues/13667
When trying to convert a CSV file to Parquet with
```r
library(arrow)
library(dplyr)  # provides %>%
csv <- open_dataset("data.csv", format = "csv")
csv %>% write_dataset("train_data", format = "parquet")
```
the following error message appears:
```
Error: Invalid: In CSV column #172: Row #408: CSV conversion error to null: invalid value '0.002861340479143'
Traceback:
1. csv %>% write_dataset("train_data", format = "parquet")
2. write_dataset(., "train_data", format = "parquet")
3. plan$Write(final_node, prepare_key_value_metadata(output_schema$metadata),
. options, path_and_fs$fs, path_and_fs$path, partitioning,
. basename_template, existing_data_behavior, max_partitions,
. max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group)
4. ExecPlan_Write(self, node, ...)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] eitsupi commented on issue #13667: csv to parquet
Posted by GitBox <gi...@apache.org>.
eitsupi commented on issue #13667:
URL: https://github.com/apache/arrow/issues/13667#issuecomment-1191699859
How about trying the following?
```r
csv <- readr::read_csv("data.csv")
csv %>% write_dataset("train_data", format = "parquet")
```
[GitHub] [arrow] nipnipj commented on issue #13667: csv to parquet
Posted by GitBox <gi...@apache.org>.
nipnipj commented on issue #13667:
URL: https://github.com/apache/arrow/issues/13667#issuecomment-1191723211
That's one way, but is there a way to modify `csv$schema`? I mean changing the type of column 172 only, not specifying the whole schema.
[GitHub] [arrow] eitsupi commented on issue #13667: csv to parquet
Posted by GitBox <gi...@apache.org>.
eitsupi commented on issue #13667:
URL: https://github.com/apache/arrow/issues/13667#issuecomment-1192649474
You can use the `col_types` option (the one from the `arrow::read_csv_arrow()` function) with `open_dataset()`, like this:
```r
mtcars |> readr::write_csv("test.csv")
# Only "cyl" is given an explicit type here.
ds <- arrow::open_dataset(
  "test.csv",
  format = "csv",
  col_types = arrow::schema("cyl" = arrow::utf8())
)
```
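For completeness, here is a minimal sketch of the whole round trip (CSV to Parquet) with one column's type overridden. It reuses the `col_types` call from the snippet above; whether `open_dataset()` accepts a partial schema there may depend on the installed arrow version, so treat that as an assumption. The paths `csv_path` and `out_dir` and the choice of `float64()` for `cyl` are made up for illustration. In the original report, the override would target column 172 by its actual name.

```r
library(arrow)

# Hypothetical locations, used only for this sketch.
csv_path <- file.path(tempdir(), "test.csv")
out_dir  <- file.path(tempdir(), "train_data")

# Write a small CSV to work with (base R, no readr needed).
write.csv(mtcars, csv_path, row.names = FALSE)

# Give one column an explicit type and leave the rest to type inference.
# Assumption: col_types accepts a partial schema, as in the reply above.
ds <- open_dataset(
  csv_path,
  format = "csv",
  col_types = schema(cyl = float64())
)

# Convert to Parquet.
write_dataset(ds, out_dir, format = "parquet")

# Re-open the Parquet output and inspect the type of the overridden column.
pq <- open_dataset(out_dir)
print(pq$schema$GetFieldByName("cyl")$type)
```

If the `null` inference in the original error came from the column being empty in the first part of the file, declaring the column as `float64()` up front should sidestep it, since the reader no longer has to guess the type.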
[GitHub] [arrow] nipnipj closed issue #13667: csv to parquet
Posted by GitBox <gi...@apache.org>.
nipnipj closed issue #13667: csv to parquet
URL: https://github.com/apache/arrow/issues/13667