Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/21 04:58:01 UTC

[GitHub] [arrow] nipnipj opened a new issue, #13667: csv to parquet

nipnipj opened a new issue, #13667:
URL: https://github.com/apache/arrow/issues/13667

   When trying 
   ```r
   csv <- open_dataset("data.csv", format = "csv")
   csv %>% write_dataset("train_data", format = "parquet")
   ```
   the following error message appears
   ```
   Error: Invalid: In CSV column #172: Row #408: CSV conversion error to null: invalid value '0.002861340479143'
   Traceback:
   
   1. csv %>% write_dataset("train_data", format = "parquet")
   2. write_dataset(., "train_data", format = "parquet")
   3. plan$Write(final_node, prepare_key_value_metadata(output_schema$metadata), 
    .     options, path_and_fs$fs, path_and_fs$path, partitioning, 
    .     basename_template, existing_data_behavior, max_partitions, 
    .     max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group)
   4. ExecPlan_Write(self, node, ...)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] eitsupi commented on issue #13667: csv to parquet

Posted by GitBox <gi...@apache.org>.

eitsupi commented on issue #13667:
URL: https://github.com/apache/arrow/issues/13667#issuecomment-1191699859

   How about trying the following?
   
   ```r
   csv <- readr::read_csv("data.csv")
   csv %>% write_dataset("train_data", format = "parquet")
   ```



[GitHub] [arrow] nipnipj commented on issue #13667: csv to parquet

Posted by GitBox <gi...@apache.org>.

nipnipj commented on issue #13667:
URL: https://github.com/apache/arrow/issues/13667#issuecomment-1191723211

   That's one way, but is there a way to modify `csv$schema`? I mean only column 172, not the whole schema.



[GitHub] [arrow] eitsupi commented on issue #13667: csv to parquet

Posted by GitBox <gi...@apache.org>.

eitsupi commented on issue #13667:
URL: https://github.com/apache/arrow/issues/13667#issuecomment-1192649474

   You can use the `col_types` option, the same one that `arrow::read_csv_arrow()` takes, when opening the dataset.
   For example:
   
   ```r
   mtcars |> readr::write_csv("test.csv")
   ds <- arrow::open_dataset("test.csv", format = "csv", col_types = arrow::schema("cyl" = arrow::utf8()))
   ```
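   
   Applied to the original problem, the same idea might look like the sketch below. It assumes the installed arrow version forwards `col_types` from `open_dataset()` for CSV files, as the snippet above does; the temporary file and the column name `score` are hypothetical stand-ins for the real data and for column 172.
   
   ```r
   library(arrow)
   
   # Stand-in CSV; "score" is a hypothetical name for the problem column.
   csv_path <- tempfile(fileext = ".csv")
   write.csv(
     data.frame(id = 1:3, score = c(0.1, 0.2, 0.002861340479143)),
     csv_path,
     row.names = FALSE
   )
   
   # Override the type of one column only; the rest are still inferred.
   # Assumption: this arrow version forwards col_types from open_dataset().
   ds <- open_dataset(
     csv_path,
     format = "csv",
     col_types = schema(score = float64())
   )
   
   # Write the dataset out as Parquet.
   out_dir <- tempfile("train_data")
   write_dataset(ds, out_dir, format = "parquet")
   ```
   
   If the file fits in memory, `read_csv_arrow(csv_path, col_types = schema(score = float64()))` accepts the same kind of partial schema.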



[GitHub] [arrow] nipnipj closed issue #13667: csv to parquet

Posted by GitBox <gi...@apache.org>.

nipnipj closed issue #13667: csv to parquet
URL: https://github.com/apache/arrow/issues/13667

