Posted to jira@arrow.apache.org by "Nicola Crane (Jira)" <ji...@apache.org> on 2021/10/01 12:57:00 UTC

[jira] [Updated] (ARROW-13887) [R] Capture error produced when reading in CSV file with headers and using a schema, and add suggestion

     [ https://issues.apache.org/jira/browse/ARROW-13887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicola Crane updated ARROW-13887:
---------------------------------
    Description: 
When reading a CSV file that has a header row while also supplying a schema, we get an error because the header row is parsed as a row of data.
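{code:r}
library(arrow)

# Write a small CSV file that includes a header row
share_data <- tibble::tibble(
  company = c("AMZN", "GOOG", "BKNG", "TSLA"),
  price = c(3463.12, 2884.38, 2300.46, 732.39)
)

readr::write_csv(share_data, file = "share_data.csv")

# Supply a schema: the column names now come from the schema,
# so the header row is parsed as data and "price" fails to convert to double
share_schema <- schema(
  company = utf8(),
  price = float64()
)

read_csv_arrow("share_data.csv", schema = share_schema)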

{code}
{noformat}
Error: Invalid: In CSV column #1: CSV conversion error to double: invalid value 'price'
/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:492 decoder_.Decode(data, size, quoted, &value)
/home/nic2/arrow/cpp/src/arrow/csv/parser.h:84 status
/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:496 parser.VisitColumn(col_index, visit)
{noformat}
The fix is for the user to supply the argument {{skip = 1}} to {{read_csv_arrow()}}, but this is not obvious from the error message returned by the C++ library. We should catch this error and raise our own message with {{rlang::abort()}} that explains what went wrong and suggests how to avoid it (here, by passing {{skip = 1}}).
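
A minimal sketch of how the error could be caught and re-raised with a suggestion. {{read_with_header_hint()}} is a hypothetical helper, not part of arrow. It assumes the C++ message keeps the {{CSV conversion error ... invalid value '<cell>'}} shape shown above, and treats the failure as a header-row problem only when the offending cell matches one of the schema's field names. It uses base R's {{stop()}} so that it runs without extra packages; the issue proposes {{rlang::abort()}}, which could be swapped in.

```r
# Hypothetical helper (not part of arrow): run a CSV read that used a
# user-supplied schema and, if the failure looks like the header row being
# parsed as data, re-raise the error with a suggestion to use skip = 1.
# Assumes the error text contains "CSV conversion error" and
# "invalid value '<cell>'", as in the C++ message in this issue.
read_with_header_hint <- function(read_fn, schema_names) {
  tryCatch(
    read_fn(),
    error = function(e) {
      msg <- conditionMessage(e)
      # Extract the offending cell text, e.g. "price"
      bad_value <- regmatches(msg, regexpr("invalid value '[^']*'", msg))
      bad_value <- sub("^invalid value '([^']*)'$", "\\1", bad_value)
      looks_like_header <- grepl("CSV conversion error", msg, fixed = TRUE) &&
        length(bad_value) == 1 &&
        bad_value %in% schema_names
      if (looks_like_header) {
        stop(
          paste0(
            msg, "\n",
            "* If you have supplied a schema and your data contains a header ",
            "row, you should supply the argument `skip = 1` to prevent the ",
            "header being read in as data."
          ),
          call. = FALSE
        )
      }
      # Any other error is re-raised unchanged
      stop(e)
    }
  )
}

# Demonstration with a simulated error (arrow is not needed to run this)
simulated_read <- function() {
  stop("Invalid: In CSV column #1: CSV conversion error to double: invalid value 'price'")
}
msg <- tryCatch(
  read_with_header_hint(simulated_read, c("company", "price")),
  error = function(e) conditionMessage(e)
)
cat(msg, "\n")
```

With arrow loaded, the reproducer above could be wrapped as {{read_with_header_hint(function() read_csv_arrow("share_data.csv", schema = share_schema), share_schema$names)}}, while the workaround itself is {{read_csv_arrow("share_data.csv", schema = share_schema, skip = 1)}}. Matching on message text is brittle if the C++ wording changes, so this is a heuristic rather than a robust check.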

For similar examples (and their associated PRs), see ARROW-11766 and ARROW-12791.

  was:
When reading a CSV file that has a header row while also supplying a schema, we get an error because the header row is parsed as a row of data.
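{code:r}
library(arrow)

# Write a small CSV file that includes a header row
share_data <- tibble::tibble(
  company = c("AMZN", "GOOG", "BKNG", "TSLA"),
  price = c(3463.12, 2884.38, 2300.46, 732.39),
  date = rep(as.Date("2021-09-03"), 4)
)

readr::write_csv(share_data, file = "share_data.csv")

# Supply a schema: the column names now come from the schema,
# so the header row is parsed as data and the type conversion fails
share_schema <- schema(
  company = utf8(),
  price = float64(),
  date = date32()
)

read_csv_arrow("share_data.csv", schema = share_schema)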

{code}
{noformat}
Error: Invalid: In CSV column #1: CSV conversion error to double: invalid value 'price'
/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:492 decoder_.Decode(data, size, quoted, &value)
/home/nic2/arrow/cpp/src/arrow/csv/parser.h:84 status
/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:496 parser.VisitColumn(col_index, visit)
{noformat}
The fix is for the user to supply the argument {{skip = 1}} to {{read_csv_arrow()}}, but this is not obvious from the error message returned by the C++ library. We should catch this error and raise our own message with {{rlang::abort()}} that explains what went wrong and suggests how to avoid it (here, by passing {{skip = 1}}).

For similar examples (and their associated PRs), see ARROW-11766 and ARROW-12791.


> [R] Capture error produced when reading in CSV file with headers and using a schema, and add suggestion
> -------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-13887
>                 URL: https://issues.apache.org/jira/browse/ARROW-13887
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Nicola Crane
>            Priority: Major
>              Labels: good-first-issue



--
This message was sent by Atlassian Jira
(v8.3.4#803005)