Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/20 17:41:02 UTC

[GitHub] [arrow] lidavidm commented on issue #12469: [R] int32/int64 issues in opening CSVs

lidavidm commented on issue #12469:
URL: https://github.com/apache/arrow/issues/12469#issuecomment-1046287019


   @dhicks, if you happen to have a reprex for the crash, it would be much appreciated: `compute()` shouldn't crash on local files. (Is it for the same file you've attached?)
   
   Also, what version of Arrow are you using? (`arrow_info()` would give you this.)
   
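   You can pass either `col_types` or `skip_rows` to `open_dataset()`. The documentation is a little unclear on this point, but the accepted option names actually come from `CsvReadOptions`/`CsvParseOptions`/`CsvConvertOptions`. When you supply a full `schema`, the header row is read as data (hence the conversion error on the value `'n'`), so either skip it with `skip_rows=1` or pass the types via `col_types` so the header is still used for column names. For example: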
   
   ```r
   > open_dataset('./temp/1960-1-01.csv', format='csv', schema=schema(article_id=string(), phrase=string(), n=int32())) %>% collect()
   Error: Invalid: Could not open CSV input source '/home/lidavidm/temp/1960-1-01.csv': Invalid: In CSV column #2: Row #1: CSV conversion error to int32: invalid value 'n'
   > open_dataset('./temp/1960-1-01.csv', format='csv', schema=schema(article_id=string(), phrase=string(), n=int32()), skip_rows=1) %>% collect()
   # A tibble: 452 × 3
      article_id phrase          n
      <chr>      <chr>       <int>
    1 1960-1-01  it             63
    2 1960-1-01  we             24
    3 1960-1-01  world          13
    4 1960-1-01  numbers        11
    5 1960-1-01  they           11
    6 1960-1-01  our_numbers    10
    7 1960-1-01  he              9
    8 1960-1-01  life            9
    9 1960-1-01  i               8
   10 1960-1-01  mankind         7
   # … with 442 more rows
   > open_dataset('./temp/1960-1-01.csv', format='csv', col_types=schema(article_id=string(), phrase=string(), n=int32())) %>% collect()
   # A tibble: 452 × 3
      article_id phrase          n
      <chr>      <chr>       <int>
    1 1960-1-01  it             63
    2 1960-1-01  we             24
    3 1960-1-01  world          13
    4 1960-1-01  numbers        11
    5 1960-1-01  they           11
    6 1960-1-01  our_numbers    10
    7 1960-1-01  he              9
    8 1960-1-01  life            9
    9 1960-1-01  i               8
   10 1960-1-01  mankind         7
   # … with 442 more rows
   ```

