Posted to github@arrow.apache.org by "andygrove (via GitHub)" <gi...@apache.org> on 2023/02/06 18:11:43 UTC

[GitHub] [arrow-datafusion] andygrove opened a new issue, #5204: Add ability to process CSV files containing invalid UTF-8 characters

andygrove opened a new issue, #5204:
URL: https://github.com/apache/arrow-datafusion/issues/5204

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   I have a CSV dataset that contains invalid UTF-8 byte sequences. Spark queries it without complaint, but DataFusion fails with an error. My workaround is to preprocess the file with `String::from_utf8_lossy`, but it would be great if I could set an option in `CsvReadOptions` to have DataFusion do this for me.
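
For illustration, the preprocessing workaround might look roughly like the sketch below. The helper names `sanitize_utf8` and `sanitize_csv_file` are hypothetical and are not part of DataFusion; the only call taken from the description above is `String::from_utf8_lossy`, which replaces each invalid sequence with U+FFFD. The sketch reads the whole file into memory, which is an assumption that may not suit large inputs.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: return `bytes` as a `String`, replacing every
/// invalid UTF-8 sequence with the replacement character U+FFFD.
fn sanitize_utf8(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// Hypothetical helper: read the CSV at `input` as raw bytes, sanitize it,
/// and write the result to `output`. Assumes the file fits in memory.
#[allow(dead_code)]
fn sanitize_csv_file(input: &Path, output: &Path) -> io::Result<()> {
    let raw = fs::read(input)?;
    fs::write(output, sanitize_utf8(&raw))
}
```

The sanitized copy can then be registered with DataFusion in place of the original file. Note that the replacement is lossy: the original bytes cannot be recovered from the output.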
   
   **Describe the solution you'd like**
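   As described above: an option in `CsvReadOptions` that makes DataFusion apply the lossy UTF-8 conversion while reading the file.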
   
   **Describe alternatives you've considered**
   Just preprocess my inputs.
   
   **Additional context**
   
   

