Posted to issues@orc.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2021/10/21 20:29:00 UTC

[jira] [Commented] (ORC-1031) No way to escape delimiter in column values

    [ https://issues.apache.org/jira/browse/ORC-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432701#comment-17432701 ] 

Dongjoon Hyun commented on ORC-1031:
------------------------------------

Thank you for filing a JIRA, [~vraval48]. Do you mean that Apache Hive/Spark supports a table backed by CSV files containing such binary columns?

> No way to escape delimiter in column values
> -------------------------------------------
>
>                 Key: ORC-1031
>                 URL: https://issues.apache.org/jira/browse/ORC-1031
>             Project: ORC
>          Issue Type: Bug
>          Components: C++
>            Reporter: Varun Raval
>            Priority: Major
>
> I am using the C++ CSV-to-ORC tool to convert a CSV file to an ORC file, and I could not find a way to escape delimiters that appear inside column values. If a delimiter occurs within a column value, the tool treats it as a column separator, which corrupts the data in the ORC file.
>  
> In my scenario, any character that could be chosen as the delimiter can also occur in one of the columns of the CSV file.
> To give more context on my use case: I have a Hive table with a binary column, and a CSV file in which that column holds binary data. I am converting the CSV file to an ORC file using this tool. There are no restrictions on what the binary column can contain, so whichever delimiter we choose for the conversion can end up inside that column.
> A sample value of the binary column is shown below:
> {code:java}
> 9Tl���������������~sjc_\[[\^`a`]WPF:."�������������������+Gaw���������������xnf`][Z[\_`a_[TK@4
> {code}
>  
> If there is a way to escape the delimiter characters in the column values, that would be really useful!
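
To make the reported behavior concrete, here is a minimal C++ sketch. It is not the ORC tool's code: both function names are hypothetical, and the backslash escape convention is only an assumption about what the requested feature could look like. `splitNaive` ends a field at every delimiter, which is the behavior described above; `splitEscaped` treats the character after a backslash as literal data.

```cpp
#include <string>
#include <vector>

// Naive split: every occurrence of `delim` ends the current field,
// even when the delimiter is really part of a column value.
std::vector<std::string> splitNaive(const std::string& line, char delim) {
  std::vector<std::string> fields(1);
  for (char c : line) {
    if (c == delim) {
      fields.emplace_back();
    } else {
      fields.back().push_back(c);
    }
  }
  return fields;
}

// Hypothetical escape-aware split (an assumption, not an ORC feature):
// a backslash makes the next character literal, so "\," is a comma
// inside the value and "\\" is a single backslash.
// A trailing lone backslash is silently dropped in this sketch.
std::vector<std::string> splitEscaped(const std::string& line, char delim) {
  std::vector<std::string> fields(1);
  bool escaped = false;
  for (char c : line) {
    if (escaped) {
      fields.back().push_back(c);  // literal character, never a separator
      escaped = false;
    } else if (c == '\\') {
      escaped = true;              // consume the escape character itself
    } else if (c == delim) {
      fields.emplace_back();       // unescaped delimiter starts a new field
    } else {
      fields.back().push_back(c);
    }
  }
  return fields;
}
```

For a row with two logical columns, `splitNaive("1,ab,cd", ',')` returns three fields, while `splitEscaped("1,ab\\,cd", ',')` returns the two fields `1` and `ab,cd`. For arbitrary binary data, whoever writes the CSV file would also have to escape the escape character itself and any line terminators, which this sketch does not cover.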


