Posted to issues@orc.apache.org by "Varun Raval (Jira)" <ji...@apache.org> on 2021/11/19 16:35:00 UTC

[jira] [Commented] (ORC-1031) No way to escape delimiter in column values

    [ https://issues.apache.org/jira/browse/ORC-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446567#comment-17446567 ] 

Varun Raval commented on ORC-1031:
----------------------------------

No, I don't believe Hive directly supports tables backed by CSV files that contain binary columns. However, you can store the binary data in hexadecimal format in the CSV file and load that file into a CSV-backed table that uses a string column for the hexadecimal data. Then run an insert statement in Hive, as below, to copy the data from the CSV table into an ORC table. The insert converts the hexadecimal strings back to binary, so the resulting ORC file has a column with binary data.
{code:sql}
insert into destination select unhex(column1) from source;
{code}
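
For the first step, producing the hexadecimal CSV, here is a minimal sketch in Python. It assumes the binary values are already available in memory as bytes objects; the helper names (hex_encode_row, write_hex_csv) and the sample rows are hypothetical and not part of Hive or the ORC tools. Because hex output contains only the characters 0-9 and A-F, no delimiter or newline byte from the original data can appear in the field.

```python
import csv
import io


def hex_encode_row(row):
    """Return a copy of row with every bytes value replaced by its hex string."""
    return [
        value.hex().upper() if isinstance(value, (bytes, bytearray)) else value
        for value in row
    ]


def write_hex_csv(rows, out):
    """Write rows to the file-like object out, hex-encoding binary fields."""
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow(hex_encode_row(row))


# Hypothetical sample data: a single binary column whose values contain
# a comma and a newline, which would otherwise break a naive CSV parser.
rows = [(b"9Tl,\n\x00\xff",), (b"a,b",)]

buf = io.StringIO()
write_hex_csv(rows, buf)
csv_text = buf.getvalue()
print(csv_text, end="")
```

Each output line then holds one hex string per binary value, which the CSV table can expose as a string column for the unhex() call above to decode.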

> No way to escape delimiter in column values
> -------------------------------------------
>
>                 Key: ORC-1031
>                 URL: https://issues.apache.org/jira/browse/ORC-1031
>             Project: ORC
>          Issue Type: Bug
>          Components: C++
>            Reporter: Varun Raval
>            Priority: Major
>
> I am using the C++ csv to orc tool to convert a CSV file to an ORC file, and I could not find a way to escape delimiters that appear inside column values. If a delimiter appears as part of a column value in the CSV file, the tool treats that character as a column separator, which corrupts the data in the ORC file.
>  
> In my scenario, any character that could be used as the delimiter can also appear in one of the columns of the CSV file.
> To give more detail about my use case: I have a Hive table with a binary column, and a CSV file in which that column holds binary data. I am converting the CSV file to an ORC file using this tool. There are no restrictions on the data the binary column can contain, so whichever delimiter we choose for the conversion can end up inside that column.
> A sample value of the binary column is shown below:
> {code:java}
> 9Tl���������������~sjc_\[[\^`a`]WPF:."�������������������+Gaw���������������xnf`][Z[\_`a_[TK@4
> {code}
>  
> If there is a way to escape the delimiter characters in the column values, that would be really useful!


