Posted to dev@hive.apache.org by "Rémy SAISSY (JIRA)" <ji...@apache.org> on 2014/05/27 00:11:01 UTC
[jira] [Created] (HIVE-7125) Support strings in the DELIMITED BY statement
Rémy SAISSY created HIVE-7125:
---------------------------------
Summary: Support strings in the DELIMITED BY statement
Key: HIVE-7125
URL: https://issues.apache.org/jira/browse/HIVE-7125
Project: Hive
Issue Type: Improvement
Components: Query Processor
Affects Versions: 0.13.0
Reporter: Rémy SAISSY
Hi,
I am working with a dataset that looks like this:
dataset.txt:
salut|;les|;|amiches
comment|;|allez|;|vous
This dataset's delimiter is not a single character such as | or ; but a multi-character string, |;| in this case.
So I tried to create an external table with this delimiter:
hive> create external table ds (f1 string, f2 string, f3 string)
row format delimited fields terminated by '|;|'
location '/user/remy/dataset';
But I got this error:
MismatchedTokenException(5!=301)
at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
at org.apache.hadoop.hive.ql.parse.HiveParser.tableRowFormatFieldIdentifier(HiveParser.java:31433)
at org.apache.hadoop.hive.ql.parse.HiveParser.rowFormatDelimited(HiveParser.java:30386)
at org.apache.hadoop.hive.ql.parse.HiveParser.tableRowFormat(HiveParser.java:30662)
at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:4683)
at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2144)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:373)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:291)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:944)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1009)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:880)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:870)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 1:102 mismatched input '|' expecting StringLiteral near 'by' in table row format's field separator
My workaround was to run a MapReduce job that preprocesses the data and replaces the delimiter with a single, otherwise unused character. (My client uses a three-character delimiter to ensure that the sequence won't appear elsewhere in the CSV.)
However, it would be nice to be able to use such a delimiter directly in an external table definition.
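For illustration, the preprocessing step could look roughly like the Python sketch below. This is not the actual job, only a minimal sketch under assumptions: the function names convert_line and convert_lines are hypothetical, and the replacement character \x01 (Ctrl-A, Hive's default field delimiter) is assumed not to occur in the data.

```python
# Minimal sketch of the delimiter-replacement step (hypothetical helper names).

MULTI_CHAR_DELIM = "|;|"     # the three-character delimiter from dataset.txt
SINGLE_CHAR_DELIM = "\x01"   # assumption: Ctrl-A never appears in the data


def convert_line(line, old=MULTI_CHAR_DELIM, new=SINGLE_CHAR_DELIM):
    """Replace every occurrence of the multi-character delimiter in one line."""
    if new in line:
        # Refuse to continue rather than silently create extra fields.
        raise ValueError("replacement delimiter already present in input line")
    return line.replace(old, new)


def convert_lines(lines):
    """Lazily convert an iterable of lines, e.g. sys.stdin in a streaming job."""
    for line in lines:
        yield convert_line(line)
```

Feeding sys.stdin through convert_lines and writing the results to sys.stdout would let the same logic run as a Hadoop Streaming mapper. The external table could then be declared over the converted files with a single-character field terminator.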