Posted to user@hive.apache.org by java8964 java8964 <ja...@hotmail.com> on 2012/04/03 19:47:30 UTC
Question about org.apache.hadoop.hive.contrib.serde2.RegexSerDe
Hi,
I have a question about the behavior of the class org.apache.hadoop.hive.contrib.serde2.RegexSerDe. Here is an example I tested using the Cloudera hive-0.7.1-cdh3u3 release. The class did NOT do what I expected. Does anyone know the reason?
user:~/tmp> more Test.java
import java.io.*;
import java.text.*;

class Test {
    public static void main (String[] argv) throws Exception {
        String line = "aaa,\"bbb\",\"cc,c\"";
        String[] tokens = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
        int i = 1;
        for(String t : tokens) {
            System.out.println(i + "> "+t);
            i++;
        }
    }
}
:~/tmp> java Test
1> aaa
2> "bbb"
3> "cc,c"
As you can see, the Java regular expression ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)" did what I wanted when used with String.split(): it parsed the string aaa,"bbb","cc,c" into 3 tokens: (aaa), ("bbb"), and ("cc,c"). So the regular expression works fine as a delimiter pattern.
Now in Hive:
:~> more test.txt
aaa,"bbb","cc,c"
:~> hive
Hive history file=/tmp/user/hive_job_log_user_201204031242_591028210.txt
hive> create table test(
    > c1 string,
    > c2 string,
    > c3 string
    > )
    > row format
    > SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    > WITH SERDEPROPERTIES (
    > "input.regex" = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)"
    > )
    > STORED AS TEXTFILE;
OK
Time taken: 0.401 seconds
hive> load data local inpath 'test.txt' overwrite into table test;
Copying data from file:/home/user/test.txt
Copying file: file:/home/user/test.txt
Loading data to table dev.test
Deleted hdfs://host/user/hive/warehouse/dev.db/test
OK
Time taken: 0.282 seconds
hive> select * from test;
OK
NULL NULL NULL
When I query this table, I don't get what I expected. I expected the output to be the 3 strings, like this: aaa "bbb" "cc,c"
Why does the output give me 3 NULLs?
Thanks for your help.
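
One possible explanation, offered as a sketch and not a confirmed diagnosis: as far as I understand the contrib RegexSerDe, it does not split each line on "input.regex". It tries to match the whole line against the pattern and takes one capturing group per column, so a delimiter-style pattern that works with String.split() would not match the full row, and the columns come back NULL. The class below imitates that whole-line matching in plain Java. The names RegexSerDeCheck, extractColumns, and ROW_REGEX are hypothetical, and ROW_REGEX is only an illustrative pattern for this one sample line, not a general CSV parser.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSerDeCheck {
    // Delimiter-style regex from the post: a comma followed by an even number of quotes.
    static final String SPLIT_REGEX = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical whole-line regex: three capturing groups, one per column.
    // Each group is either a quoted field or a run of non-comma characters.
    static final String ROW_REGEX =
        "(\"[^\"]*\"|[^,]*),(\"[^\"]*\"|[^,]*),(\"[^\"]*\"|[^,]*)";

    // Assumed model of the SerDe's behavior: the regex must match the ENTIRE
    // line; each capturing group becomes one column. Returns null when the
    // line does not match (which would surface as NULL columns in a query).
    static String[] extractColumns(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] cols = new String[m.groupCount()];
        for (int i = 0; i < cols.length; i++) {
            cols[i] = m.group(i + 1); // groups are 1-indexed
        }
        return cols;
    }

    public static void main(String[] args) {
        String line = "aaa,\"bbb\",\"cc,c\"";
        // The delimiter regex cannot match the whole line, so no columns are extracted.
        System.out.println(Arrays.toString(extractColumns(SPLIT_REGEX, line)));
        // The whole-line regex matches and yields one value per capturing group.
        System.out.println(Arrays.toString(extractColumns(ROW_REGEX, line)));
    }
}
```

If this model is right, the table would need an "input.regex" written in the whole-line style, with as many capturing groups as there are columns. Note that the pattern would also need whatever escaping HiveQL string literals require, which is not shown here.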