You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Carl Steinbach (JIRA)" <ji...@apache.org> on 2011/08/23 00:18:29 UTC

[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.

     [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2303:
---------------------------------

    Status: Open  (was: Patch Available)

I see diffs in the following tests:

org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input42
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_louter_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_outer_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pcr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rand_partitionpruner1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rand_partitionpruner3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regexp_extract
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_router_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_explode
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_reflect
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_explode
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_ppr
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_minimr_broken_pipe
org.apache.hadoop.hive.ql.parse.TestParse.testParse_cast1
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby2
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby3
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby5
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby6
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input20
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input8
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_part1
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testxpath
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testxpath2
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join5
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join6
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join7
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join8
org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample1
org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf1
org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf6
org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf_case
org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf_when

@Amareshwari: Can you please take a look? Thanks.

> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
>                 Key: HIVE-2303
>                 URL: https://issues.apache.org/jira/browse/HIVE-2303
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2303.txt
>
>
> The following is from one of our users:
>  
> create external table impressions (imp string, msg string)
>   row format delimited
>     fields terminated by '\t'
>     lines terminated by '\n'
>   stored as textfile                 
>   location '/xxx';
>  
> Some strings in my data contains Control-A, Control-B etc as internal delimiters.  If I do a
>  
> Select * from impressions limit 10;
>  
> All fields were able to print correctly.  However if I do a
>  
> Select * from impressions where msg regexp '.*' limit 10;
>  
> The fields were broken by the control characters.  The difference between the 2 commands is that the latter requires a map-reduce job.  
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira