Posted to dev@hive.apache.org by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org> on 2011/07/25 15:05:09 UTC

[jira] [Created] (HIVE-2303) files with control-A,B are not delimited correctly.

files with control-A,B are not delimited correctly.
---------------------------------------------------

                 Key: HIVE-2303
                 URL: https://issues.apache.org/jira/browse/HIVE-2303
             Project: Hive
          Issue Type: Bug
            Reporter: Amareshwari Sriramadasu
            Assignee: Amareshwari Sriramadasu


The following is from one of our users:
 
create external table impressions (imp string, msg string)
  row format delimited
    fields terminated by '\t'
    lines terminated by '\n'
  stored as textfile                 
  location '/xxx';
 
Some strings in my data contain Control-A, Control-B, etc. as internal delimiters. If I run
 
Select * from impressions limit 10;
 
all fields print correctly. However, if I run
 
Select * from impressions where msg regexp '.*' limit 10;
 
the fields are broken up at the control characters. The difference between the two commands is that the latter requires a map-reduce job.
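
The behaviour described above can be reproduced outside Hive with a small simulation. This is only a sketch under assumptions: it is not Hive code, the helper names are made up for illustration, and it assumes the intermediate output of the map-reduce job is written with Control-A as the field delimiter and without any escaping.

```python
# Hypothetical simulation of the reported behaviour; none of this is Hive code.
CTRL_A = "\x01"  # assumed delimiter of the job's intermediate output


def serialize_unescaped(fields, delim=CTRL_A):
    """Join fields with the delimiter, leaving embedded delimiters untouched."""
    return delim.join(fields)


def deserialize(line, delim=CTRL_A):
    """Split a line back into fields on every occurrence of the delimiter."""
    return line.split(delim)


# A two-column row (imp, msg); msg embeds Control-A and Control-B.
row = ["imp1", "k1\x01v1\x02k2"]

# Reading the tab-delimited source file directly keeps both columns intact.
source_line = "\t".join(row)
direct = source_line.split("\t")

# Writing the row with Control-A as delimiter and reading it back splits msg.
round_trip = deserialize(serialize_unescaped(row))

print(len(direct))      # 2
print(len(round_trip))  # 3
```

In this sketch the direct read returns the two original columns, while the round trip through the Control-A delimited form yields three fields because the Control-A inside msg is taken for a column boundary.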
 


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------

    Status: Patch Available  (was: Open)


[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------

    Status: Open  (was: Patch Available)

The patch was missing a data file.


[jira] [Commented] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103426#comment-13103426 ] 

Hudson commented on HIVE-2303:
------------------------------

Integrated in Hive-trunk-h0.21 #949 (See [https://builds.apache.org/job/Hive-trunk-h0.21/949/])
    HIVE-2303. Files with control-A,B are not delimited correctly (Amareshwari Sriramadasu via cws)

cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1170005
Files : 
* /hive/trunk/data/files/in7.txt
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java
* /hive/trunk/ql/src/test/queries/clientpositive/delimiter.q
* /hive/trunk/ql/src/test/results/clientpositive/combine2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/delimiter.q.out
* /hive/trunk/ql/src/test/results/clientpositive/filter_join_breaktask.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input23.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input42.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input_part7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input_part9.q.out
* /hive/trunk/ql/src/test/results/clientpositive/louter_join_ppr.q.out
* /hive/trunk/ql/src/test/results/clientpositive/outer_join_ppr.q.out
* /hive/trunk/ql/src/test/results/clientpositive/pcr.q.out
* /hive/trunk/ql/src/test/results/clientpositive/rand_partitionpruner1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/rand_partitionpruner3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/regexp_extract.q.out
* /hive/trunk/ql/src/test/results/clientpositive/router_join_ppr.q.out
* /hive/trunk/ql/src/test/results/clientpositive/sample10.q.out
* /hive/trunk/ql/src/test/results/clientpositive/sample6.q.out
* /hive/trunk/ql/src/test/results/clientpositive/sample8.q.out
* /hive/trunk/ql/src/test/results/clientpositive/sample9.q.out
* /hive/trunk/ql/src/test/results/clientpositive/transform_ppr1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/transform_ppr2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/udf_explode.q.out
* /hive/trunk/ql/src/test/results/clientpositive/udf_reflect.q.out
* /hive/trunk/ql/src/test/results/clientpositive/udtf_explode.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union_ppr.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/cast1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby6.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input20.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input8.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input_part1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input_testxpath.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input_testxpath2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/join4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/join5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/join6.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/join7.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/join8.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/sample1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/udf1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/udf4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/udf6.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/udf_case.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/udf_when.q.xml



[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2303:
---------------------------------

    Status: Open  (was: Patch Available)

I see diffs in the following tests:

org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input42
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_louter_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_outer_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pcr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rand_partitionpruner1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rand_partitionpruner3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regexp_extract
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_router_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_explode
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_reflect
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_explode
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_ppr
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_minimr_broken_pipe
org.apache.hadoop.hive.ql.parse.TestParse.testParse_cast1
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby2
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby3
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby5
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby6
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input20
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input8
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_part1
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testxpath
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testxpath2
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join5
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join6
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join7
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join8
org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample1
org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf1
org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf6
org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf_case
org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf_when

@Amareshwari: Can you please take a look? Thanks.


[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2303:
---------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.8.0)
                   0.9.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Amareshwari!


[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------

    Status: Patch Available  (was: Open)


[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------

    Fix Version/s: 0.8.0
           Status: Patch Available  (was: Open)


[jira] [Commented] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072719#comment-13072719 ] 

Amareshwari Sriramadasu commented on HIVE-2303:
-----------------------------------------------

This problem occurs because FileSinkOperator generates a TableDesc with default properties for storing the output. The solution is to escape the delimiters for the output table.

Shouldn't LazySimpleSerDe always escape delimiters?
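
To illustrate what escaping the delimiters means for the output, here is a minimal sketch. It is not the LazySimpleSerDe implementation: the function names are hypothetical, backslash is assumed as the escape character, and only a single delimiter level (Control-A) is handled.

```python
# Hypothetical escape-aware writer and reader; not Hive's serde code.
CTRL_A = "\x01"  # field delimiter assumed for the output table
ESCAPE = "\\"    # escape character assumed for this sketch


def escape_field(value, delim=CTRL_A, esc=ESCAPE):
    """Prefix every escape character and delimiter in value with the escape character."""
    out = []
    for ch in value:
        if ch == esc or ch == delim:
            out.append(esc)
        out.append(ch)
    return "".join(out)


def serialize(fields, delim=CTRL_A, esc=ESCAPE):
    """Escape each field, then join the fields with the delimiter."""
    return delim.join(escape_field(f, delim, esc) for f in fields)


def deserialize(line, delim=CTRL_A, esc=ESCAPE):
    """Split on unescaped delimiters only and drop the escape characters."""
    fields, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == esc and i + 1 < len(line):
            current.append(line[i + 1])  # keep the escaped character literally
            i += 2
        elif ch == delim:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


row = ["imp1", "k1\x01v1\x02k2", "back\\slash"]
assert deserialize(serialize(row)) == row
```

With escaping on both sides, a Control-A inside a value survives the round trip because the reader only treats an unescaped Control-A as a column boundary.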


[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------

    Status: Patch Available  (was: Open)


[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------

    Attachment: patch-2303-4.txt

Added the missing file.


[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------

    Attachment:     (was: patch-1850-2.txt)


[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------

    Attachment: patch-2303.txt

The patch adds an escape property to the default output table.
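
As a rough picture of what an escape property on the default output table amounts to, the sketch below models the table properties as a plain dictionary. The function names are hypothetical, and the keys "field.delim" and "escape.delim" are assumptions based on the serde property names as I recall them, not taken from the patch itself.

```python
# Hypothetical model of a default output table's properties; not the patch itself.


def default_table_properties(escape=True):
    """Build a property map for a default output table (keys are assumptions)."""
    props = {"field.delim": "\x01"}  # Control-A as the field delimiter
    if escape:
        props["escape.delim"] = "\\"  # escape character added on top of the defaults
    return props


def writer_escapes(props):
    """In this sketch a writer escapes delimiters only if an escape character is configured."""
    return "escape.delim" in props


print(writer_escapes(default_table_properties(escape=False)))  # False
print(writer_escapes(default_table_properties(escape=True)))   # True
```

The point of the sketch is only that escaping is driven by the table's properties: without an escape character in the defaults, a writer has nothing to escape with.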


[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------

    Status: Open  (was: Patch Available)

It looks like the patch has gone stale. I will upload a rebased patch soon.


[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------

    Attachment: patch-1850-2.txt

Sorry, I forgot to update the patch from the review board. This patch fixes the protectmode failure.


[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------

    Attachment: patch-2303-3.txt

Patch rebased to trunk.

> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
>                 Key: HIVE-2303
>                 URL: https://issues.apache.org/jira/browse/HIVE-2303
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2303-2.txt, patch-2303-3.txt, patch-2303.txt
>


        

[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------

    Attachment: patch-2303-2.txt

The patch on review board has its test outputs regenerated. Uploading that patch from review board.

> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
>                 Key: HIVE-2303
>                 URL: https://issues.apache.org/jira/browse/HIVE-2303
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2303-2.txt, patch-2303.txt
>


        

[jira] [Commented] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079767#comment-13079767 ] 

Jakob Homan commented on HIVE-2303:
-----------------------------------

+1 on patch.  Always escaping seems reasonable.
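
To illustrate why always escaping helps, here is a minimal sketch in Python. It is not Hive's code and not the contents of the patch: the choice of Control-A as the intermediate field separator, the backslash escape character, and the helper names (escape, join_row, split_row) are all assumptions made for the example. It shows a field containing Control-A being split into extra fields when written unescaped, and surviving intact when the separator is escaped.

```python
# Illustrative only: a toy delimited-row writer/reader, not Hive's serialization code.
SEP = "\x01"   # Control-A, assumed here to be the intermediate field separator
ESC = "\\"     # assumed escape character


def escape(field):
    """Prefix every escape character and separator in the field with ESC."""
    return field.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP)


def join_row(fields, escaped=True):
    """Serialize a row, optionally escaping each field first."""
    return SEP.join(escape(f) if escaped else f for f in fields)


def split_row(line, escaped=True):
    """Parse a row; with escaped=True, an ESC protects the next character."""
    if not escaped:
        return line.split(SEP)
    fields, cur, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == ESC and i + 1 < len(line):
            cur.append(line[i + 1])  # keep the protected character literally
            i += 2
        elif ch == SEP:
            fields.append("".join(cur))  # unescaped separator ends the field
            cur = []
            i += 1
        else:
            cur.append(ch)
            i += 1
    fields.append("".join(cur))
    return fields


# A two-column row whose second column contains Control-A and Control-B.
row = ["imp-1", "a\x01b\x02c"]

naive = split_row(join_row(row, escaped=False), escaped=False)
safe = split_row(join_row(row), escaped=True)

print(len(naive), len(safe))
```

In this sketch the unescaped round trip yields three fields instead of two, which mirrors the broken output described in the report, while the escaped round trip returns the original row.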

> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
>                 Key: HIVE-2303
>                 URL: https://issues.apache.org/jira/browse/HIVE-2303
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2303.txt
>


        

[jira] [Commented] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072785#comment-13072785 ] 

jiraposter@reviews.apache.org commented on HIVE-2303:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1219/
-----------------------------------------------------------

Review request for hive.


Summary
-------

files with control-A,B are not delimited correctly.


This addresses bug HIVE-2303.
    https://issues.apache.org/jira/browse/HIVE-2303


Diffs
-----

  trunk/data/files/in7.txt PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 1151047 
  trunk/ql/src/test/queries/clientpositive/delimiter.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/combine2.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/delimiter.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/filter_join_breaktask.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/input23.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/input42.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/input_part7.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/input_part9.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/louter_join_ppr.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/outer_join_ppr.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/pcr.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/rand_partitionpruner1.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/rand_partitionpruner3.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/regexp_extract.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/router_join_ppr.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/sample10.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/sample6.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/sample8.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/sample9.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/transform_ppr1.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/transform_ppr2.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/udf_explode.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/udf_reflect.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/udtf_explode.q.out 1151047 
  trunk/ql/src/test/results/clientpositive/union_ppr.q.out 1151047 
  trunk/ql/src/test/results/compiler/plan/cast1.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/groupby2.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/groupby3.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/groupby4.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/groupby5.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/groupby6.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/input20.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/input8.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/input_part1.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/input_testxpath.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/input_testxpath2.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/join4.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/join5.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/join6.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/join7.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/join8.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/sample1.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/udf1.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/udf4.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/udf6.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/udf_case.q.xml 1151047 
  trunk/ql/src/test/results/compiler/plan/udf_when.q.xml 1151047 

Diff: https://reviews.apache.org/r/1219/diff


Testing
-------

All tests passed with patch


Thanks,

Amareshwari



> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
>                 Key: HIVE-2303
>                 URL: https://issues.apache.org/jira/browse/HIVE-2303
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-2303.txt
>


        

[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------

    Status: Patch Available  (was: Open)

> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
>                 Key: HIVE-2303
>                 URL: https://issues.apache.org/jira/browse/HIVE-2303
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2303-2.txt, patch-2303-3.txt, patch-2303-4.txt, patch-2303.txt
>


        

[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------

    Status: Open  (was: Patch Available)

Sorry. Wrong jira :(

> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
>                 Key: HIVE-2303
>                 URL: https://issues.apache.org/jira/browse/HIVE-2303
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.8.0
>
>         Attachments: patch-2303-2.txt, patch-2303.txt
>
