Posted to dev@hive.apache.org by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org> on 2011/07/25 15:05:09 UTC
[jira] [Created] (HIVE-2303) files with control-A,B are not delimited correctly.
files with control-A,B are not delimited correctly.
---------------------------------------------------
Key: HIVE-2303
URL: https://issues.apache.org/jira/browse/HIVE-2303
Project: Hive
Issue Type: Bug
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
The following is from one of our users:
create external table impressions (imp string, msg string)
row format delimited
fields terminated by '\t'
lines terminated by '\n'
stored as textfile
location '/xxx';
Some strings in my data contain Control-A, Control-B, etc. as internal delimiters. If I run
Select * from impressions limit 10;
all fields print correctly. However, if I run
Select * from impressions where msg regexp '.*' limit 10;
the fields are broken at the control characters. The difference between the two commands is that the latter requires a map-reduce job.
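The symptom can be illustrated outside Hive with a few lines of Python. This is only a sketch: the sample row is made up, and it assumes that the map-reduce path writes and re-reads its output with Control-A (\x01) as the field delimiter, which the report itself does not state.

```python
# Hypothetical row: two tab-separated columns (imp, msg); msg contains
# Control-A (\x01) and Control-B (\x02) as internal delimiters.
row = "imp-1\tk1\x02v1\x01k2\x02v2"

# Splitting on the table's declared delimiter keeps msg intact.
by_tab = row.split("\t")

# Assumption: the job output is written and re-read with Control-A as
# the field delimiter, so the embedded \x01 splits msg into two fields.
rewritten = "\x01".join(by_tab)
by_ctrl_a = rewritten.split("\x01")

print(len(by_tab), len(by_ctrl_a))
```

The two counts differ because the second split cannot tell the embedded Control-A from a real field boundary.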
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.
Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------
Status: Patch Available (was: Open)
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.8.0
>
> Attachments: patch-2303-2.txt, patch-2303.txt
[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.
Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------
Status: Open (was: Patch Available)
Missed data file in the patch.
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.8.0
>
> Attachments: patch-2303-2.txt, patch-2303-3.txt, patch-2303.txt
[jira] [Commented] (HIVE-2303) files with control-A,B are not delimited correctly.
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103426#comment-13103426 ]
Hudson commented on HIVE-2303:
------------------------------
Integrated in Hive-trunk-h0.21 #949 (See [https://builds.apache.org/job/Hive-trunk-h0.21/949/])
HIVE-2303. Files with control-A,B are not delimited correctly (Amareshwari Sriramadasu via cws)
cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1170005
Files :
* /hive/trunk/data/files/in7.txt
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java
* /hive/trunk/ql/src/test/queries/clientpositive/delimiter.q
* /hive/trunk/ql/src/test/results/clientpositive/combine2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/delimiter.q.out
* /hive/trunk/ql/src/test/results/clientpositive/filter_join_breaktask.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input23.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input42.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input_part7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input_part9.q.out
* /hive/trunk/ql/src/test/results/clientpositive/louter_join_ppr.q.out
* /hive/trunk/ql/src/test/results/clientpositive/outer_join_ppr.q.out
* /hive/trunk/ql/src/test/results/clientpositive/pcr.q.out
* /hive/trunk/ql/src/test/results/clientpositive/rand_partitionpruner1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/rand_partitionpruner3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/regexp_extract.q.out
* /hive/trunk/ql/src/test/results/clientpositive/router_join_ppr.q.out
* /hive/trunk/ql/src/test/results/clientpositive/sample10.q.out
* /hive/trunk/ql/src/test/results/clientpositive/sample6.q.out
* /hive/trunk/ql/src/test/results/clientpositive/sample8.q.out
* /hive/trunk/ql/src/test/results/clientpositive/sample9.q.out
* /hive/trunk/ql/src/test/results/clientpositive/transform_ppr1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/transform_ppr2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/udf_explode.q.out
* /hive/trunk/ql/src/test/results/clientpositive/udf_reflect.q.out
* /hive/trunk/ql/src/test/results/clientpositive/udtf_explode.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union_ppr.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/cast1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby6.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input20.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input8.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input_part1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input_testxpath.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/input_testxpath2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/join4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/join5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/join6.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/join7.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/join8.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/sample1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/udf1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/udf4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/udf6.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/udf_case.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/udf_when.q.xml
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.9.0
>
> Attachments: patch-2303-2.txt, patch-2303-3.txt, patch-2303-4.txt, patch-2303.txt
[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.
Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-2303:
---------------------------------
Status: Open (was: Patch Available)
I see diffs in the following tests:
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input42
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_louter_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_outer_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pcr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rand_partitionpruner1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rand_partitionpruner3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regexp_extract
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_router_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_explode
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_reflect
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_explode
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_ppr
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_minimr_broken_pipe
org.apache.hadoop.hive.ql.parse.TestParse.testParse_cast1
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby2
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby3
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby5
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby6
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input20
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input8
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_part1
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testxpath
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testxpath2
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join5
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join6
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join7
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join8
org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample1
org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf1
org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf6
org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf_case
org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf_when
@Amareshwari: Can you please take a look? Thanks.
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.8.0
>
> Attachments: patch-2303.txt
[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.
Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-2303:
---------------------------------
Resolution: Fixed
Fix Version/s: 0.9.0 (was: 0.8.0)
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
Committed to trunk. Thanks Amareshwari!
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.9.0
>
> Attachments: patch-2303-2.txt, patch-2303-3.txt, patch-2303-4.txt, patch-2303.txt
[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.
Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------
Status: Patch Available (was: Open)
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.8.0
>
> Attachments: patch-2303-2.txt, patch-2303-3.txt, patch-2303.txt
[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.
Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------
Fix Version/s: 0.8.0
Status: Patch Available (was: Open)
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.8.0
>
> Attachments: patch-2303.txt
[jira] [Commented] (HIVE-2303) files with control-A,B are not delimited correctly.
Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072719#comment-13072719 ]
Amareshwari Sriramadasu commented on HIVE-2303:
-----------------------------------------------
This problem occurs because FileSinkOperator generates a TableDesc with default properties for storing the output. The solution is to escape the delimiters for the output table.
Shouldn't escaping of delimiters always happen in LazySimpleSerde?
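As a rough illustration of what escaping the delimiters buys, here is a minimal Python sketch of an escape-aware writer and reader. It is not Hive's implementation: the function names are hypothetical, and both the backslash escape character and the Control-A separator are assumptions chosen for the example.

```python
ESC = "\\"    # escape character (assumed for this sketch)
SEP = "\x01"  # Control-A, assumed to be the output field delimiter

def escape(field):
    # Escape the escape character first, then the delimiter.
    return field.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP)

def serialize(fields):
    # Join escaped fields so that only real boundaries are bare SEP.
    return SEP.join(escape(f) for f in fields)

def split_escaped(line):
    # Split on SEP, treating any character after ESC as literal data.
    fields, cur, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == ESC and i + 1 < len(line):
            cur.append(line[i + 1])
            i += 2
        elif ch == SEP:
            fields.append("".join(cur))
            cur = []
            i += 1
        else:
            cur.append(ch)
            i += 1
    fields.append("".join(cur))
    return fields

row = ["imp-1", "k1\x02v1\x01k2\x02v2"]
assert split_escaped(serialize(row)) == row
```

With escaping on both the write and read side, a Control-A inside msg survives the round trip as data, whereas a plain split on the separator would yield three fields for this row.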
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.
Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------
Status: Patch Available (was: Open)
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.8.0
>
> Attachments: patch-1850-2.txt, patch-2303-2.txt, patch-2303.txt
[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.
Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------
Attachment: patch-2303-4.txt
Added the missing file.
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.8.0
>
> Attachments: patch-2303-2.txt, patch-2303-3.txt, patch-2303-4.txt, patch-2303.txt
[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.
Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------
Attachment: (was: patch-1850-2.txt)
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.8.0
>
> Attachments: patch-2303-2.txt, patch-2303.txt
[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.
Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------
Attachment: patch-2303.txt
The patch adds an escape property to the default output table.
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Attachments: patch-2303.txt
[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.
Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------
Status: Open (was: Patch Available)
Looks like the patch has gone stale. Will upload rebased patch soon.
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.8.0
>
> Attachments: patch-2303-2.txt, patch-2303.txt
[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.
Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------
Attachment: patch-1850-2.txt
Sorry. Forgot to update the patch from review board. This patch fixes protectmode failure.
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.8.0
>
> Attachments: patch-1850-2.txt, patch-2303-2.txt, patch-2303.txt
[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.
Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------
Attachment: patch-2303-3.txt
Patch rebased to trunk.
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.8.0
>
> Attachments: patch-2303-2.txt, patch-2303-3.txt, patch-2303.txt
[jira] [Updated] (HIVE-2303) files with control-A,B are not delimited correctly.
Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------
Attachment: patch-2303-2.txt
Patch on review board has test outputs regenerated. Uploading the patch from review board.
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.8.0
>
> Attachments: patch-2303-2.txt, patch-2303.txt
[jira] [Commented] (HIVE-2303) files with control-A,B are not
delimited correctly.
Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079767#comment-13079767 ]
Jakob Homan commented on HIVE-2303:
-----------------------------------
+1 on patch. Always escaping seems reasonable.
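A minimal sketch of why always escaping helps, assuming the intermediate rows of the map-reduce job are written with Control-A (Hive's default field delimiter) between columns. This is illustrative Python, not the code in PlanUtils.java; the function names and the backslash escape character are hypothetical choices made for the example.

```python
CTRL_A = "\x01"  # assumed delimiter between columns in intermediate rows
ESCAPE = "\\"    # assumed escape character (hypothetical choice)


def join_unescaped(fields):
    """Write a row with no escaping: embedded Control-A looks like a delimiter."""
    return CTRL_A.join(fields)


def split_unescaped(line):
    """Read a row by splitting on every Control-A."""
    return line.split(CTRL_A)


def join_escaped(fields):
    """Write a row, escaping the escape character first, then the delimiter."""
    out = []
    for field in fields:
        field = field.replace(ESCAPE, ESCAPE + ESCAPE)
        field = field.replace(CTRL_A, ESCAPE + CTRL_A)
        out.append(field)
    return CTRL_A.join(out)


def split_escaped(line):
    """Read a row, treating the character after an escape as literal data."""
    fields, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == ESCAPE and i + 1 < len(line):
            current.append(line[i + 1])
            i += 2
        elif ch == CTRL_A:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


# Two columns (imp, msg); msg carries Control-A and Control-B internally.
row = ["imp-1", "a\x01b\x02c"]
broken = split_unescaped(join_unescaped(row))  # msg is split into two fields
intact = split_escaped(join_escaped(row))      # round-trips to the original row
```

Without escaping, the two-column row comes back as three fields, which matches the symptom reported for the query that needs a map-reduce job. With escaping, the row round-trips unchanged.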
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.8.0
>
> Attachments: patch-2303.txt
[jira] [Commented] (HIVE-2303) files with control-A,B are not
delimited correctly.
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072785#comment-13072785 ]
jiraposter@reviews.apache.org commented on HIVE-2303:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1219/
-----------------------------------------------------------
Review request for hive.
Summary
-------
files with control-A,B are not delimited correctly.
This addresses bug HIVE-2303.
https://issues.apache.org/jira/browse/HIVE-2303
Diffs
-----
trunk/data/files/in7.txt PRE-CREATION
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 1151047
trunk/ql/src/test/queries/clientpositive/delimiter.q PRE-CREATION
trunk/ql/src/test/results/clientpositive/combine2.q.out 1151047
trunk/ql/src/test/results/clientpositive/delimiter.q.out PRE-CREATION
trunk/ql/src/test/results/clientpositive/filter_join_breaktask.q.out 1151047
trunk/ql/src/test/results/clientpositive/input23.q.out 1151047
trunk/ql/src/test/results/clientpositive/input42.q.out 1151047
trunk/ql/src/test/results/clientpositive/input_part7.q.out 1151047
trunk/ql/src/test/results/clientpositive/input_part9.q.out 1151047
trunk/ql/src/test/results/clientpositive/louter_join_ppr.q.out 1151047
trunk/ql/src/test/results/clientpositive/outer_join_ppr.q.out 1151047
trunk/ql/src/test/results/clientpositive/pcr.q.out 1151047
trunk/ql/src/test/results/clientpositive/rand_partitionpruner1.q.out 1151047
trunk/ql/src/test/results/clientpositive/rand_partitionpruner3.q.out 1151047
trunk/ql/src/test/results/clientpositive/regexp_extract.q.out 1151047
trunk/ql/src/test/results/clientpositive/router_join_ppr.q.out 1151047
trunk/ql/src/test/results/clientpositive/sample10.q.out 1151047
trunk/ql/src/test/results/clientpositive/sample6.q.out 1151047
trunk/ql/src/test/results/clientpositive/sample8.q.out 1151047
trunk/ql/src/test/results/clientpositive/sample9.q.out 1151047
trunk/ql/src/test/results/clientpositive/transform_ppr1.q.out 1151047
trunk/ql/src/test/results/clientpositive/transform_ppr2.q.out 1151047
trunk/ql/src/test/results/clientpositive/udf_explode.q.out 1151047
trunk/ql/src/test/results/clientpositive/udf_reflect.q.out 1151047
trunk/ql/src/test/results/clientpositive/udtf_explode.q.out 1151047
trunk/ql/src/test/results/clientpositive/union_ppr.q.out 1151047
trunk/ql/src/test/results/compiler/plan/cast1.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/groupby2.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/groupby3.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/groupby4.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/groupby5.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/groupby6.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/input20.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/input8.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/input_part1.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/input_testxpath.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/input_testxpath2.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/join4.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/join5.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/join6.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/join7.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/join8.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/sample1.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/udf1.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/udf4.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/udf6.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/udf_case.q.xml 1151047
trunk/ql/src/test/results/compiler/plan/udf_when.q.xml 1151047
Diff: https://reviews.apache.org/r/1219/diff
Testing
-------
All tests passed with patch
Thanks,
Amareshwari
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Attachments: patch-2303.txt
[jira] [Updated] (HIVE-2303) files with control-A,B are not
delimited correctly.
Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------
Status: Patch Available (was: Open)
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.8.0
>
> Attachments: patch-2303-2.txt, patch-2303-3.txt, patch-2303-4.txt, patch-2303.txt
[jira] [Updated] (HIVE-2303) files with control-A,B are not
delimited correctly.
Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu updated HIVE-2303:
------------------------------------------
Status: Open (was: Patch Available)
Sorry. Wrong jira :(
> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
> Key: HIVE-2303
> URL: https://issues.apache.org/jira/browse/HIVE-2303
> Project: Hive
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.8.0
>
> Attachments: patch-2303-2.txt, patch-2303.txt