You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Sanjay Subramanian <sa...@yahoo.com> on 2015/08/24 21:15:31 UTC

Using SPLIT with DOT(.) delimiter demonstrate funny behavior within a VIEW

Hi guys
I am using Hive version = 0.13.1-cdh5.3.3

HIVE TABLE = qnap_resume_file_location---------------------------------------DROP  TABLE IF EXISTS
     qnap_resume_file_location;CREATE EXTERNAL TABLE qnap_resume_file_location ( floc STRING     ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'  LOCATION '/data/myfirm/file_location';
SAMPLE DATA = qnap_resume_file_location---------------------------------"select * from resume.qnap_resume_file_location limit 10"/mnt/myfirm/200901/14008_1_1004942032_0_1005347483.html
/mnt/myfirm/200901/14008_1_1004944724_0_1005334758.html/mnt/myfirm/200901/14000__900719077_10_901309048.doc/mnt/myfirm/200901/14012_6_1004858088_0_1005227990.doc/mnt/myfirm/200901/14008_1_1004811416_0_1005258541.html/mnt/myfirm/200901/14008_1_501597002_6_1005224039.html/mnt/myfirm/200901/14009_5_1004872908_0_1005244348.doc/mnt/myfirm/200901/14001_7_14165_1_66775.docx/mnt/myfirm/200901/14009_5_1004935267_0_1005322091.txt/mnt/myfirm/200901/14012_6_1004904422_0_1005283729.doc

HIVE VIEW = qnap_resume_resid_file_loc_map_vw---------------------------------------------DROP VIEW IF EXISTS
     qnap_resume_resid_file_loc_map_vw;CREATE VIEW IF NOT EXISTS qnap_resume_resid_file_loc_map_vw ( resid , yyyymm , floc      )ASSELECT    split(reverse(split(reverse(floc),'/')[0]),'\\\\.')[0] as resid,     reverse(split(reverse(floc),'/')[1])  as yyyymm,    flocFROM   resume.qnap_resume_file_location;

QUERY ON VIEW"select * from qnap_resume_resid_file_loc_map_vw limit 10"EXPECTED RESULTS
14012_6_1006686583_0_1102955123 201205 /mnt/myfirm/201205/14012_6_1006686583_0_1102955123.docx14009_5_1008440384_0_1102887209 201205 /mnt/myfirm/201205/14009_5_1008440384_0_1102887209.doc14013_7_700200576_8_1102752594 201205 /mnt/myfirm/201205/14013_7_700200576_8_1102752594.doc14015_8_1008470815_0_1102954525 201205 /mnt/myfirm/201205/14015_8_1008470815_0_1102954525.doc14009_5_1008373102_0_1102709973 201205 /mnt/myfirm/201205/14009_5_1008373102_0_1102709973.doc14011_1_1007981566_0_1102730546 201205 /mnt/myfirm/201205/14011_1_1007981566_0_1102730546.txt14015_8_1008436709_0_1102867682 201205 /mnt/myfirm/201205/14015_8_1008436709_0_1102867682.docx14001_7_1006347358_0_1007662042 201205 /mnt/myfirm/201205/14001_7_1006347358_0_1007662042.docx14009_5_1008382590_0_1102732450 201205 /mnt/myfirm/201205/14009_5_1008382590_0_1102732450.doc14000_6_1008189721_0_1102253740 201205 /mnt/myfirm/201205/14000_6_1008189721_0_1102253740.docx
ACTUAL RESULTS (see the first column the DOT and extension are still there) 14012_6_1006686583_0_1102955123 .docx 201205 /mnt/myfirm/201205/14012_6_1006686583_0_1102955123.docx14009_5_1008440384_0_1102887209 .doc 201205 /mnt/myfirm/201205/14009_5_1008440384_0_1102887209.doc14013_7_700200576_8_1102752594 .doc 201205 /mnt/myfirm/201205/14013_7_700200576_8_1102752594.doc14015_8_1008470815_0_1102954525.doc 201205 /mnt/myfirm/201205/14015_8_1008470815_0_1102954525.doc14009_5_1008373102_0_1102709973 .doc 201205 /mnt/myfirm/201205/14009_5_1008373102_0_1102709973.doc14011_1_1007981566_0_1102730546 .txt 201205 /mnt/myfirm/201205/14011_1_1007981566_0_1102730546.txt14015_8_1008436709_0_1102867682 .docx 201205 /mnt/myfirm/201205/14015_8_1008436709_0_1102867682.docx14001_7_1006347358_0_1007662042 .docx 201205 /mnt/myfirm/201205/14001_7_1006347358_0_1007662042.docx14009_5_1008382590_0_1102732450 .doc 201205 /mnt/myfirm/201205/14009_5_1008382590_0_1102732450.doc14000_6_1008189721_0_1102253740 .docx 201205 /mnt/myfirm/201205/14000_6_1008189721_0_1102253740.docx
EXECUTING THE SELECT HIVE QUERY INSIDE THE VIEW INDEPENDENTLY (gives u correct results - but when placed inside a view then it does not seem to work)SELECT    split(reverse(split(reverse(floc),'/')[0]),'\\\\.')[0] as resid,     reverse(split(reverse(floc),'/')[1])  as yyyymm,    flocFROM   resume.qnap_resume_file_location
14012_6_1006686583_0_1102955123 201205 /mnt/myfirm/201205/14012_6_1006686583_0_1102955123.docx14009_5_1008440384_0_1102887209 201205 /mnt/myfirm/201205/14009_5_1008440384_0_1102887209.doc14013_7_700200576_8_1102752594 201205 /mnt/myfirm/201205/14013_7_700200576_8_1102752594.doc14015_8_1008470815_0_1102954525 201205 /mnt/myfirm/201205/14015_8_1008470815_0_1102954525.doc14009_5_1008373102_0_1102709973 201205 /mnt/myfirm/201205/14009_5_1008373102_0_1102709973.doc14011_1_1007981566_0_1102730546 201205 /mnt/myfirm/201205/14011_1_1007981566_0_1102730546.txt14015_8_1008436709_0_1102867682 201205 /mnt/myfirm/201205/14015_8_1008436709_0_1102867682.docx14001_7_1006347358_0_1007662042 201205 /mnt/myfirm/201205/14001_7_1006347358_0_1007662042.docx14009_5_1008382590_0_1102732450 201205 /mnt/myfirm/201205/14009_5_1008382590_0_1102732450.doc14000_6_1008189721_0_1102253740 201205 /mnt/myfirm/201205/14000_6_1008189721_0_1102253740.docx

Thoughts ? Ideas ?
thanks
sanjay

Re: Using SPLIT with DOT(.) delimiter demonstrate funny behavior within a VIEW

Posted by Sanjay Subramanian <sa...@yahoo.com>.
Hi Vivek
Thanks A few more experiments 
The \\ , \\\ and \\\\ all work correctly if used as following 
split(reverse(split(reverse(floc),'/')[0]),'[\\.]')[0]   (need those square brackets)
split(reverse(split(reverse(floc),'/')[0]),'\\\.')[0]  (does not need square brackets)
split(reverse(split(reverse(floc),'/')[0]),'[\\\\.]')[0]   (need those square brackets)
        From: Vivek Veeramani <vi...@gmail.com>
 To: user@hive.apache.org; Sanjay Subramanian <sa...@yahoo.com> 
 Sent: Monday, August 24, 2015 1:57 PM
 Subject: Re: Using SPLIT with DOT(.) delimiter demonstrate funny behavior within a VIEW
   
Hi Sanjay,
Try replacing the 4 backslashes with just 2. Usually works with 2 backslashes.
Replace this split(reverse(split(reverse(floc),'/')[0]),'\\\\.')[0] as resid         with split(reverse(split(reverse(floc),'/')[0]),'\\.')[0] as resid

Please have a look and let us know if that helps.


Best ,
Vivek Veeramani

cell : + 1-415 996 7853



On Mon, Aug 24, 2015 at 12:15 PM, Sanjay Subramanian <sa...@yahoo.com> wrote:

Hi guys
I am using Hive version = 0.13.1-cdh5.3.3

HIVE TABLE = qnap_resume_file_location---------------------------------------DROP  TABLE IF EXISTS
     qnap_resume_file_location;CREATE EXTERNAL TABLE qnap_resume_file_location ( floc STRING     ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'  LOCATION '/data/myfirm/file_location';
SAMPLE DATA = qnap_resume_file_location---------------------------------"select * from resume.qnap_resume_file_location limit 10"/mnt/myfirm/200901/14008_1_1004942032_0_1005347483.html
/mnt/myfirm/200901/14008_1_1004944724_0_1005334758.html/mnt/myfirm/200901/14000__900719077_10_901309048.doc/mnt/myfirm/200901/14012_6_1004858088_0_1005227990.doc/mnt/myfirm/200901/14008_1_1004811416_0_1005258541.html/mnt/myfirm/200901/14008_1_501597002_6_1005224039.html/mnt/myfirm/200901/14009_5_1004872908_0_1005244348.doc/mnt/myfirm/200901/14001_7_14165_1_66775.docx/mnt/myfirm/200901/14009_5_1004935267_0_1005322091.txt/mnt/myfirm/200901/14012_6_1004904422_0_1005283729.doc

HIVE VIEW = qnap_resume_resid_file_loc_map_vw---------------------------------------------DROP VIEW IF EXISTS
     qnap_resume_resid_file_loc_map_vw;CREATE VIEW IF NOT EXISTS qnap_resume_resid_file_loc_map_vw ( resid , yyyymm , floc      )ASSELECT    split(reverse(split(reverse(floc),'/')[0]),'\\\\.')[0] as resid,     reverse(split(reverse(floc),'/')[1])  as yyyymm,    flocFROM   resume.qnap_resume_file_location;

QUERY ON VIEW"select * from qnap_resume_resid_file_loc_map_vw limit 10"EXPECTED RESULTS
14012_6_1006686583_0_1102955123 201205 /mnt/myfirm/201205/14012_6_1006686583_0_1102955123.docx14009_5_1008440384_0_1102887209 201205 /mnt/myfirm/201205/14009_5_1008440384_0_1102887209.doc14013_7_700200576_8_1102752594 201205 /mnt/myfirm/201205/14013_7_700200576_8_1102752594.doc14015_8_1008470815_0_1102954525 201205 /mnt/myfirm/201205/14015_8_1008470815_0_1102954525.doc14009_5_1008373102_0_1102709973 201205 /mnt/myfirm/201205/14009_5_1008373102_0_1102709973.doc14011_1_1007981566_0_1102730546 201205 /mnt/myfirm/201205/14011_1_1007981566_0_1102730546.txt14015_8_1008436709_0_1102867682 201205 /mnt/myfirm/201205/14015_8_1008436709_0_1102867682.docx14001_7_1006347358_0_1007662042 201205 /mnt/myfirm/201205/14001_7_1006347358_0_1007662042.docx14009_5_1008382590_0_1102732450 201205 /mnt/myfirm/201205/14009_5_1008382590_0_1102732450.doc14000_6_1008189721_0_1102253740 201205 /mnt/myfirm/201205/14000_6_1008189721_0_1102253740.docx
ACTUAL RESULTS (see the first column the DOT and extension are still there) 14012_6_1006686583_0_1102955123 .docx 201205 /mnt/myfirm/201205/14012_6_1006686583_0_1102955123.docx14009_5_1008440384_0_1102887209 .doc 201205 /mnt/myfirm/201205/14009_5_1008440384_0_1102887209.doc14013_7_700200576_8_1102752594 .doc 201205 /mnt/myfirm/201205/14013_7_700200576_8_1102752594.doc14015_8_1008470815_0_1102954525.doc 201205 /mnt/myfirm/201205/14015_8_1008470815_0_1102954525.doc14009_5_1008373102_0_1102709973 .doc 201205 /mnt/myfirm/201205/14009_5_1008373102_0_1102709973.doc14011_1_1007981566_0_1102730546 .txt 201205 /mnt/myfirm/201205/14011_1_1007981566_0_1102730546.txt14015_8_1008436709_0_1102867682 .docx 201205 /mnt/myfirm/201205/14015_8_1008436709_0_1102867682.docx14001_7_1006347358_0_1007662042 .docx 201205 /mnt/myfirm/201205/14001_7_1006347358_0_1007662042.docx14009_5_1008382590_0_1102732450 .doc 201205 /mnt/myfirm/201205/14009_5_1008382590_0_1102732450.doc14000_6_1008189721_0_1102253740 .docx 201205 /mnt/myfirm/201205/14000_6_1008189721_0_1102253740.docx
EXECUTING THE SELECT HIVE QUERY INSIDE THE VIEW INDEPENDENTLY (gives u correct results - but when placed inside a view then it does not seem to work)SELECT    split(reverse(split(reverse(floc),'/')[0]),'\\\\.')[0] as resid,     reverse(split(reverse(floc),'/')[1])  as yyyymm,    flocFROM   resume.qnap_resume_file_location
14012_6_1006686583_0_1102955123 201205 /mnt/myfirm/201205/14012_6_1006686583_0_1102955123.docx14009_5_1008440384_0_1102887209 201205 /mnt/myfirm/201205/14009_5_1008440384_0_1102887209.doc14013_7_700200576_8_1102752594 201205 /mnt/myfirm/201205/14013_7_700200576_8_1102752594.doc14015_8_1008470815_0_1102954525 201205 /mnt/myfirm/201205/14015_8_1008470815_0_1102954525.doc14009_5_1008373102_0_1102709973 201205 /mnt/myfirm/201205/14009_5_1008373102_0_1102709973.doc14011_1_1007981566_0_1102730546 201205 /mnt/myfirm/201205/14011_1_1007981566_0_1102730546.txt14015_8_1008436709_0_1102867682 201205 /mnt/myfirm/201205/14015_8_1008436709_0_1102867682.docx14001_7_1006347358_0_1007662042 201205 /mnt/myfirm/201205/14001_7_1006347358_0_1007662042.docx14009_5_1008382590_0_1102732450 201205 /mnt/myfirm/201205/14009_5_1008382590_0_1102732450.doc14000_6_1008189721_0_1102253740 201205 /mnt/myfirm/201205/14000_6_1008189721_0_1102253740.docx

Thoughts ? Ideas ?
thanks
sanjay




  

Re: Using SPLIT with DOT(.) delimiter demonstrate funny behavior within a VIEW

Posted by Vivek Veeramani <vi...@gmail.com>.
Hi Sanjay,

Try replacing the 4 backslashes with just 2. Usually works with 2
backslashes.

Replace this *split(reverse(split(reverse(**floc**),'/')[0]),'\\\\.')[0]*
as resid
        with *split(reverse(split(reverse(floc**),'/')[0]),'\\.')[0]* as
resid

Please have a look and let us know if that helps.



Best ,
Vivek Veeramani

cell : + 1-415 996 7853


On Mon, Aug 24, 2015 at 12:15 PM, Sanjay Subramanian <
sanjaysubramanian@yahoo.com> wrote:

> Hi guys
>
> I am using
>
> Hive version = 0.13.1-cdh5.3.3
>
>
>
> *HIVE TABLE = qnap_resume_file_location*
> *---------------------------------------*
> DROP  TABLE IF EXISTS
>      qnap_resume_file_location
> ;
> CREATE EXTERNAL TABLE qnap_resume_file_location (
> floc STRING
>      ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'  LOCATION '/data/
> myfirm/file_location'
> ;
>
> *SAMPLE DATA = qnap_resume_file_location*
> *---------------------------------*
> "select * from resume.qnap_resume_file_location limit 10"
> /mnt/myfirm/200901/14008_1_1004942032_0_1005347483.html
> /mnt/myfirm/200901/14008_1_1004944724_0_1005334758.html
> /mnt/myfirm/200901/14000__900719077_10_901309048.doc
> /mnt/myfirm/200901/14012_6_1004858088_0_1005227990.doc
> /mnt/myfirm/200901/14008_1_1004811416_0_1005258541.html
> /mnt/myfirm/200901/14008_1_501597002_6_1005224039.html
> /mnt/myfirm/200901/14009_5_1004872908_0_1005244348.doc
> /mnt/myfirm/200901/14001_7_14165_1_66775.docx
> /mnt/myfirm/200901/14009_5_1004935267_0_1005322091.txt
> /mnt/myfirm/200901/14012_6_1004904422_0_1005283729.doc
>
>
> *HIVE VIEW = qnap_resume_resid_file_loc_map_vw*
> *---------------------------------------------*
> DROP VIEW IF EXISTS
>      qnap_resume_resid_file_loc_map_vw
> ;
> CREATE VIEW IF NOT EXISTS qnap_resume_resid_file_loc_map_vw (
> resid ,
> yyyymm ,
> floc
>      )
> AS
> SELECT
>     split(reverse(split(reverse(floc),'/')[0]),'\\\\.')[0] as resid,
>     reverse(split(reverse(floc),'/')[1])  as yyyymm,
>     floc
> FROM
>   resume.qnap_resume_file_location
> ;
>
>
> *QUERY ON VIEW*
> "select * from qnap_resume_resid_file_loc_map_vw limit 10"
> *EXPECTED RESULTS*
> *14012_6_1006686583_0_1102955123* 201205
> /mnt/myfirm/201205/14012_6_1006686583_0_1102955123.docx
> 14009_5_1008440384_0_1102887209 201205
> /mnt/myfirm/201205/14009_5_1008440384_0_1102887209.doc
> 14013_7_700200576_8_1102752594 201205
> /mnt/myfirm/201205/14013_7_700200576_8_1102752594.doc
> 14015_8_1008470815_0_1102954525 201205
> /mnt/myfirm/201205/14015_8_1008470815_0_1102954525.doc
> 14009_5_1008373102_0_1102709973 201205
> /mnt/myfirm/201205/14009_5_1008373102_0_1102709973.doc
> 14011_1_1007981566_0_1102730546 201205
> /mnt/myfirm/201205/14011_1_1007981566_0_1102730546.txt
> 14015_8_1008436709_0_1102867682 201205
> /mnt/myfirm/201205/14015_8_1008436709_0_1102867682.docx
> 14001_7_1006347358_0_1007662042 201205
> /mnt/myfirm/201205/14001_7_1006347358_0_1007662042.docx
> 14009_5_1008382590_0_1102732450 201205
> /mnt/myfirm/201205/14009_5_1008382590_0_1102732450.doc
> 14000_6_1008189721_0_1102253740 201205
> /mnt/myfirm/201205/14000_6_1008189721_0_1102253740.docx
>
> *ACTUAL RESULTS* (see the first column the DOT and extension are still
> there)
> *14012_6_1006686583_0_1102955123** .docx* 201205
> /mnt/myfirm/201205/14012_6_1006686583_0_1102955123.docx
> 14009_5_1008440384_0_1102887209 .doc 201205
> /mnt/myfirm/201205/14009_5_1008440384_0_1102887209.doc
> 14013_7_700200576_8_1102752594 .doc 201205
> /mnt/myfirm/201205/14013_7_700200576_8_1102752594.doc
> 14015_8_1008470815_0_1102954525.doc 201205
> /mnt/myfirm/201205/14015_8_1008470815_0_1102954525.doc
> 14009_5_1008373102_0_1102709973 .doc 201205
> /mnt/myfirm/201205/14009_5_1008373102_0_1102709973.doc
> 14011_1_1007981566_0_1102730546 .txt 201205
> /mnt/myfirm/201205/14011_1_1007981566_0_1102730546.txt
> 14015_8_1008436709_0_1102867682 .docx 201205
> /mnt/myfirm/201205/14015_8_1008436709_0_1102867682.docx
> 14001_7_1006347358_0_1007662042 .docx 201205
> /mnt/myfirm/201205/14001_7_1006347358_0_1007662042.docx
> 14009_5_1008382590_0_1102732450 .doc 201205
> /mnt/myfirm/201205/14009_5_1008382590_0_1102732450.doc
> 14000_6_1008189721_0_1102253740 .docx 201205
> /mnt/myfirm/201205/14000_6_1008189721_0_1102253740.docx
>
> *EXECUTING THE SELECT HIVE QUERY INSIDE THE VIEW INDEPENDENTLY (gives u
> correct results - but when placed inside a view then it does not seem to
> work)*
> SELECT
>     split(reverse(split(reverse(floc),'/')[0]),'\\\\.')[0] as resid,
>     reverse(split(reverse(floc),'/')[1])  as yyyymm,
>     floc
> FROM
>   resume.qnap_resume_file_location
>
> 14012_6_1006686583_0_1102955123 201205
> /mnt/myfirm/201205/14012_6_1006686583_0_1102955123.docx
> 14009_5_1008440384_0_1102887209 201205
> /mnt/myfirm/201205/14009_5_1008440384_0_1102887209.doc
> 14013_7_700200576_8_1102752594 201205
> /mnt/myfirm/201205/14013_7_700200576_8_1102752594.doc
> 14015_8_1008470815_0_1102954525 201205
> /mnt/myfirm/201205/14015_8_1008470815_0_1102954525.doc
> 14009_5_1008373102_0_1102709973 201205
> /mnt/myfirm/201205/14009_5_1008373102_0_1102709973.doc
> 14011_1_1007981566_0_1102730546 201205
> /mnt/myfirm/201205/14011_1_1007981566_0_1102730546.txt
> 14015_8_1008436709_0_1102867682 201205
> /mnt/myfirm/201205/14015_8_1008436709_0_1102867682.docx
> 14001_7_1006347358_0_1007662042 201205
> /mnt/myfirm/201205/14001_7_1006347358_0_1007662042.docx
> 14009_5_1008382590_0_1102732450 201205
> /mnt/myfirm/201205/14009_5_1008382590_0_1102732450.doc
> 14000_6_1008189721_0_1102253740 201205
> /mnt/myfirm/201205/14000_6_1008189721_0_1102253740.docx
>
>
> Thoughts ? Ideas ?
>
> thanks
>
> sanjay
>
>