You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Sachin Bochare <sa...@persistent.co.in> on 2010/05/02 08:29:57 UTC

Select not working with Index patch

Hi,

I applied index patch available at : https://issues.apache.org/jira/browse/HIVE-678

However after applying the indexing patch, simple select statements are not showing any results. The "select *" is working but selecting a specific column is not working. I have pasted an example below which illustrates the problem.

The same select is working without the patch on the same metastore_db. The only difference between working code and non-working code is the patch.

I used 796926 version of the code. The patch attached in HIVE-678 was created on this version.

Following example illustrates the problem:

Example with patch code:
-----------------------------

=====================================
hive> create table ourtest (empid int, firstname string, lastname string, hoursworked int) partitioned by(dt string, place string) clustered by (empid) sorted by(hoursworked) into 4 buckets row format delimited fields terminated by ',' stored as textfile;
OK
Time taken: 0.307 seconds
hive> LOAD DATA LOCAL INPATH '/root/data/ourtest_data.csv' INTO TABLE ourtest PARTITION(dt='2010-02-27', place='Pune');
Copying data from file:/root/data/ourtest_data.csv
Loading data to table ourtest partition {dt=2010-02-27, place=Pune}
OK
Time taken: 0.753 seconds
hive> select * from ourtest; ---> Select * is working fine.
OK
0       firstname       lastname        0       2010-02-27      Pune
1       firstname1      lastname1       1       2010-02-27      Pune
2       firstname2      lastname2       2       2010-02-27      Pune
3       firstname3      lastname3       3       2010-02-27      Pune
4       firstname4      lastname4       4       2010-02-27      Pune
5       firstname5      lastname5       5       2010-02-27      Pune
6       firstname6      lastname6       6       2010-02-27      Pune
7       firstname7      lastname7       7       2010-02-27      Pune
8       firstname8      lastname8       8       2010-02-27      Pune
9       firstname9      lastname9       9       2010-02-27      Pune
10      firstname10     lastname10      10      2010-02-27      Pune
Time taken: 0.106 seconds
hive> select empid from ourtest; ---> Selecting specific column is not working.
Total MapReduce jobs = 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201002091652_0170, Tracking URL = http://v-hadoop3.persistent.co.in:60030/jobdetails.jsp?jobid=job_201002091652_0170
Kill Command = /root/hadoop-0.20.1/bin/../bin/hadoop job  -Dmapred.job.tracker=v-hadoop3.persistent.co.in:30001 -kill job_201002091652_0170
2010-05-02 08:40:48,951 map = 0%,  reduce =0%
2010-05-02 08:40:58,044 map = 50%,  reduce =0%
2010-05-02 08:40:59,057 map = 100%,  reduce =0%
2010-05-02 08:41:02,067 map = 100%,  reduce =100%
Ended Job = job_201002091652_0170
OK
Time taken: 15.494 seconds
=====================================

Example without patch code:
--------------------------------
Example query is working after using without-patch code on the same metastore_db.

=====================================
root@v-hadoop3<https://puneexchange.persistent.co.in/owa/UrlBlockedError.aspx>:~/<https://puneexchange.persistent.co.in/owa/UrlBlockedError.aspx>sachin/Hive-796926-Patch<https://puneexchange.persistent.co.in/owa/UrlBlockedError.aspx># ../Hive-796926/build/dist/bin/hive
Hive history file=/tmp/root/hive_job_log_root_201005020928_924651644.txt
hive> select empid from ourtest;
Total MapReduce jobs = 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201002091652_0190, Tracking URL = http://v-hadoop3.persistent.co.in:60030/jobdetails.jsp?jobid=job_201002091652_0190
Kill Command = /root/hadoop-0.20.1/bin/../bin/hadoop job  -Dmapred.job.tracker=v-hadoop3.persistent.co.in:30001 -kill job_201002091652_0190
2010-05-02 09:29:04,733 map = 0%,  reduce =0%
2010-05-02 09:29:18,799 map = 100%,  reduce =0%
2010-05-02 09:29:21,823 map = 100%,  reduce =100%
Ended Job = job_201002091652_0190
OK
0
1
2
3
4
5
6
7
8
9
10
Time taken: 22.268 seconds
=====================================

Can anyone point to what can be the problem here? Which module is a suspect here?

Regards,
Sachin

DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.

Re: Select not working with Index patch

Posted by Edward Capriolo <ed...@gmail.com>.
On Mon, May 3, 2010 at 12:28 AM, Sachin Bochare <
sachin_bochare@persistent.co.in> wrote:

>  Thanks Edwards.
>
> I was exploring indexing patch and wanted to know how the index table looks
> like.
>
> Few points which were not mentioned in my earlier posting:
>
> 1. I haven't created any index on the test table. So index logic is not
> involved here. The query is not working on normal table.
> 2. I checked the jobtracker result in Hadoop web interface and found that
> Hadoop has read 277 bytes and written 33 bytes. The counters are
> HDFS_BYTES_READ=277 and HDFS_BYTES_WRITTEN=23. The input file size is 277
> bytes and final result is 23 bytes. So Hadoop had returned the correct
> output. However for some reason Hive hasn't received or returned those
> results.
>
> I guess it would be a minor code change. I want to identify the code and
> fix it temporary in my send box. Could someone please point to the module
> where I should look for this issue?
>
> Regards,
> Sachin
>
>
>   ------------------------------
>
> *From:* Edward Capriolo [mailto:edlinuxguru@gmail.com]
> *Sent:* Sunday, May 02, 2010 8:08 PM
> *To:* hive-user@hadoop.apache.org
> *Subject:* Re: Select not working with Index patch
>
>
>
>
>
> On Sun, May 2, 2010 at 2:29 AM, Sachin Bochare <
> sachin_bochare@persistent.co.in> wrote:
>
> Hi,
>
>
>
> I applied index patch available at :
> https://issues.apache.org/jira/browse/HIVE-678
>
>
>
> However after applying the indexing patch, simple select statements are not
> showing any results. The "select *" is working but selecting a specific
> column is not working. I have pasted an example below which illustrates the
> problem.
>
>
>
> The same select is working without the patch on the same metastore_db. The
> only difference between working code and non-working code is the patch.
>
>
>
> I used 796926 version of the code. The patch attached in HIVE-678 was
> created on this version.
>
>
>
> Following example illustrates the problem:
>
>
>
> Example with patch code:
>
> -----------------------------
>
>
>
> =====================================
>
> hive> create table ourtest (empid int, firstname string, lastname
> string, hoursworked int) partitioned by(dt string, place string) clustered
> by (empid) sorted by(hoursworked) into 4 buckets row format delimited fields
> terminated by ',' stored as textfile;
> OK
> Time taken: 0.307 seconds
> hive> LOAD DATA LOCAL INPATH '/root/data/ourtest_data.csv' INTO
> TABLE ourtest PARTITION(dt='2010-02-27', place='Pune');
> Copying data from file:/root/data/ourtest_data.csv
> Loading data to table ourtest partition {dt=2010-02-27, place=Pune}
> OK
> Time taken: 0.753 seconds
> hive> select * from ourtest; ---> Select * is working fine.
> OK
> 0       firstname       lastname        0       2010-02-27      Pune
> 1       firstname1      lastname1       1       2010-02-27      Pune
> 2       firstname2      lastname2       2       2010-02-27      Pune
> 3       firstname3      lastname3       3       2010-02-27      Pune
> 4       firstname4      lastname4       4       2010-02-27      Pune
> 5       firstname5      lastname5       5       2010-02-27      Pune
> 6       firstname6      lastname6       6       2010-02-27      Pune
> 7       firstname7      lastname7       7       2010-02-27      Pune
> 8       firstname8      lastname8       8       2010-02-27      Pune
> 9       firstname9      lastname9       9       2010-02-27      Pune
> 10      firstname10     lastname10      10      2010-02-27      Pune
> Time taken: 0.106 seconds
> hive> select empid from ourtest; ---> Selecting specific column is not
> working.
> Total MapReduce jobs = 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201002091652_0170, Tracking URL =
> http://v-hadoop3.persistent.co.in:60030/jobdetails.jsp?jobid=job_201002091652_0170
> Kill Command = /root/hadoop-0.20.1/bin/../bin/hadoop job
> -Dmapred.job.tracker=v-hadoop3.persistent.co.in:30001 -kill
> job_201002091652_0170
> 2010-05-02 08:40:48,951 map = 0%,  reduce =0%
> 2010-05-02 08:40:58,044 map = 50%,  reduce =0%
> 2010-05-02 08:40:59,057 map = 100%,  reduce =0%
> 2010-05-02 08:41:02,067 map = 100%,  reduce =100%
> Ended Job = job_201002091652_0170
> OK
> Time taken: 15.494 seconds
> =====================================
>
>
>
> Example without patch code:
>
> --------------------------------
>
> Example query is working after using without-patch code on the same
> metastore_db.
>
>
>
> =====================================
>
> root@v-hadoop3<https://puneexchange.persistent.co.in/owa/UrlBlockedError.aspx>
> :~/ <https://puneexchange.persistent.co.in/owa/UrlBlockedError.aspx>sachin
> /Hive-796926-Patch<https://puneexchange.persistent.co.in/owa/UrlBlockedError.aspx>#
> ../Hive-796926/build/dist/bin/hive
> Hive history file=/tmp/root/hive_job_log_root_201005020928_924651644.txt
> hive> select empid from ourtest;
> Total MapReduce jobs = 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201002091652_0190, Tracking URL =
> http://v-hadoop3.persistent.co.in:60030/jobdetails.jsp?jobid=job_201002091652_0190
> Kill Command = /root/hadoop-0.20.1/bin/../bin/hadoop job
> -Dmapred.job.tracker=v-hadoop3.persistent.co.in:30001 -kill
> job_201002091652_0190
> 2010-05-02 09:29:04,733 map = 0%,  reduce =0%
> 2010-05-02 09:29:18,799 map = 100%,  reduce =0%
> 2010-05-02 09:29:21,823 map = 100%,  reduce =100%
> Ended Job = job_201002091652_0190
> OK
> 0
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> 10
> Time taken: 22.268 seconds
>
> =====================================
>
>
>
> Can anyone point to what can be the problem here? Which module is a suspect
> here?
>
>
>
> Regards,
>
> Sachin
>
> DISCLAIMER ========== This e-mail may contain privileged and confidential
> information which is the property of Persistent Systems Ltd. It is intended
> only for the use of the individual or entity to which it is addressed. If
> you are not the intended recipient, you are not authorized to read, retain,
> copy, print, distribute or use this message. If you have received this
> communication in error, please notify the sender and delete all copies of
> this message. Persistent Systems Ltd. does not accept any liability for
> virus infected mails.
>
>
> The comments for that issue seem to suggest the patch is not complete yet.
> For reference 'select *' queries simply read that block data from hdfs so
> they do not use map-reduce (and thus probably do not use any indexes either.
>
> Edward
>
> DISCLAIMER ========== This e-mail may contain privileged and confidential
> information which is the property of Persistent Systems Ltd. It is intended
> only for the use of the individual or entity to which it is addressed. If
> you are not the intended recipient, you are not authorized to read, retain,
> copy, print, distribute or use this message. If you have received this
> communication in error, please notify the sender and delete all copies of
> this message. Persistent Systems Ltd. does not accept any liability for
> virus infected mails.
>

First, let me say that code is experiementatal it likely never worked
completely at any point.

Hive trunk is fairly active, many things could change. I believe to find
what you are looking for you should do it in the opposite manner.

It is the patch that is breaking hive. Hive did not break around your patch.
Thus,

I suggest you go though the patch and only apply it incrementatly. First, do
NOT modify anything only ADD new class files. Now run some testing maybe
sure your query works. Run some Hive unit tests.

Then apply the sections of the patch that alter existing code method by
method. Keep testing as you go, at some point your use case will break.

As I said up top, the patch was totally experimental and probably never
worked completely. It needs some large refactoring before it comes close to
considering. So if you want to hack at it have fun.

RE: Select not working with Index patch

Posted by Sachin Bochare <sa...@persistent.co.in>.
Thanks Edwards.

I was exploring indexing patch and wanted to know how the index table looks like.

Few points which were not mentioned in my earlier posting:

1. I haven't created any index on the test table. So index logic is not involved here. The query is not working on normal table.
2. I checked the jobtracker result in Hadoop web interface and found that Hadoop has read 277 bytes and written 33 bytes. The counters are HDFS_BYTES_READ=277 and HDFS_BYTES_WRITTEN=23. The input file size is 277 bytes and final result is 23 bytes. So Hadoop had returned the correct output. However for some reason Hive hasn't received or returned those results.

I guess it would be a minor code change. I want to identify the code and fix it temporary in my send box. Could someone please point to the module where I should look for this issue?

Regards,
Sachin

________________________________
From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
Sent: Sunday, May 02, 2010 8:08 PM
To: hive-user@hadoop.apache.org
Subject: Re: Select not working with Index patch


On Sun, May 2, 2010 at 2:29 AM, Sachin Bochare <sa...@persistent.co.in>> wrote:
Hi,

I applied index patch available at : https://issues.apache.org/jira/browse/HIVE-678

However after applying the indexing patch, simple select statements are not showing any results. The "select *" is working but selecting a specific column is not working. I have pasted an example below which illustrates the problem.

The same select is working without the patch on the same metastore_db. The only difference between working code and non-working code is the patch.

I used 796926 version of the code. The patch attached in HIVE-678 was created on this version.

Following example illustrates the problem:

Example with patch code:
-----------------------------

=====================================
hive> create table ourtest (empid int, firstname string, lastname string, hoursworked int) partitioned by(dt string, place string) clustered by (empid) sorted by(hoursworked) into 4 buckets row format delimited fields terminated by ',' stored as textfile;
OK
Time taken: 0.307 seconds
hive> LOAD DATA LOCAL INPATH '/root/data/ourtest_data.csv' INTO TABLE ourtest PARTITION(dt='2010-02-27', place='Pune');
Copying data from file:/root/data/ourtest_data.csv
Loading data to table ourtest partition {dt=2010-02-27, place=Pune}
OK
Time taken: 0.753 seconds
hive> select * from ourtest; ---> Select * is working fine.
OK
0       firstname       lastname        0       2010-02-27      Pune
1       firstname1      lastname1       1       2010-02-27      Pune
2       firstname2      lastname2       2       2010-02-27      Pune
3       firstname3      lastname3       3       2010-02-27      Pune
4       firstname4      lastname4       4       2010-02-27      Pune
5       firstname5      lastname5       5       2010-02-27      Pune
6       firstname6      lastname6       6       2010-02-27      Pune
7       firstname7      lastname7       7       2010-02-27      Pune
8       firstname8      lastname8       8       2010-02-27      Pune
9       firstname9      lastname9       9       2010-02-27      Pune
10      firstname10     lastname10      10      2010-02-27      Pune
Time taken: 0.106 seconds
hive> select empid from ourtest; ---> Selecting specific column is not working.
Total MapReduce jobs = 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201002091652_0170, Tracking URL = http://v-hadoop3.persistent.co.in:60030/jobdetails.jsp?jobid=job_201002091652_0170
Kill Command = /root/hadoop-0.20.1/bin/../bin/hadoop job  -Dmapred.job.tracker=v-hadoop3.persistent.co.in:30001<http://v-hadoop3.persistent.co.in:30001> -kill job_201002091652_0170
2010-05-02 08:40:48,951 map = 0%,  reduce =0%
2010-05-02 08:40:58,044 map = 50%,  reduce =0%
2010-05-02 08:40:59,057 map = 100%,  reduce =0%
2010-05-02 08:41:02,067 map = 100%,  reduce =100%
Ended Job = job_201002091652_0170
OK
Time taken: 15.494 seconds
=====================================

Example without patch code:
--------------------------------
Example query is working after using without-patch code on the same metastore_db.

=====================================
root@v-hadoop3<https://puneexchange.persistent.co.in/owa/UrlBlockedError.aspx>:~/<https://puneexchange.persistent.co.in/owa/UrlBlockedError.aspx>sachin/Hive-796926-Patch<https://puneexchange.persistent.co.in/owa/UrlBlockedError.aspx># ../Hive-796926/build/dist/bin/hive
Hive history file=/tmp/root/hive_job_log_root_201005020928_924651644.txt
hive> select empid from ourtest;
Total MapReduce jobs = 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201002091652_0190, Tracking URL = http://v-hadoop3.persistent.co.in:60030/jobdetails.jsp?jobid=job_201002091652_0190
Kill Command = /root/hadoop-0.20.1/bin/../bin/hadoop job  -Dmapred.job.tracker=v-hadoop3.persistent.co.in:30001<http://v-hadoop3.persistent.co.in:30001> -kill job_201002091652_0190
2010-05-02 09:29:04,733 map = 0%,  reduce =0%
2010-05-02 09:29:18,799 map = 100%,  reduce =0%
2010-05-02 09:29:21,823 map = 100%,  reduce =100%
Ended Job = job_201002091652_0190
OK
0
1
2
3
4
5
6
7
8
9
10
Time taken: 22.268 seconds
=====================================

Can anyone point to what can be the problem here? Which module is a suspect here?

Regards,
Sachin

DISCLAIMER ========== This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.

The comments for that issue seem to suggest the patch is not complete yet. For reference 'select *' queries simply read that block data from hdfs so they do not use map-reduce (and thus probably do not use any indexes either.

Edward

DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.

Re: Select not working with Index patch

Posted by Edward Capriolo <ed...@gmail.com>.
On Sun, May 2, 2010 at 2:29 AM, Sachin Bochare <
sachin_bochare@persistent.co.in> wrote:

>    Hi,
>
> I applied index patch available at :
> https://issues.apache.org/jira/browse/HIVE-678
>
> However after applying the indexing patch, simple select statements are not
> showing any results. The "select *" is working but selecting a specific
> column is not working. I have pasted an example below which illustrates the
> problem.
>
> The same select is working without the patch on the same metastore_db. The
> only difference between working code and non-working code is the patch.
>
> I used 796926 version of the code. The patch attached in HIVE-678 was
> created on this version.
>
> Following example illustrates the problem:
>
> Example with patch code:
> -----------------------------
>
> =====================================
> hive> create table ourtest (empid int, firstname string, lastnamestring, hoursworkedint)
> partitioned by(dt string, place string) clustered by (empid) sorted
> by(hoursworked) into 4 buckets row format delimited fields terminated by
> ',' stored as textfile;
> OK
> Time taken: 0.307 seconds
> hive> LOAD DATA LOCAL INPATH '/root/data/ourtest_data.csv' INTO
> TABLE ourtest PARTITION(dt='2010-02-27', place='Pune');
> Copying data from file:/root/data/ourtest_data.csv
> Loading data to table ourtest partition {dt=2010-02-27, place=Pune}
> OK
> Time taken: 0.753 seconds
> hive> select * from ourtest; ---> Select * is working fine.
> OK
> 0       firstname       lastname        0       2010-02-27      Pune
> 1       firstname1      lastname1       1       2010-02-27      Pune
> 2       firstname2      lastname2       2       2010-02-27      Pune
> 3       firstname3      lastname3       3       2010-02-27      Pune
> 4       firstname4      lastname4       4       2010-02-27      Pune
> 5       firstname5      lastname5       5       2010-02-27      Pune
> 6       firstname6      lastname6       6       2010-02-27      Pune
> 7       firstname7      lastname7       7       2010-02-27      Pune
> 8       firstname8      lastname8       8       2010-02-27      Pune
> 9       firstname9      lastname9       9       2010-02-27      Pune
> 10      firstname10     lastname10      10      2010-02-27      Pune
> Time taken: 0.106 seconds
> hive> select empid from ourtest; ---> Selecting specific column is not
> working.
> Total MapReduce jobs = 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201002091652_0170, Tracking URL =
> http://v-hadoop3.persistent.co.in:60030/jobdetails.jsp?jobid=job_201002091652_0170
> Kill Command = /root/hadoop-0.20.1/bin/../bin/hadoop job
> -Dmapred.job.tracker=v-hadoop3.persistent.co.in:30001 -kill
> job_201002091652_0170
> 2010-05-02 08:40:48,951 map = 0%,  reduce =0%
> 2010-05-02 08:40:58,044 map = 50%,  reduce =0%
> 2010-05-02 08:40:59,057 map = 100%,  reduce =0%
> 2010-05-02 08:41:02,067 map = 100%,  reduce =100%
> Ended Job = job_201002091652_0170
> OK
> Time taken: 15.494 seconds
> =====================================
>
>  Example without patch code:
> --------------------------------
> Example query is working after using without-patch code on the same
> metastore_db.
>
> =====================================
> root@v-hadoop3<https://puneexchange.persistent.co.in/owa/UrlBlockedError.aspx>
> :~/ <https://puneexchange.persistent.co.in/owa/UrlBlockedError.aspx>sachin
> /Hive-796926-Patch<https://puneexchange.persistent.co.in/owa/UrlBlockedError.aspx>#
> ../Hive-796926/build/dist/bin/hive
> Hive history file=/tmp/root/hive_job_log_root_201005020928_924651644.txt
> hive> select empid from ourtest;
> Total MapReduce jobs = 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201002091652_0190, Tracking URL =
> http://v-hadoop3.persistent.co.in:60030/jobdetails.jsp?jobid=job_201002091652_0190
> Kill Command = /root/hadoop-0.20.1/bin/../bin/hadoop job
> -Dmapred.job.tracker=v-hadoop3.persistent.co.in:30001 -kill
> job_201002091652_0190
> 2010-05-02 09:29:04,733 map = 0%,  reduce =0%
> 2010-05-02 09:29:18,799 map = 100%,  reduce =0%
> 2010-05-02 09:29:21,823 map = 100%,  reduce =100%
> Ended Job = job_201002091652_0190
> OK
> 0
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> 10
> Time taken: 22.268 seconds
> =====================================
>
> Can anyone point to what can be the problem here? Which module is a suspect
> here?
>
> Regards,
>  Sachin
>
> DISCLAIMER ========== This e-mail may contain privileged and confidential
> information which is the property of Persistent Systems Ltd. It is intended
> only for the use of the individual or entity to which it is addressed. If
> you are not the intended recipient, you are not authorized to read, retain,
> copy, print, distribute or use this message. If you have received this
> communication in error, please notify the sender and delete all copies of
> this message. Persistent Systems Ltd. does not accept any liability for
> virus infected mails.
>

The comments for that issue seem to suggest the patch is not complete yet.
For reference 'select *' queries simply read that block data from hdfs so
they do not use map-reduce (and thus probably do not use any indexes either.

Edward