Posted to dev@hive.apache.org by "Jesus Camacho Rodriguez (JIRA)" <ji...@apache.org> on 2016/07/18 12:28:20 UTC

[jira] [Created] (HIVE-14265) Partial stats in Join operator may lead to data size estimate of 0

Jesus Camacho Rodriguez created HIVE-14265:
----------------------------------------------

             Summary: Partial stats in Join operator may lead to data size estimate of 0
                 Key: HIVE-14265
                 URL: https://issues.apache.org/jira/browse/HIVE-14265
             Project: Hive
          Issue Type: Bug
          Components: Statistics
            Reporter: Nita Dembla
            Assignee: Jesus Camacho Rodriguez


For some tables, we might not have the column stats available. However, if the table is partitioned, we will have the stats for partition columns.

When we estimate the size of the data produced by a join operator, we end up using only the columns for which stats are available, e.g., the partition columns in this case. If the join outputs only columns without stats, the data size estimate can be 0.

Even in these cases, we should also add the data size of the columns for which we do not have stats (_default size for the column type x estimated number of rows_).
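
The fallback described above can be sketched as follows. This is a minimal illustration in Python, not Hive's actual implementation: the function names, the default sizes per type, and the row estimate of 2 are all assumptions made for the example.

```python
# Illustrative default sizes in bytes per column type. These values are
# assumptions for this sketch, not constants taken from Hive.
DEFAULT_SIZE_BYTES = {"int": 4, "bigint": 8, "double": 8, "string": 100}


def size_from_stats_only(num_rows, columns, avg_col_size):
    """Behaviour described in this issue: count only columns that have stats."""
    return sum(avg_col_size[name] * num_rows
               for name in columns if name in avg_col_size)


def size_with_fallback(num_rows, columns, avg_col_size):
    """Proposed behaviour: for a column without stats, use the default
    size of its type multiplied by the estimated number of rows."""
    total = 0
    for name, col_type in columns.items():
        if name in avg_col_size:
            per_row = avg_col_size[name]
        else:
            per_row = DEFAULT_SIZE_BYTES[col_type]
        total += per_row * num_rows
    return total


# Hypothetical join output: a single int column "x" without column stats,
# and an assumed row estimate of 2.
output_columns = {"x": "int"}
stats = {}  # no stats available for "x"

print(size_from_stats_only(2, output_columns, stats))  # 0
print(size_with_fallback(2, output_columns, stats))    # 8
```

With stats-only accounting the estimate collapses to 0 whenever none of the output columns has stats; the fallback keeps it proportional to the row estimate.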

To reproduce, the following example can be used:

{noformat}
create table sample_partitioned (x int) partitioned by (y int);
insert into sample_partitioned partition(y=1) values (1),(2);
create temporary table sample as select * from sample_partitioned;
analyze table sample compute statistics for columns;

explain select sample_partitioned.x from sample_partitioned, sample where sample.y = sample_partitioned.y;
{noformat}


