Posted to issues-all@impala.apache.org by "Balazs Jeszenszky (JIRA)" <ji...@apache.org> on 2018/10/04 19:59:00 UTC

[jira] [Commented] (IMPALA-7659) Collect count of nulls when collecting stats

    [ https://issues.apache.org/jira/browse/IMPALA-7659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638777#comment-16638777 ] 

Balazs Jeszenszky commented on IMPALA-7659:
-------------------------------------------

Combination of IMPALA-7655 and IMPALA-7497.

> Collect count of nulls when collecting stats
> --------------------------------------------
>
>                 Key: IMPALA-7659
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7659
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Piotr Findeisen
>            Priority: Major
>
> When Impala calculates table stats, the NULL count gets overridden with -1.
> The number of NULLs in a table is useful information. Even if Impala does not benefit from it, some other tools do. Thus, not collecting it may pose a problem for Impala users (potentially forcing them to run COMPUTE STATS elsewhere).
> Counting NULLs should be cheaper than counting NDVs. However, a code comment in {{ComputeStatsStmt.java}} suggests otherwise ([~tarmstrong] suggested this is because of IMPALA-7655).
> My suggestion would be to
> - improve the expression used to collect the NULL count
> - collect the NULL count during COMPUTE STATS
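
For illustration only: the issue does not quote the expression Impala uses, so the sketch below shows just two standard SQL ways to count NULLs in a column, one conditional and one by subtraction. It runs against SQLite through Python's sqlite3 module, not against Impala. The table t, the column c, and the helper null_counts are hypothetical names, and nothing here says which form would be cheaper in Impala.

```python
import sqlite3


def null_counts(rows):
    """Return the NULL count of column c, computed two equivalent ways.

    Hypothetical helper for illustration; it uses an in-memory SQLite
    database, not Impala.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (c INTEGER)")
    conn.executemany("INSERT INTO t (c) VALUES (?)", [(v,) for v in rows])

    # Conditional form: the CASE yields 1 only for NULL rows, and COUNT
    # ignores the NULL that CASE yields for every other row.
    conditional = conn.execute(
        "SELECT COUNT(CASE WHEN c IS NULL THEN 1 END) FROM t"
    ).fetchone()[0]

    # Subtraction form: COUNT(*) counts every row, COUNT(c) skips NULLs,
    # so the difference is the number of NULLs.
    subtraction = conn.execute(
        "SELECT COUNT(*) - COUNT(c) FROM t"
    ).fetchone()[0]

    conn.close()
    return conditional, subtraction
```

Both forms return the same number for any input, so the choice between them is purely about how efficiently a given engine evaluates each expression.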


