Posted to issues@phoenix.apache.org by "Bin Shi (JIRA)" <ji...@apache.org> on 2018/09/13 22:59:00 UTC

[jira] [Commented] (PHOENIX-4008) UPDATE STATISTIC should run raw scan with all versions of cells

    [ https://issues.apache.org/jira/browse/PHOENIX-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614146#comment-16614146 ] 

Bin Shi commented on PHOENIX-4008:
----------------------------------

[~tdsilva]

If I understand you correctly, stats collection currently includes deleted rows, which makes table sampling that relies on those stats inaccurate. The changes you suggest are: 1. By default, don't count deleted rows in the collected stats; 2. Make including deleted rows optional, and use "UPDATE STATISTICS (INCLUDE DELETED ROWS)" to indicate explicitly that the stats should include them. Shall we use a global configuration to indicate whether the stats should include deleted rows when they are collected during major compaction or by jobs that collect stats from a snapshot?
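
To make the difference between the two modes concrete, here is a toy model in Python. It is not Phoenix or HBase code: Cell, bytes_seen, and the sample sizes are hypothetical names and numbers chosen for illustration, and the model assumes one column per row with distinct timestamps. It only shows how a size estimate changes depending on whether older versions and delete markers are counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a stored cell (one column per row)."""
    row: str
    timestamp: int
    size: int                # bytes, made-up values
    is_delete: bool = False  # True for a delete marker


def bytes_seen(cells, include_deleted):
    """Sum cell sizes the way a stats pass might.

    include_deleted=True  models a raw scan: every version and every
                          delete marker is counted.
    include_deleted=False models a normal scan: only the newest version
                          of each row, and only if it is not deleted.
    """
    if include_deleted:
        return sum(c.size for c in cells)

    by_row = {}
    for c in cells:
        by_row.setdefault(c.row, []).append(c)

    total = 0
    for row_cells in by_row.values():
        newest = max(row_cells, key=lambda c: c.timestamp)
        if not newest.is_delete:
            total += newest.size
    return total


cells = [
    Cell("r1", 1, 100), Cell("r1", 2, 120),                  # two versions
    Cell("r2", 1, 80), Cell("r2", 3, 10, is_delete=True),    # deleted row
    Cell("r3", 1, 50),                                       # single version
]

print(bytes_seen(cells, include_deleted=True))   # all versions and markers
print(bytes_seen(cells, include_deleted=False))  # live rows only
```

In this model the first figure reflects what is physically stored, while the second reflects the rows a query would actually return, which is the number table sampling cares about.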

> UPDATE STATISTIC should run raw scan with all versions of cells
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-4008
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4008
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Assignee: Bin Shi
>            Priority: Major
>
> In order to truly measure the size of data when calculating guide posts, UPDATE STATISTIC should run a raw scan to take into account all versions of cells. We should also be setting the max versions on the scan.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)