Posted to issues@phoenix.apache.org by "Bin Shi (JIRA)" <ji...@apache.org> on 2018/10/30 04:27:00 UTC

[jira] [Commented] (PHOENIX-4999) Update statistics should not be allowed on tenant specific connection

    [ https://issues.apache.org/jira/browse/PHOENIX-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668113#comment-16668113 ] 

Bin Shi commented on PHOENIX-4999:
----------------------------------

I have a different opinion. We should allow UPDATE STATISTICS on a tenant-specific connection, for the following reasons:
 # The base table is defined with MULTI_TENANT = true, so the common case is that a user issues all queries over their tenant connection, and that should include both CRUD queries and UPDATE STATISTICS. It isn't friendly to ask a tenant-specific user to switch between a table-level connection and a tenant connection. I'm not even sure we always allow a tenant-specific user to use a table-level connection.
 # For a tenant-specific user, it is natural to rely either on UPDATE STATISTICS over the tenant-specific connection or on major compaction to update the stats used by that tenant.
 # Different tenants may update data at different rates and so reach the data-drift threshold that triggers UPDATE STATISTICS at different times. When one tenant exceeds the threshold, we should collect or refresh stats for that tenant only rather than for the whole table; doing otherwise wastes time and resources. We should therefore always allow each tenant to update its stats independently.
 # Yes, allowing UPDATE STATISTICS on a tenant-specific connection for one tenant may leave other tenants with partial stats, but that shouldn't be the reason to disallow it, because it isn't the only cause of partial stats. UPDATE STATISTICS is atomic at the region level but not at the tenant level. While UPDATE STATISTICS runs, whether through a SQL statement or MR jobs, any region-level failure can cause partial stats, so we need to fix the partial-stats issue in general anyway.
 # For tables with MULTI_TENANT = true, stats should be stored in and fetched from the cache per tenant.
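
To make the partial-stats concern in point 4 concrete, here is a minimal sketch of how a tenant-bounded scan can touch only some regions. It is not Phoenix or HBase code: `Region`, `TenantScanSketch`, `regionsTouched`, and `prefixStop` are hypothetical names, the region boundaries are invented, and row keys are modeled as Java strings (HBase compares raw bytes), assuming the tenant id is the leading part of the row key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a region as a half-open row-key range [startKey, endKey).
// An empty string stands for "unbounded" on that side.
final class Region {
    final String name;
    final String startKey;
    final String endKey;

    Region(String name, String startKey, String endKey) {
        this.name = name;
        this.startKey = startKey;
        this.endKey = endKey;
    }
}

final class TenantScanSketch {
    // Names of the regions whose key range overlaps the scan range [scanStart, scanEnd).
    static List<String> regionsTouched(List<Region> regions, String scanStart, String scanEnd) {
        List<String> touched = new ArrayList<>();
        for (Region r : regions) {
            boolean startsBeforeScanEnd =
                scanEnd.isEmpty() || r.startKey.compareTo(scanEnd) < 0;
            boolean endsAfterScanStart =
                r.endKey.isEmpty() || scanStart.isEmpty() || r.endKey.compareTo(scanStart) > 0;
            if (startsBeforeScanEnd && endsAfterScanStart) {
                touched.add(r.name);
            }
        }
        return touched;
    }

    // Exclusive stop key for a prefix scan: the prefix with its last character incremented.
    // Simplification: assumes a non-empty prefix whose last character is not the maximum char.
    static String prefixStop(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    public static void main(String[] args) {
        // Invented layout: tenantC's rows straddle the r2/r3 boundary.
        List<Region> regions = List.of(
            new Region("r1", "", "tenantB"),
            new Region("r2", "tenantB", "tenantC5"),
            new Region("r3", "tenantC5", ""));

        // Table-level scan (unbounded) versus scans bounded by a tenant id prefix.
        System.out.println(regionsTouched(regions, "", ""));
        System.out.println(regionsTouched(regions, "tenantB", prefixStop("tenantB")));
        System.out.println(regionsTouched(regions, "tenantC", prefixStop("tenantC")));
    }
}
```

In this invented layout, the scan bounded to tenantB touches only r2, which also holds part of tenantC's rows, while tenantC's own scan touches r2 and r3. Whether stats collected this way should count as "partial" or as legitimately per-tenant is exactly the question discussed above.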

> Update statistics should not be allowed on tenant specific connection
> ---------------------------------------------------------------------
>
>                 Key: PHOENIX-4999
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4999
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>            Priority: Major
>
> The UPDATE STATISTICS SQL can trigger partial stats collection when run on a tenant-specific connection. Normally, UPDATE STATISTICS internally runs scans on all the regions of the table. On a tenant-specific connection, the TenantId field bounds the scans' startKey and endKey, which can cause stats collection to run only on specific regions and result in partial stats. 
> Since the view data and table data reside in the same physical HBase table, it doesn't make sense to allow users to run stats for specific tenants as tenants may span across regions. The issue was first identified in PHOENIX-4333.
> The patch however doesn't fully stop the SQL from running. Multiple approaches can be taken here. 
>  # Unset the tenantId on the connection before UPDATE STATISTICS is run and reset it afterwards. This can be tricky and awkward to implement since tenantId is essentially a final field on PhoenixConnection.
>  # As [~tdsilva] pointed out, we can throw an UnsupportedOperationException() whenever a user tries to update statistics on a tenant-specific connection.
> The second option seems straightforward to implement and can prevent accidental usage of this sql.
> [~Bin Shi] [~sukumaddineni] Any thoughts here?
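
For reference, the second option in the quoted description could look roughly like the guard below. This is only a sketch under assumptions: `UpdateStatsGuard` and `checkUpdateStatisticsAllowed` are hypothetical names rather than Phoenix classes or methods, and a `null` tenant id is assumed to mean a table-level (global) connection.

```java
// Hypothetical guard, not Phoenix's actual implementation.
final class UpdateStatsGuard {
    // tenantId is assumed to be null for a table-level (global) connection.
    static void checkUpdateStatisticsAllowed(String tenantId) {
        if (tenantId != null) {
            throw new UnsupportedOperationException(
                "UPDATE STATISTICS is not supported on a tenant-specific connection (tenant: "
                    + tenantId + ")");
        }
    }

    public static void main(String[] args) {
        checkUpdateStatisticsAllowed(null); // global connection: passes silently
        try {
            checkUpdateStatisticsAllowed("tenantB"); // tenant connection: rejected
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Such a check would need to run before any stats scan is planned, so that the statement fails fast instead of collecting stats for a subset of regions.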



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)