You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Mamta A. Satoor (JIRA)" <ji...@apache.org> on 2009/03/09 18:45:50 UTC
[jira] Updated: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

     [ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mamta A. Satoor updated DERBY-3788:
-----------------------------------

    Attachment: DERBY3788_update_all_missing_stats_after_bind_patch4_diff.txt
                DERBY3788_update_all_missing_stats_after_bind_patch4_stat.txt

I have an intermediate patch ready for review. The goal of the patch is to detect early on in the compile phase of a SELECT query if there are tables involved with missing statistics. If yes, then abandon the SELECT query compilation, create the missing statistics and then try to compile the original SELECT query again. This logic gets driven inside the GenericStatement.prepMinion code. GenericStatement.prepMinion is responsible for compile phase of a SQL. If the SQL involved is a SELECT statement, then we may go through the compile process of the SELECT query twice. If the  SQL involved is non-SELECT, we will finish SQL compilation in the first pass. The details about the code flow for SELECT statement compilation is as follows.

For a SELECT statement, we may have to go through the query compilation phase twice. Whether we go through one cycle or 2 cycles of compilation phase depends on wheter there are any tables involved in the query who have their statistics missing. During the first attempt of query compile, we will make a list of all the tables involved in the query during the bind phase of compile. After the bind phase, during the first pass through the SELECT query compilation, we will check if the query involved has any tables whose statistics are missing. If yes, then we will quit from the SELECT compile phase by throwing missing statistics exception. We will handle the missing stats exception by trying to update the missing statistics. After that, we will go through the second pass of original query compilation. During this second pass, we will not worry if the statistics are missing or not. We will just work with whatever statistics are available (one example where the statistics may still be missing after trying to create them is say a user who only has select privileges on a table finds the missing stats. But the statistics creation will fail because the user does not have sufficient privileges to create the statistics. There can be other reasons for statistics creation failure too. Another instance would be that we can't get the necessary locks to update the stats. In such a case, we will just quit out of updating the stats and move on to the second pass of the original query compilation. 

One of the issues with this patch I need to work on and will appreicate if anyone has any feedback on. The update statistics is happening in the user transaction. What that means is that any locks required to update statistics will stay in place until the user transaction has been committed/rolled back. This behavior can be an issue with user applications (since these are the locks the user was not expecting to get as part of a SELECT query execution). This behavior definitely causes problems with our tests where quite a handful of tests run into locking issues because of the additional locks acquired before of a SELECT query. I think one way to fix this would be to somehow run the statistics in a nested transaction which can be committed after the statistics are created. If we run into locking issues in the nested transaction then go ahead and run the statistic in the user transaction. This is what we do for GENERATED columns in InsertResultSet.getSetAutoincrementValue. One of the problems that I need to address before I can use nested transaction is to change the code that is written to fire statistics during SELECT compilation. Currently, in my patch, I am executing the stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS which internally executes ALTER TABLE.. to update the statistics. I am at too high a level to be able to use nested transaction for updating the statistics (let me know if I am wrong here. May be there is a way to use nested transaction while firing the SYSCS_UTIL.SYSCS_UPDATE_STATISTICS that I am unaware of). I think I need to get to update statistics code in AlterTableConstantAction directly somehow without going through the stored procedure->ALTER TABLE.. sql path. I think if I can directly call the update statistics in AlterTableConstantAction, then I can try using the nested yransaction and if that does not succeed then user the user transaction.

I will appreciate any input you might have on my overall approach to this jira entry and then how to address the locking issue.

> Provide a zero-admin way of updating the statisitcs of an index
> ---------------------------------------------------------------
>
>                 Key: DERBY-3788
>                 URL: https://issues.apache.org/jira/browse/DERBY-3788
>             Project: Derby
>          Issue Type: New Feature
>          Components: Performance
>    Affects Versions: 10.5.0.0
>            Reporter: Mamta A. Satoor
>            Assignee: Mamta A. Satoor
>         Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt, DERBY3788_update_all_missing_stats_after_bind_patch4_diff.txt, DERBY3788_update_all_missing_stats_after_bind_patch4_stat.txt, DERBY_3788_Mgr.java, DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good for Derby to provide an automatic way of updating the statistics without requiring to run the stored procedure manually. There was some discussion on DERBY-269 about providing the 0-admin way. I have copied it here for reference.
> *********************
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable to the manual 'update statistics' 
> *********************
> *********************
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when exact statistics were calculated. This number is 0 if none have been calculated because index creation happened on an empty table. At query compile time when we look up statistics we automatically recalculate the statistics at certain threshholds - say something like row count growing past next threshhold : 10, 100, 1000, 100000 - with upper limit being somewhere around how many rows we can process in some small amount of time - like 1 second on a modern laptop. If we are worried about response time, maybe we background queue the stat gathering rather than waiting with maybe some quick load if no stat has ever been gathered. The background gathering could be optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy to support customers and debug issues in the field. There is proably always some dynamic data distribution change that in some case won't be picked up by the automatic algorithm. Also just very useful for those who have complete control of the create ddl, load data, run stats, deliver application process. 
> *********************

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.