You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Mamta A. Satoor (JIRA)" <ji...@apache.org> on 2008/12/04 18:54:44 UTC
[jira] Updated: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

     [ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mamta A. Satoor updated DERBY-3788:
-----------------------------------

    Attachment: DERBY3788_patch3_stat.txt
                DERBY3788_patch3_diff.txt

I am attaching a new patch which incorporates following comments from Knut.
2)Are the calls to EmbedConnection30.setupContextStack() and restoreContextStack() needed around the call to execute()? I thought execute() would call setup/restoreContextStack() itself. 
I removed the calls to setup and restore the stack since as Knut pointed out, they are unnecessary.
3)Creating an EmbedConnection30 object directly breaks the modularity. Unless the calls to the internal methods are necessary, it may be better to use InternalDriver.activeDriver().connect(url, info) instead. 
I removed the direct creation of EmbedConnection30 and instead use InternalDriver.activeDriver().connect(url, info) 
5)There's a comment in DDImpl5.updateStatisticsInBackGround() saying that "cm is null the very first time, and whenever we aren't actually nested." I'm not sure I understand that comment. Why is it null the first time? And isn't the method always called in a nested context? And if it is null, wouldn't that cause a NullPointerException in EmbedConnection's constructor when url=null is passed in? 
In the case that ContextManager may be null, I do not fire statistics creation in background because that will cause NPE later on because url is null.
6) DDImpl5.updateStatisticsInBackGround() updates the shared variable executorForUpdateStatistics if it is null. But it is not protected by synchronization, so race conditions are possible. 
I now have synchronization code around the code that can result in race conditions.
7) DDImpl5.stop() should call super.stop(). 
Did this.
8) In BackgroundUpdateStatisticsTask, using a prepared statement with the table name and the index name parametrized would be better because it would handle quoting special characters correctly (not handled in 
the current patch) and it would reduce the number of entries in the statement cache. 
Now using PreparedStatement rather than Statement.

The 2 comments from Knut that are not addressed in this patch are as follows
4) I think that the creation of a new connection will reboot the database if it has been shut down in the user thread, which may lead to unpredictable behaviour. It also seems like it will preserve all connection attributes, like attributes to reencrypt the database or to start replication master. 
I think there might be an issue as raised by Knut's comment 4). I still have to trace down completely why some of the junit test failures with my patch say Database "Failed to start(/create) database". It might be because of the properties that are used to create the new Connection object to run the statistics in background. I am not sure yet if it happens during every run of the test jdbcapi.DriverMgrAuthenticationTest but I have seen that test to be one of the tests that fails with Failed to create database. I will work on trying to figure out at what point does the test fail with these errors. If anyone has any ideas on the failures or how we should create the new Connection object,
I will highly appreciate that.
9) As to the locking issues, I would have tried to call Connection.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED) in the background thread to see if that solved/reduced the issues. 
I am not 100% about this but I this using isolation Connection.TRANSACTION_READ_UNCOMMITTED had slowed down test runs on my machine. I would like to address number 4 first so that at least the test failures go down and then I can focus on usage of isolation level 
Connection.TRANSACTION_READ_UNCOMMITTED. There ARE few tests that fail because of lock time out and may be using the isolation level recommended by Knut will help.

So, at this point, I will focus on 4) to try to narrow down where are the failures coming from. In the mean time, if community has time to run the junit tests to see if they notice noticeable performance problems with my patch, I will appreciate that feedback. Thanks

> Provide a zero-admin way of updating the statisitcs of an index
> ---------------------------------------------------------------
>
>                 Key: DERBY-3788
>                 URL: https://issues.apache.org/jira/browse/DERBY-3788
>             Project: Derby
>          Issue Type: New Feature
>          Components: Performance
>    Affects Versions: 10.5.0.0
>            Reporter: Mamta A. Satoor
>            Assignee: Mamta A. Satoor
>         Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt, DERBY_3788_Mgr.java, DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good for Derby to provide an automatic way of updating the statistics without requiring to run the stored procedure manually. There was some discussion on DERBY-269 about providing the 0-admin way. I have copied it here for reference.
> *********************
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable to the manual 'update statistics' 
> *********************
> *********************
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when exact statistics were calculated. This number is 0 if none have been calculated because index creation happened on an empty table. At query compile time when we look up statistics we automatically recalculate the statistics at certain threshholds - say something like row count growing past next threshhold : 10, 100, 1000, 100000 - with upper limit being somewhere around how many rows we can process in some small amount of time - like 1 second on a modern laptop. If we are worried about response time, maybe we background queue the stat gathering rather than waiting with maybe some quick load if no stat has ever been gathered. The background gathering could be optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy to support customers and debug issues in the field. There is proably always some dynamic data distribution change that in some case won't be picked up by the automatic algorithm. Also just very useful for those who have complete control of the create ddl, load data, run stats, deliver application process. 
> *********************

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.