Posted to derby-commits@db.apache.org by Apache Wiki <wi...@apache.org> on 2006/09/02 17:56:29 UTC
[Db-derby Wiki] Update of "DataDictionaryCaching" by BryanPendleton

The following page has been changed by BryanPendleton:
http://wiki.apache.org/db-derby/DataDictionaryCaching

The comment on the change is:
Some high level notes on DD caching

New page:
The DataDictionary implementation contains several major caches of descriptors:
 * {{{nameTdCache}}} caches {{{TableDescriptors}}} and can find one by name
 * {{{OIDTdCache}}} caches {{{TableDescriptors}}} and can find one by UUID
 * {{{spsNameCache}}} caches Stored Prepared Statements and can find one by name
 * {{{permissionsCache}}} caches Permissions and can find one by {{{PermissionDescriptor}}}

Since a {{{TableDescriptor}}} object is the root of a tree of objects describing that table (its columns, its constraints, its triggers, its conglomerates), caching the {{{TableDescriptor}}} also implicitly caches the table's {{{ColumnDescriptors}}}, its {{{ConstraintDescriptors}}}, its {{{TriggerDescriptors}}} and so forth.
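
As a rough illustration of the two table descriptor caches, and of why caching the root object also caches its children, here is a minimal sketch in plain Java. The class names {{{ToyTableDescriptor}}}, {{{ToyColumnDescriptor}}} and {{{ToyDescriptorCache}}} are hypothetical stand-ins, not Derby's classes. The sketch assumes an unbounded map (a real cache manager limits its size) and, for brevity, lets both indexes share the same object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for a column descriptor; not Derby's class.
class ToyColumnDescriptor {
    final String name;
    ToyColumnDescriptor(String name) { this.name = name; }
}

// Hypothetical stand-in for a table descriptor: the root of a small tree.
class ToyTableDescriptor {
    final UUID id;
    final String name;
    final List<ToyColumnDescriptor> columns = new ArrayList<>();
    ToyTableDescriptor(UUID id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Two indexes over the same descriptor objects: one by name, one by UUID.
class ToyDescriptorCache {
    private final Map<String, ToyTableDescriptor> byName = new HashMap<>();
    private final Map<UUID, ToyTableDescriptor> byId = new HashMap<>();

    void put(ToyTableDescriptor td) {
        byName.put(td.name, td);
        byId.put(td.id, td);
    }

    // Both lookups return null on a cache miss.
    ToyTableDescriptor findByName(String name) { return byName.get(name); }
    ToyTableDescriptor findById(UUID id) { return byId.get(id); }
}
```

Because the column list hangs off the table descriptor, a hit in either index returns the columns as well, with no separate lookup.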

Caching is crucial to DataDictionary performance; otherwise we would constantly need to be reading metadata from the SystemTables on disk. Sharing one copy of the DataDictionary information in memory among many users also reduces memory footprint. So the DataDictionary tries very hard to read the SystemTables information into memory as rarely as possible, and tries to hold it in memory as long as possible.

There are two reasons why the DataDictionary can't always do this:
 * The caches are limited in size, and so the cache manager may not be able to keep all the tables in memory
 * The SystemTables information is not static: applications may dynamically change the database schema, by defining new schema objects (tables, views, triggers, constraints, etc.), or by dropping or modifying existing schema objects.

When the database schema is modified, the DataDictionary uses a very simple mechanism: it empties the caches and starts over. It does not make any special efforts to determine which cached information has become invalid, but instead just removes it all.

LanguageSystem code which accesses the DataDictionary has to follow the reading/writing protocol in order to ensure the correct operation of the caches. This protocol involves calling {{{startReading}}} / {{{doneReading}}} when reading information from the DataDictionary, and calling {{{startWriting}}} / {{{doneWriting}}} when updating the database schema.

For example, {{{CreateTableConstantAction}}} calls {{{startWriting}}} when it is creating a new table in the database. It then generates a new {{{TableDescriptor}}} for the new table and calls {{{addDescriptor}}} to add the information about the table to the SystemTables.
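
The protocol and the flush-everything behavior can be sketched with a toy model. This is not Derby's code: {{{ToyDataDictionary}}} is a hypothetical class, the argument-free method signatures are a simplification, descriptors are plain strings, a {{{HashMap}}} stands in for the SystemTables on disk, and the model is single-threaded. It also assumes {{{doneWriting}}} is called when the writing transaction finishes.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the read/write protocol; every name here is hypothetical.
class ToyDataDictionary {
    // In-memory cache: table name -> descriptor text.
    private final Map<String, String> cache = new HashMap<>();
    // Stands in for the SystemTables on disk.
    private final Map<String, String> systemTables = new HashMap<>();
    private boolean ddlInProgress = false;
    int diskReads = 0; // counts simulated reads of the SystemTables

    // Readers bracket their lookups; the toy has nothing to record here.
    void startReading() { }
    void doneReading() { }

    // A schema change empties the whole cache and disables it.
    void startWriting() {
        ddlInProgress = true;
        cache.clear();
    }

    // Assumed to be called when the writing transaction finishes.
    void doneWriting() { ddlInProgress = false; }

    void addDescriptor(String tableName, String descriptor) {
        if (!ddlInProgress) {
            throw new IllegalStateException("call startWriting first");
        }
        systemTables.put(tableName, descriptor);
    }

    String getTableDescriptor(String tableName) {
        if (!ddlInProgress) {
            String cached = cache.get(tableName);
            if (cached != null) {
                return cached; // cache hit: no disk read
            }
        }
        diskReads++;
        String fromDisk = systemTables.get(tableName);
        // Do not repopulate the cache while a schema change is pending.
        if (fromDisk != null && !ddlInProgress) {
            cache.put(tableName, fromDisk);
        }
        return fromDisk;
    }
}
```

In this model, every lookup made between {{{startWriting}}} and {{{doneWriting}}} goes to the simulated disk. The first lookup after {{{doneWriting}}} is a miss that refills the cache, and later lookups are hits.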

In general, the DataDictionary caching mechanism is trouble-free and efficient. However, at times it may be useful to understand its operation, both for performance reasons and for debugging reasons.

For performance analysis, the DataDictionary cache has the following properties:
 * it consumes a certain amount of memory
 * accessing SystemTables information from the cache is vastly more efficient than reading it from the real tables
 * DDL statements cause the cache(s) to be flushed
 * a DDL statement in a transaction effectively disables the cache for *all* users until the transaction commits

For debugging, consider the following example. DERBY-1724 is an interesting case in which DataDictionary caching plays a role. DERBY-1724 was a manifestation of DERBY-1583, an underlying bug involving an incorrect assumption about the {{{ColumnDescriptor}}} object.

A {{{ColumnDescriptor}}} object may or may not have an internal pointer to a corresponding {{{TableDescriptor}}} object. When a {{{ColumnDescriptor}}} is first created by {{{SYSCOLUMNSRowFactory}}}, it does not have a {{{TableDescriptor}}} pointer. This is because not all {{{ColumnDescriptor}}} objects are necessarily tied to particular tables; some may be expressions computed at runtime, for instance. At the point where {{{FromBaseTable}}} determines that a particular {{{ColumnDescriptor}}} is definitely tied to a particular {{{TableDescriptor}}}, it sets the table descriptor pointer in that {{{ColumnDescriptor}}}. Since {{{ColumnDescriptor}}} objects are cached, this updated object remains in memory for subsequent use.

This means that code which uses the {{{ColumnDescriptor}}} may or may not find that the table descriptor pointer has already been set, depending on whether the cache has managed to retain the descriptor in memory since the pointer was set. And, to close the chain of logic, the DERBY-1724 bug script contains a DDL statement (GRANT) in a transaction, which causes the cache to be disabled and thus enables the conditions for the DERBY-1583 bug to be triggered.
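
The shape of this problem can be reproduced in a few lines. The sketch below is only a model of the situation described above, not the actual DERBY-1583 code: {{{ToyTable}}}, {{{ToyColumn}}} and {{{ToyColumnSource}}} are hypothetical names, and the {{{cachingEnabled}}} flag stands in for the cache being disabled by a pending DDL statement.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a table descriptor.
class ToyTable {
    final String name;
    ToyTable(String name) { this.name = name; }
}

// Hypothetical stand-in for a column descriptor with an optional table pointer.
class ToyColumn {
    final String name;
    private ToyTable table; // null until some caller binds the column to a table
    ToyColumn(String name) { this.name = name; }
    void setTable(ToyTable t) { table = t; }
    ToyTable getTable() { return table; }
}

// Hands out column descriptors, from a cache when caching is enabled.
class ToyColumnSource {
    private final Map<String, ToyColumn> cache = new HashMap<>();
    boolean cachingEnabled = true; // models "no DDL pending"

    ToyColumn getColumn(String name) {
        if (!cachingEnabled) {
            // Freshly built descriptor: the table pointer has not been set.
            return new ToyColumn(name);
        }
        // Cached descriptor: keeps any pointer set by an earlier caller.
        return cache.computeIfAbsent(name, ToyColumn::new);
    }
}
```

With caching enabled, a caller that sets the pointer leaves it visible to later callers, so code that assumes {{{getTable()}}} is non-null appears to work. Once caching is disabled, the same code receives a fresh object whose pointer is null, which is the kind of cache-dependent behavior the example describes.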