Posted to derby-dev@db.apache.org by "Mike Matrigali (JIRA)" <ji...@apache.org> on 2007/01/03 01:02:27 UTC

[jira] Commented: (DERBY-2168) Create new row format for derby to optimize access to columns within a row

    [ https://issues.apache.org/jira/browse/DERBY-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12461872 ] 

Mike Matrigali commented on DERBY-2168:
---------------------------------------

There are many different approaches. If I were doing this project, I would avoid the upgrade issues by supporting both formats in
some form. The easiest approach would probably be at the container level, i.e. only new tables/indexes get the new format. Soft upgrade
would not support the new format. Hard upgrade would allow the new format but would still be able to read and write the old format.
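
A minimal sketch of the container-level policy, in Java since Derby is written in Java. Every name here (RowFormat, UpgradeMode, ContainerFormatPolicy) is hypothetical and not part of Derby's store API. The sketch only shows the idea that a container's format is fixed when it is created, that soft upgrade never creates new-format containers, and that hard upgrade keeps handling the old format.

```java
import java.util.EnumSet;

// Hypothetical names for illustration only; not Derby's actual store API.
enum RowFormat { LEGACY_V1, OFFSET_TABLE_V2 }

enum UpgradeMode { SOFT, HARD }

final class ContainerFormatPolicy {
    private final UpgradeMode mode;

    ContainerFormatPolicy(UpgradeMode mode) {
        this.mode = mode;
    }

    /** Format recorded in a container's header when a new table/index is created. */
    RowFormat formatForNewContainer() {
        // Soft upgrade must leave the database usable by the older release,
        // so it never creates containers in the new format.
        return mode == UpgradeMode.HARD ? RowFormat.OFFSET_TABLE_V2 : RowFormat.LEGACY_V1;
    }

    /** Formats this release expects to read and write in existing containers. */
    EnumSet<RowFormat> supportedFormats() {
        // After hard upgrade, old containers remain, so both formats stay supported.
        return mode == UpgradeMode.HARD
                ? EnumSet.allOf(RowFormat.class)
                : EnumSet.of(RowFormat.LEGACY_V1);
    }
}
```

Because the choice is made once per container, code that reads a row would only need to look up the container's format, not inspect each page or record.
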

With more work, one could probably upgrade automatically, one page at a time. I think there is enough information in the page header to support
on-the-fly upgrade, i.e. only upgrading pages as they are written.
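
A rough sketch of the page-at-a-time idea. It assumes, purely for illustration, a one-byte format version at offset 0 of each page; Derby's real page header layout is not modeled, and the class and constant names are hypothetical. Readers choose a decoder per page, and any page that is written gets stamped with the new version, so old and new pages can coexist in one container.

```java
import java.util.Arrays;

// Hypothetical sketch; the header layout and names are assumptions, not Derby's.
final class LazyPageUpgrade {
    static final int VERSION_OFFSET = 0;   // assumed position of the format byte
    static final byte OLD_FORMAT = 1;
    static final byte NEW_FORMAT = 2;

    /** Readers pick a decoder per page, so old and new pages can coexist. */
    static String decoderFor(byte[] page) {
        switch (page[VERSION_OFFSET]) {
            case OLD_FORMAT:
                return "old-format decoder";
            case NEW_FORMAT:
                return "new-format decoder";
            default:
                throw new IllegalStateException("unknown page format: " + page[VERSION_OFFSET]);
        }
    }

    /** Pages are upgraded only when written; pages that are never written stay old. */
    static byte[] prepareForWrite(byte[] page) {
        if (page[VERSION_OFFSET] == NEW_FORMAT) {
            return page;                    // already upgraded, nothing to do
        }
        byte[] upgraded = Arrays.copyOf(page, page.length);
        // A real implementation would re-encode every record on the page here;
        // this sketch only stamps the new version into the header.
        upgraded[VERSION_OFFSET] = NEW_FORMAT;
        return upgraded;
    }
}
```
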

> Create new row format for derby to optimize access to columns within a row
> --------------------------------------------------------------------------
>
>                 Key: DERBY-2168
>                 URL: https://issues.apache.org/jira/browse/DERBY-2168
>             Project: Derby
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 10.3.0.0
>            Reporter: Mike Matrigali
>            Priority: Minor
>
> The current (and only) low-level row format for Derby was chosen at the beginning of the project to be the most flexible, so it treats every
> column as variable length. The simple row format is just a sequence of columns, with each column having a header indicating how long it
> is. So there is no way to determine where the N'th column is in the row without first traversing the N-1 columns before
> it. A number of queries that might benefit from a different row format include:
> 1) non-covered queries that don't require all columns of the row
> 2) non-index scans that disqualify rows based on a subset of columns that don't happen to be the first N columns of the row.
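
To make the traversal cost concrete, here is a simplified Java model of a row in which every column carries its own length header. It assumes a plain 4-byte length prefix per column, which is not Derby's actual field header encoding, and the class name is hypothetical. Locating column n (0-based) means reading and skipping the n headers and values before it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Simplified model, not Derby's real encoding: [len][bytes][len][bytes]...
final class LengthPrefixedRow {
    static byte[] encode(String... columns) {
        byte[][] encoded = new byte[columns.length][];
        int total = 0;
        for (int i = 0; i < columns.length; i++) {
            encoded[i] = columns[i].getBytes(StandardCharsets.UTF_8);
            total += Integer.BYTES + encoded[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] col : encoded) {
            buf.putInt(col.length).put(col);   // length header, then the value
        }
        return buf.array();
    }

    /** Finding column n (0-based) requires skipping the n columns before it. */
    static String column(byte[] row, int n) {
        ByteBuffer buf = ByteBuffer.wrap(row);
        for (int i = 0; i < n; i++) {
            int len = buf.getInt();
            buf.position(buf.position() + len);  // skip column i without decoding it
        }
        byte[] value = new byte[buf.getInt()];
        buf.get(value);
        return new String(value, StandardCharsets.UTF_8);
    }
}
```

For example, column(encode("id-42", "Smith", "NYC"), 2) returns "NYC" only after stepping over the headers and bytes of the first two columns, and that work grows with the column's position in the row.
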
> A pretty standard row format would have some sort of offset table at the beginning of the row, which would allow one to jump directly to a given column without
> going through all the other columns. Building up this table would likely increase the insert cost slightly, and would increase the disk space required
> to store rows.
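
A minimal sketch of an offset-table row, using a made-up layout: a column count, one 4-byte absolute offset per column, and an end offset, followed by the column bytes. The class name and layout are hypothetical, not a proposed Derby format. The encoder does extra work per insert to build the table, while column() locates any column with two fixed-position reads instead of a traversal.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Assumed layout: [count][off col 0]...[off col n-1][end off][column bytes...]
final class OffsetTableRow {
    static byte[] encode(String... columns) {
        int n = columns.length;
        int headerSize = Integer.BYTES * (n + 2);   // count + n offsets + end offset
        byte[][] encoded = new byte[n][];
        int dataSize = 0;
        for (int i = 0; i < n; i++) {
            encoded[i] = columns[i].getBytes(StandardCharsets.UTF_8);
            dataSize += encoded[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(headerSize + dataSize);
        buf.putInt(n);
        int offset = headerSize;
        for (byte[] col : encoded) {   // the extra insert-time work: build the table
            buf.putInt(offset);
            offset += col.length;
        }
        buf.putInt(offset);            // end offset gives the last column's length
        for (byte[] col : encoded) {
            buf.put(col);
        }
        return buf.array();
    }

    /** Jumps straight to column n (0-based) using two table lookups. */
    static String column(byte[] row, int n) {
        ByteBuffer buf = ByteBuffer.wrap(row);
        int start = buf.getInt(Integer.BYTES * (1 + n));
        int end = buf.getInt(Integer.BYTES * (2 + n));
        return new String(row, start, end - start, StandardCharsets.UTF_8);
    }
}
```

In this layout the table costs 4 bytes per column plus 8 bytes per row, which is the kind of disk space increase the description anticipates; a real format might use narrower or variable-width offsets to reduce it.
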
> Another standard kind of row format would optimize the storage of fixed-length fields. Currently the store does not know anything about fixed-length
> fields, as each datatype controls its own storage. New interfaces could be added, either at create time or perhaps in the datatypes themselves,
> to expose the knowledge that a datatype is fixed length.
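
One way such knowledge could pay off, sketched with hypothetical names: if the store were told each column's width (assumed here to be 4 bytes for an INTEGER and 8 for a BIGINT, in a format with no per-field header), the offset of every column that follows only fixed-length columns could be computed once per table rather than once per row. A width of VARIABLE marks a variable-length column.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not Derby code; assumes no per-field headers.
final class FixedOffsetCache {
    static final int VARIABLE = -1;   // width marker for a variable-length column
    static final int UNKNOWN = -1;    // offset marker: must be found per row

    /** Computes, once per table, every column offset that is known statically. */
    static int[] precomputeOffsets(int[] widths) {
        int[] offsets = new int[widths.length];
        int offset = 0;
        boolean known = true;
        for (int i = 0; i < widths.length; i++) {
            offsets[i] = known ? offset : UNKNOWN;
            if (widths[i] == VARIABLE) {
                known = false;   // columns after a variable-length one shift per row
            } else {
                offset += widths[i];
            }
        }
        return offsets;
    }

    /** Reads a 4-byte INTEGER column directly, without visiting earlier columns. */
    static int readIntColumn(byte[] row, int[] offsets, int column) {
        if (offsets[column] == UNKNOWN) {
            throw new IllegalArgumentException("offset of column " + column + " depends on row data");
        }
        return ByteBuffer.wrap(row).getInt(offsets[column]);
    }
}
```

For widths {4, 8, VARIABLE, 4}, precomputeOffsets returns {0, 4, 12, -1}: the start of the variable-length column is still known, but the last column's offset depends on row data. Placing fixed-length columns ahead of variable-length ones would maximize the precomputable prefix.
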
> This is a big project. Note that a lot of performance work in StoredPage has made it "know" about the current record and field formats, as it was
> a big performance hit to make method calls into separate classes for every field traversal. This means that adding a new record and/or field format is not as isolated as
> one might hope. Also, we are likely to need to support both the old and new formats. To anyone considering this work, I would suggest a very rough
> prototype with performance measurement first, to make sure you are getting the expected performance before doing a lot of work.
