Posted to derby-dev@db.apache.org by "Rick Hillegas (JIRA)" <ji...@apache.org> on 2011/04/27 15:24:03 UTC
[jira] [Updated] (DERBY-5201) Create tools for reading the contents of the seg0 directory

     [ https://issues.apache.org/jira/browse/DERBY-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick Hillegas updated DERBY-5201:
---------------------------------

    Attachment: DataFileReader.java

Attaching DataFileReader.java. This program reads a file in the seg0 directory and streams out its contents as human-readable XML. Verbose printing shows the column data as byte arrays. You can refine this further by giving the tool a row signature: the column data is then deserialized into objects and the toString() results are shown. The actual database is not booted or otherwise disturbed. A transient in-memory helper database is created only if you ask the tool to deserialize the column contents.

More work could be done formatting overflow data.

I have run the tool on the SYSCONGLOMERATES file (c20.dat) in databases created by 10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, and 10.9 (trunk). I have also tested the tool on a file containing a UDT.

Here is the tool's usage message:

Usage:

    java DataFileReader $dataFileName [ -v ] [ -d $D ] [ -p $P ] [ -n $N ]

    -v   Verbose. Print out records and slot tables. Field data appears as byte arrays. If you do not set this flag, the tool just decodes the page headers.
    -d   Data signature. This makes a verbose printout turn the field data into objects. $D is a row signature, e.g., "( a int, b varchar( 30 ) )"
    -p   Starting page. $P is a number which must be at least 1, the first page to read after the header. Page 0 (the header) is always read.
    -n   Number of pages to read. $N is a positive number. Defaults to all subsequent pages.

  For example, the following command deserializes all of the records in the SYSCONGLOMERATES file:

    java DataFileReader db/seg0/c20.dat -v -d "( a char(36), b char(36), c bigint, d varchar( 128), e boolean, f serializable, g boolean, h char( 36 )  )"

  Note the special 'serializable' type in the preceding example. Use 'serializable' for user-defined types and for the system columns which are objects.
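
As an illustration of the usage message, here is a minimal sketch of a helper that assembles the tool's argument list and checks the two numeric constraints stated above (-p at least 1, -n positive). The class name ReaderCommand and its build method are hypothetical and are not part of DataFileReader.java; the sketch only constructs the command and does not run the tool.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the attached tool.
public class ReaderCommand {

    /**
     * Builds the argument list described by the usage message.
     * Pass null for signature, startPage or pageCount to omit -d, -p or -n.
     */
    static List<String> build(String dataFile, boolean verbose, String signature,
                              Integer startPage, Integer pageCount) {
        // Constraints taken from the usage message.
        if (startPage != null && startPage < 1) {
            throw new IllegalArgumentException("-p must be at least 1");
        }
        if (pageCount != null && pageCount < 1) {
            throw new IllegalArgumentException("-n must be a positive number");
        }

        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("DataFileReader");
        cmd.add(dataFile);
        if (verbose) {
            cmd.add("-v");
        }
        if (signature != null) {
            cmd.add("-d");
            cmd.add(signature); // one argument, spaces and parentheses included
        }
        if (startPage != null) {
            cmd.add("-p");
            cmd.add(String.valueOf(startPage));
        }
        if (pageCount != null) {
            cmd.add("-n");
            cmd.add(String.valueOf(pageCount));
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("db/seg0/c20.dat", true,
                "( a int, b varchar( 30 ) )", null, null);
        System.out.println(cmd);
    }
}
```

A list built this way could be handed to java.lang.ProcessBuilder, which passes each element as a single argument, so the row signature would need no extra shell quoting.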

Here are some sample use cases:

-------------------------------------------

1) Decode an entire data file, putting the resulting XML in the file z.xml. You can then view that file using a browser like Firefox, which lets you collapse and expand the elements.

java DataFileReader db/seg0/c20.dat -v -d "( a char(36), b char(36), c bigint, d varchar( 128), e boolean, f serializable, g boolean, h char( 36 )  )" > z.xml


-------------------------------------------

2) Pretty-print the file header:

java DataFileReader db/seg0/c20.dat -n 1
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<dataFile>
    <fileHeader>
        <allocPage number="0">
            <formatableID>118</formatableID>
            <pageHeader>
                <isOverFlowPage>false</isOverFlowPage>
                <status hexvalue="1">
                    <flag>VALID_PAGE</flag>
                </status>
                <pageVersion>9</pageVersion>
                <slotsInUse>0</slotsInUse>
                <nextRecordID>6</nextRecordID>
                <pageGeneration>0</pageGeneration>
                <previousGeneration>0</previousGeneration>
                <beforeImagePageLocation>0</beforeImagePageLocation>
                <deletedRowCount>1</deletedRowCount>
            </pageHeader>
            <nextAllocPageNumber>-1</nextAllocPageNumber>
            <nextAllocPageOffset>0</nextAllocPageOffset>
            <containerInfoLength>80</containerInfoLength>
            <containerInfo>
                <formatableID>116</formatableID>
                <containerStatus hexvalue="0">
                </containerStatus>
                <pageSize>4096</pageSize>
                <spareSpace>0</spareSpace>
                <minimumRecordSize>12</minimumRecordSize>
                <initialPages>1</initialPages>
                <preAllocSize>8</preAllocSize>
                <firstAllocPageNumber>0</firstAllocPageNumber>
                <firstAllocPageOffset>0</firstAllocPageOffset>
                <containerVersion>0</containerVersion>
                <estimatedRowCount>71</estimatedRowCount>
                <reusableRecordIdSequenceNumber>0</reusableRecordIdSequenceNumber>
                <spare>0</spare>
                <checksum>2463908068</checksum>
            </containerInfo>
            <allocationExtent>
                <extentOffset>4096</extentOffset>
                <extentStart>1</extentStart>
                <extentEnd>10216</extentEnd>
                <extentLength>8</extentLength>
                <extentStatus hexvalue="30000010">
                    <flag>HAS_UNFILLED_PAGES</flag>
                    <flag>KEEP_UNFILLED_PAGES</flag>
                    <flag>NO_DEALLOC_PAGE_MAP</flag>
                </extentStatus>
                <preAllocLength>7</preAllocLength>
                <reserved1>0</reserved1>
                <reserved2>0</reserved2>
                <reserved3>0</reserved3>
                <freePages totalLength="8" bitsThatAreSet="0"/>
                <unFilledPages totalLength="8" bitsThatAreSet="1"/>
            </allocationExtent>
        </allocPage>
    </fileHeader>
    <pageCount>1</pageCount>
</dataFile>
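
Because the output is ordinary XML, individual header fields can be pulled out with the standard javax.xml.xpath API. The following is a rough sketch, not part of the attached tool: the class name HeaderFields is hypothetical, and SAMPLE is a trimmed excerpt of the printout above rather than real tool output read from a file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for querying DataFileReader output.
public class HeaderFields {

    // Trimmed excerpt of the header printout shown above.
    static final String SAMPLE =
        "<dataFile><fileHeader><allocPage number=\"0\">"
      + "<containerInfo><pageSize>4096</pageSize>"
      + "<estimatedRowCount>71</estimatedRowCount></containerInfo>"
      + "<allocationExtent><extentStatus hexvalue=\"30000010\">"
      + "<flag>HAS_UNFILLED_PAGES</flag>"
      + "<flag>KEEP_UNFILLED_PAGES</flag>"
      + "<flag>NO_DEALLOC_PAGE_MAP</flag>"
      + "</extentStatus></allocationExtent>"
      + "</allocPage></fileHeader><pageCount>1</pageCount></dataFile>";

    /** Returns the text of the first node matching the XPath expression. */
    static String text(String xml, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or XML: " + expr, e);
        }
    }

    /** Returns the text of every node matching the XPath expression. */
    static List<String> texts(String xml, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or XML: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("pageSize = " + text(SAMPLE, "//containerInfo/pageSize"));
        System.out.println("pageCount = " + text(SAMPLE, "/dataFile/pageCount"));
        System.out.println("extent flags = " + texts(SAMPLE, "//extentStatus/flag"));
    }
}
```

To use it on real output, redirect the tool's printout to a file and read that file into a string in place of SAMPLE.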


-------------------------------------------

3) Count the number of pages in a data file:

java DataFileReader db/seg0/c20.dat | grep pageCount
    <pageCount>9</pageCount>


-------------------------------------------

4) Decode 3 pages, starting at page 2. This one is a little tricky because the header page is always decoded. So you need to ask for 4 pages (3 data pages plus 1 header page):

java DataFileReader db/seg0/c20.dat -v -p 2 -n 4
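
The arithmetic in this use case can be written down as a tiny helper. This is only a sketch based on the description above, namely that the header page is always decoded and counts toward -n; the class name PageWindow and its methods are hypothetical and are not part of the tool.

```java
// Hypothetical helper, not part of the attached tool.
public class PageWindow {

    /** Value to pass to -n: the wanted data pages plus the header page. */
    static int pagesToRequest(int dataPages) {
        if (dataPages < 1) {
            throw new IllegalArgumentException("need at least one data page");
        }
        return dataPages + 1;
    }

    /** Formats the -p and -n options for a window of data pages. */
    static String options(int startPage, int dataPages) {
        if (startPage < 1) {
            throw new IllegalArgumentException("-p must be at least 1");
        }
        return "-p " + startPage + " -n " + pagesToRequest(dataPages);
    }

    public static void main(String[] args) {
        // 3 data pages starting at page 2
        System.out.println(options(2, 3)); // prints "-p 2 -n 4"
    }
}
```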


> Create tools for reading the contents of the seg0 directory
> -----------------------------------------------------------
>
>                 Key: DERBY-5201
>                 URL: https://issues.apache.org/jira/browse/DERBY-5201
>             Project: Derby
>          Issue Type: Task
>          Components: Tools
>    Affects Versions: 10.9.0.0
>            Reporter: Rick Hillegas
>         Attachments: DataFileReader.java
>
>
> It would be nice to have tools which read Derby data files (the files in the seg0 directory) without disturbing their contents.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira