Posted to dev@poi.apache.org by bu...@apache.org on 2009/10/07 00:48:52 UTC

DO NOT REPLY [Bug 47950] New: No case insensitivity handling for OLE2 entry names

https://issues.apache.org/bugzilla/show_bug.cgi?id=47950

           Summary: No case insensitivity handling for OLE2 entry names
           Product: POI
           Version: 3.5-FINAL
          Platform: PC
        OS/Version: Windows NT
            Status: NEW
          Severity: normal
          Priority: P2
         Component: POIFS
        AssignedTo: dev@poi.apache.org
        ReportedBy: trejkaz@trypticon.org


I created some test cases to test case sensitivity in OLE2 files.

    @Test
    public void testPoiCaseInsensitivityInMemory() throws Exception
    {
        POIFSFileSystem fs = new POIFSFileSystem();
        DirectoryEntry dir = fs.getRoot().createDirectory("A");
        dir.createDocument("B", new ByteArrayInputStream(new byte[] { 0, 1, 2, 3, 4, 5 }));

        DirectoryEntry dir2 = (DirectoryEntry) fs.getRoot().getEntry("a");
        DocumentEntry doc2 = (DocumentEntry) dir2.getEntry("b");
        assertArrayEquals("Wrong data read back", new byte[] { 0, 1, 2, 3, 4, 5 },
                          IOUtils.toByteArray(new DocumentInputStream(doc2)));
    }

    @Test
    public void testPoiCaseInsensitivityAfterReadingFromStorage() throws Exception
    {
        POIFSFileSystem fs = new POIFSFileSystem();
        DirectoryEntry dir = fs.getRoot().createDirectory("A");
        dir.createDocument("B", new ByteArrayInputStream(new byte[] { 0, 1, 2, 3, 4, 5 }));

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        fs.writeFilesystem(baos);

        POIFSFileSystem fs2 = new POIFSFileSystem(new ByteArrayInputStream(baos.toByteArray()));
        DirectoryEntry dir2 = (DirectoryEntry) fs2.getRoot().getEntry("a");
        DocumentEntry doc2 = (DocumentEntry) dir2.getEntry("b");
        assertArrayEquals("Wrong data read back", new byte[] { 0, 1, 2, 3, 4, 5 },
                          IOUtils.toByteArray(new DocumentInputStream(doc2)));
    }

Both of these fail when looking up "a" because POI reports that the entry does
not exist, but according to the available documentation the name comparison is
supposed to be case insensitive.

Specifically, [MS-CFB] has the following to say about how entries in an OLE2
directory should be compared:

(2.6.1 pg 23)

When locating an object in the compound file except for the root storage, the
directory entry name is compared using a special case-insensitive upper-case
mapping, described in Red-Black Tree.

(2.6.4 "Red-Black Tree" pg 26)

  * For each UTF-16 code point, convert to upper-case with the Unicode Default
    Case Conversion Algorithm, simple case conversion variant (simple case
    foldings), with the following notes.<2>

  * Unicode surrogate characters are never upper-cased, since they are
    represented by two UTF-16 code points, while the sorting relationship
    upper-cases a single UTF-16 code point at a time.

  * Lowercase characters defined in a newer, later version of the Unicode
    standard can be upper-cased by an implementation that conforms to that
    later Unicode standard.
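
As a rough illustration of the rules quoted above, here is a minimal sketch of
an equality check that upper-cases one UTF-16 code unit at a time and leaves
surrogates untouched. Ole2Names and sameEntryName are hypothetical names, not
part of POI. The sketch assumes that Java's Character.toUpperCase(char) is an
acceptable stand-in for the simple case mapping; its results follow the
Unicode version of the running JVM, and the exception tables mentioned in the
specification are not applied.

```java
final class Ole2Names {
    private Ole2Names() {}

    // Upper-case a single UTF-16 code unit. Surrogates are returned
    // unchanged, per the rule that they are never upper-cased.
    static char upperCodeUnit(char c) {
        if (Character.isSurrogate(c)) {
            return c;
        }
        // Assumption: the JVM's single-char mapping approximates the
        // simple case conversion the specification refers to.
        return Character.toUpperCase(c);
    }

    // Returns true if the two entry names match after per-code-unit
    // upper-casing. This checks equality only, not sort order.
    static boolean sameEntryName(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (upperCodeUnit(a.charAt(i)) != upperCodeUnit(b.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

With a check like this, Ole2Names.sameEntryName("A", "a") is true, so a lookup
built on it would find the "A" directory in the tests above.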

Note <2> goes into further detail on which version of Unicode is used to
perform the folding:

(pg 39)

For Windows XP and Windows Server 2003: The compound file implementation
conforms to the Unicode 3.0.1 Default Case Conversion Algorithm, simple case
folding (http://www.unicode.org/Public/3.1-Update1/CaseFolding-4.txt) with the
following exceptions.
(table omitted for now)
For Windows Vista and Windows Server 2008: The compound files implementation
conforms to the Unicode 5.0 Default Case Conversion Algorithm, simple case
folding (http://www.unicode.org/Public/5.0.0/ucd/CaseFolding.txt) with the
following exceptions.
(table omitted for now)


References:

[MS-CFB]: Compound File Binary File Format, Revision 0.01 (Wednesday, June 18,
2008)
