Posted to dev@poi.apache.org by bu...@apache.org on 2011/06/03 19:44:01 UTC

DO NOT REPLY [Bug 51318] New: Exceptions in NDocumentInputStream preventing streaming of data out of MS Publisher files

https://issues.apache.org/bugzilla/show_bug.cgi?id=51318

             Bug #: 51318
           Summary: Exceptions in NDocumentInputStream preventing
                    streaming of data out of MS Publisher files
           Product: POI
           Version: 3.2-FINAL
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: critical
          Priority: P2
         Component: HPBF
        AssignedTo: dev@poi.apache.org
        ReportedBy: dgoldenberg@attivio.com
    Classification: Unclassified


Related to bug 51317 - Need ability to stream and chunk data out of MS Publisher
documents.

I attempted to implement streaming and chunking of data out of .pub files and
got the errors below.

Basically, I tried to read from DocumentInputStream in successive chunks,
rather than reading the whole stream into a large preallocated byte array.

    byte[] filler = new byte[25];

    byte[] bytes = new byte[8];
    int read = dis.read(bytes, 0, 8);

    if (read <= 0) {
      // nothing was read
    } else {
      String f8 = new String(bytes);
      if (!f8.equals("CHNKINK ")) {
        throw new IllegalArgumentException("Expecting 'CHNKINK ' but was '"
            + f8 + "'");
      }
      // Ignore the next 24, for now at least

      dis.read(filler, 8, 24);

      for (int i = 0; i < 20; i++) {
        int offset = 0x20 + i * 24;

        bytes = new byte[25];
        read = dis.read(bytes, offset, bytes.length);
        // ... (rest of the loop not shown)
      }
    }

Note the line that reads the 24 filler bytes so we can get to the
interesting bits. I added it because I was getting an error when simply calling
read(bytes, offset, bytes.length).

The errors all look like this first one:
Exception in thread "main" java.lang.IndexOutOfBoundsException: can't read past buffer boundaries
    at org.apache.poi.poifs.filesystem.NDocumentInputStream.read(NDocumentInputStream.java:142)
    at org.apache.poi.poifs.filesystem.DocumentInputStream.read(DocumentInputStream.java:118)

Now, NDocumentInputStream.read(byte[], int, int) contains this conditional:

    if (off < 0 || len < 0 || b.length < off + len) {

This assumes that the byte array is large and that you're reading in sequence.
If you want to jump around, you'd presumably want to check b.length < len
instead.

I tried that and got the following error:
Exception in thread "main" java.lang.IndexOutOfBoundsException
    at java.nio.Buffer.checkBounds(Buffer.java:530)
    at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:125)
    at org.apache.poi.poifs.filesystem.NDocumentInputStream.readFully(NDocumentInputStream.java:250)
    at org.apache.poi.poifs.filesystem.NDocumentInputStream.read(NDocumentInputStream.java:151)
    at org.apache.poi.poifs.filesystem.DocumentInputStream.read(DocumentInputStream.java:118)
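For reference, the java.io.InputStream Javadoc defines off in read(byte[] b, int off, int len) as the start offset in the destination array b at which data is written, not a position in the stream, and requires an IndexOutOfBoundsException when off or len is negative or len > b.length - off. The NDocumentInputStream check quoted above is essentially that same condition. A small sketch of the contract; validReadArgs is a hypothetical helper and ByteArrayInputStream stands in for a document stream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

class ReadArgsDemo {
    // Mirrors the argument check documented for InputStream.read(byte[] b, int off, int len):
    // off is where in b the data is written, so off + len must fit within b.length.
    // Written as len <= b.length - off to avoid int overflow in off + len.
    static boolean validReadArgs(byte[] b, int off, int len) {
        return off >= 0 && len >= 0 && len <= b.length - off;
    }

    public static void main(String[] args) throws Exception {
        byte[] filler = new byte[25];
        System.out.println(validReadArgs(filler, 8, 24)); // false: 8 + 24 = 32 > 25
        System.out.println(validReadArgs(filler, 0, 24)); // true: bytes land in filler[0..23]

        // A JDK stream enforces the same rule, regardless of how much data it holds
        InputStream in = new ByteArrayInputStream(new byte[64]);
        try {
            in.read(filler, 8, 24);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("read(filler, 8, 24) rejected: " + e);
        }
    }
}
```

With the filler read from the snippet, 8 + 24 = 32 exceeds the 25-byte array, so any InputStream that follows this contract would reject those arguments.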

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 51318] Exceptions in NDocumentInputStream preventing streaming of data out of MS Publisher files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51318

--- Comment #3 from Dmitry Goldenberg <dg...@attivio.com> 2011-06-03 23:22:41 UTC ---
Created attachment 27107
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27107
Smaller file to repro on


DO NOT REPLY [Bug 51318] Exceptions in NDocumentInputStream preventing streaming of data out of MS Publisher files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51318

--- Comment #5 from Nick Burch <ni...@alfresco.com> 2011-06-06 14:36:52 UTC ---
Can you try with svn trunk and see if it helps? I fixed a few bits over the
weekend, and I updated most of the DocumentInputStream tests to check NPOIFS
too.

There is a mark/reset issue, though; I need to fix that before I can write a
test for your specific case.
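A mark/reset test might check the java.io.InputStream contract directly: when markSupported() is true, after mark(readlimit) and reading no more than readlimit bytes, reset() should resupply the same bytes. A minimal, hypothetical sketch (MarkResetCheck is not a POI class), using ByteArrayInputStream as a stand-in and Java 11's readNBytes:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class MarkResetCheck {
    // Marks the current position, reads `peek` bytes, resets, and reports whether
    // the same `peek` bytes are supplied again. Returns false if mark is unsupported.
    static boolean resetRereads(InputStream in, int peek) throws IOException {
        if (!in.markSupported()) {
            return false;
        }
        in.mark(peek); // readlimit: reset must still work after reading `peek` bytes
        byte[] first = in.readNBytes(peek);
        in.reset();
        byte[] second = in.readNBytes(peek);
        return Arrays.equals(first, second);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5, 6});
        in.read(); // move off the start, so reset() must return to the mark, not to 0
        System.out.println(resetRereads(in, 3)); // true
    }
}
```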


DO NOT REPLY [Bug 51318] Exceptions in NDocumentInputStream preventing streaming of data out of MS Publisher files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51318

Nick Burch <ni...@alfresco.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #2 from Nick Burch <ni...@alfresco.com> 2011-06-03 19:35:49 UTC ---
Does this happen even on small Publisher files? I'm guessing it may affect
anything where an entry in a POIFS is more than one big block. If you can
reproduce it with one of the small sample files we already have, that'd mean we
could use them and make life easy :)
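To make the "more than one big block" case concrete, a sketch assuming the version 3 compound file layout: 512-byte big blocks, with streams under 4096 bytes stored in the mini stream (64-byte blocks) instead. It counts the big blocks a stream occupies and checks whether a given read straddles a block boundary, which is where a block-backed stream has to stitch data from two blocks together. BlockSpan and its methods are hypothetical, not POIFS code.

```java
class BlockSpan {
    // Assumed version 3 layout: 512-byte big blocks; streams smaller than
    // 4096 bytes live in the mini stream (64-byte blocks) instead.
    static final int BIG_BLOCK_SIZE = 512;
    static final int MINI_STREAM_CUTOFF = 4096;

    // Big blocks occupied by a stream of `size` bytes (0 if it lives in the mini stream)
    static long bigBlocksFor(long size) {
        if (size < MINI_STREAM_CUTOFF) {
            return 0;
        }
        return (size + BIG_BLOCK_SIZE - 1) / BIG_BLOCK_SIZE; // ceiling division
    }

    // True if reading len bytes from stream position pos touches two or more big blocks
    static boolean crossesBoundary(long pos, int len) {
        if (len <= 0) {
            return false;
        }
        long firstBlock = pos / BIG_BLOCK_SIZE;
        long lastBlock = (pos + len - 1) / BIG_BLOCK_SIZE;
        return firstBlock != lastBlock;
    }

    public static void main(String[] args) {
        System.out.println(bigBlocksFor(10_000));     // 20
        System.out.println(crossesBoundary(500, 24)); // true: bytes 500..523 span blocks 0 and 1
        System.out.println(crossesBoundary(512, 24)); // false: bytes 512..535 are all in block 1
    }
}
```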


[Bug 51318] Exceptions in NDocumentInputStream preventing streaming of data out of MS Publisher files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51318

Nick Burch <ap...@gagravarr.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
         Resolution|---                         |LATER

--- Comment #7 from Nick Burch <ap...@gagravarr.org> ---
The NIO 3.2 branch hasn't been worked on for quite some time, and isn't likely
to receive any new work. As such, I'm closing this as "In a Later Version",
sorry.

If you can still reproduce this problem on 3.11, please let us know. As it
stands, a very similar unit test passes on trunk, and has done for some time,
so I think your problem is solved!


DO NOT REPLY [Bug 51318] Exceptions in NDocumentInputStream preventing streaming of data out of MS Publisher files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51318

--- Comment #6 from Dmitry Goldenberg <dg...@attivio.com> 2011-06-06 14:40:49 UTC ---
I can certainly try. So is all this stuff going into trunk? We were actually on
the NIO 3.2 branch... We would also ideally like your other fix too :)

Do you think you'll be looking into streaming APIs for HPBF? Then perhaps we
could write the chunking on top of that...
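Chunking layered on top of a stream could be as simple as an iterator that yields fixed-size pieces of any InputStream. A hypothetical sketch (ChunkIterator is not a POI class), using Java 11's InputStream.readNBytes(int), which blocks until the requested count has been read or the stream ends:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper: yields successive fixed-size chunks read from any InputStream.
class ChunkIterator implements Iterator<byte[]> {
    private final InputStream in;
    private final int chunkSize;
    private byte[] pending; // the chunk read ahead, or null once the stream is exhausted

    ChunkIterator(InputStream in, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.in = in;
        this.chunkSize = chunkSize;
        advance();
    }

    private void advance() {
        try {
            // Blocks until chunkSize bytes are read or the stream ends; an empty array means EOF
            byte[] buf = in.readNBytes(chunkSize);
            pending = buf.length == 0 ? null : buf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    @Override
    public byte[] next() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        byte[] current = pending;
        advance();
        return current;
    }

    public static void main(String[] args) {
        ChunkIterator chunks = new ChunkIterator(new ByteArrayInputStream(new byte[1000]), 300);
        while (chunks.hasNext()) {
            System.out.print(chunks.next().length + " ");
        }
        // prints: 300 300 300 100
    }
}
```

The iterator reads one chunk ahead so that hasNext() is a simple null check; the last chunk may be shorter than chunkSize.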


DO NOT REPLY [Bug 51318] Exceptions in NDocumentInputStream preventing streaming of data out of MS Publisher files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51318

--- Comment #1 from Dmitry Goldenberg <dg...@attivio.com> 2011-06-03 17:48:47 UTC ---
The attachment is too big to attach, even zipped. Please let me know if you want
the file. I suspect this will happen on many or most .pub files.


DO NOT REPLY [Bug 51318] Exceptions in NDocumentInputStream preventing streaming of data out of MS Publisher files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51318

--- Comment #4 from Dmitry Goldenberg <dg...@attivio.com> 2011-06-03 23:23:52 UTC ---
Nick,

Yes, it's reproducible on smaller files (please see attached).

Thanks
