Posted to dev@poi.apache.org by bu...@apache.org on 2008/11/16 23:05:11 UTC

DO NOT REPLY [Bug 46220] New: Regression: Some embedded images being lost

https://issues.apache.org/bugzilla/show_bug.cgi?id=46220

           Summary: Regression: Some embedded images being lost
           Product: POI
           Version: 3.2-FINAL
          Platform: PC
        OS/Version: Windows Vista
            Status: NEW
          Severity: regression
          Priority: P2
         Component: HWPF
        AssignedTo: dev@poi.apache.org
        ReportedBy: trejkaz@trypticon.org


Some of our own test cases have been failing since an update from POI 3.1 to
3.2.

The embedded images now come out incorrectly or are not found at all.


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


--- Comment #1 from Trejkaz <tr...@trypticon.org>  2008-11-16 14:15:42 PST ---
Test file 1 (too big to attach. :-( )
http://dl.getdropbox.com/u/50201/nonconsecutive-images.doc

This one now gets 3 images instead of 4.

Correct MD5 digests for each image (first three do match up so the images it
does pick up must be okay.)

851be142bce6d01848e730cb6903f39e
7fc6d8fb58b09ababd036d10a0e8c039
a7dc644c40bc2fbf17b2b62d07f99248
72d07b8db5fad7099d90bc4c304b4666   <-- this is the missing one
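
A comparison like the one above could be automated roughly as follows. This is only a sketch: the class and method names (ImageDigestCheck, md5Hex, missingDigests) are hypothetical, and it assumes the image bytes have already been pulled out of the document by whatever extraction code is under test. The bytes in main are placeholders, not the real images.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ImageDigestCheck {

    /** Returns the lowercase hex MD5 digest of the given bytes. */
    public static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                // Mask to 0..255 so negative bytes still print as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Returns the expected digests that no extracted image matches, in order. */
    public static List<String> missingDigests(List<String> expected, List<byte[]> images) {
        Set<String> found = new HashSet<>();
        for (byte[] image : images) {
            found.add(md5Hex(image));
        }
        List<String> missing = new ArrayList<>();
        for (String digest : expected) {
            if (!found.contains(digest)) {
                missing.add(digest);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Placeholder "images": two expected, only one actually extracted.
        byte[] first = "abc".getBytes(StandardCharsets.US_ASCII);
        byte[] second = new byte[0];
        List<String> expected = Arrays.asList(md5Hex(first), md5Hex(second));
        List<byte[]> extracted = new ArrayList<>();
        extracted.add(first);
        System.out.println("Missing: " + missingDigests(expected, extracted));
    }
}
```

Matching on digests rather than on position means an image that is dropped from the middle of the list is still reported by its own digest.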


--- Comment #6 from Trejkaz <tr...@trypticon.org>  2008-11-19 15:01:12 PST ---
Actually, for comparison, these are the same warnings I'm getting on the
original file in this case, i.e. the Unicode one isn't appearing for this test
case, only for the other ones (which I can't redistribute, because someone
checked in IP as test cases.) :-(


Trejkaz <tr...@trypticon.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED




--- Comment #5 from Trejkaz <tr...@trypticon.org>  2008-11-19 14:59:23 PST ---
Same file as before, with one more image added in and the number edited from 4
to 5 in the document...

851be142bce6d01848e730cb6903f39e
7fc6d8fb58b09ababd036d10a0e8c039
a7dc644c40bc2fbf17b2b62d07f99248
5eee0af68b7856b731a7775db8a6e6e2
72d07b8db5fad7099d90bc4c304b4666 <-- missing (same image as before)

http://dl.getdropbox.com/u/50201/nonconsecutive-images-2.doc

Warnings:
A property claimed to start before zero, at -512! Resetting it to zero, and
hoping for the best
A property claimed to start before zero, at -512! Resetting it to zero, and
hoping for the best


--- Comment #3 from Trejkaz <tr...@trypticon.org>  2008-11-17 17:38:35 PST ---
Any suggestions on how to go about this?  I tried doing what was suggested on
the list, and incrementally adding images to existing documents, but even for
"broken" ones, Word saved them in a form which fixed any warnings.


--- Comment #4 from Nick Burch <ni...@torchbox.com>  2008-11-19 04:49:23 PST ---
The fact that re-saving them in Word fixes the warning does tend to make me
think these files really are slightly dodgy, and it isn't just us.

First up, could you try older versions of Word, to see if maybe there was one
earlier version which produced these dodgy files?

Otherwise, if you could find a couple of different files which do trigger the
warning, but not all of which have image issues, that'd be a big help.
Especially if we could have two versions of each: the original (with warnings)
and the newer Word re-saved version (without warnings). We can then compare
them, see the differences, and hopefully figure out what needs fixing or
working around.


Nick Burch <ni...@torchbox.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO




--- Comment #2 from Nick Burch <ni...@torchbox.com>  2008-11-17 02:02:02 PST ---
I wonder if this is another area of hwpf that is assuming bytes/characters but
getting characters/bytes. (I tried to make it a bit more sane for 3.2, so that
unicode text extraction worked more reliably, but the file format is really
crazy about this sort of thing.)

One test that'd be interesting is creating a few small test files with images,
and seeing how hwpf copes with them:
1. non unicode, one image near start
2. non unicode, one image near end
3. non unicode, image near start, image near end
4. unicode, one image near start
5. unicode, one image near end
6. unicode, image near start, image near end

If cases 5 and 6 (the unicode files with an image near the end) have issues
with their later images, we'll know it's another byte/character problem. If
it's something different, maybe it'll help us track it down.
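
To illustrate the kind of byte/character mix-up meant here, the following is a standalone sketch, not HWPF code. It assumes text stored either in an 8-bit encoding (ISO-8859-1 stands in for it here) or as UTF-16LE; the class and method names (OffsetDemo, byteOffset) are made up for the example.

```java
import java.nio.charset.StandardCharsets;

public class OffsetDemo {

    /**
     * Converts a character offset into a byte offset within the stored text.
     * 8-bit text uses one byte per character; UTF-16 uses two bytes per
     * Java char (UTF-16 code unit).
     */
    public static int byteOffset(int charOffset, boolean unicode) {
        return unicode ? charOffset * 2 : charOffset;
    }

    public static void main(String[] args) {
        String text = "image near end";
        int charOffset = text.indexOf("end");

        byte[] eightBit = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16LE);

        System.out.println("char offset:        " + charOffset);
        System.out.println("8-bit byte offset:  " + byteOffset(charOffset, false)
                + " of " + eightBit.length + " bytes");
        System.out.println("UTF-16 byte offset: " + byteOffset(charOffset, true)
                + " of " + utf16.length + " bytes");
    }
}
```

Code that treats one kind of offset as the other agrees near the start of the text and drifts further apart towards the end, which is why the cases with an image near the end of a unicode file are the interesting ones.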

As a bonus, the files will make a good regression test for the future :)


Yegor Kozlov <ye...@dinom.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #7 from Yegor Kozlov <ye...@dinom.ru> 2011-06-24 08:19:54 UTC ---
Images are properly read with current trunk. I included your sample in our
collection of test documents and added a unit test.

Yegor
