Posted to dev@pdfbox.apache.org by "Timo Boehme (JIRA)" <ji...@apache.org> on 2011/03/15 10:23:29 UTC

[jira] Created: (PDFBOX-979) errors in %%EOF handling (fix included)

errors in %%EOF handling (fix included)
---------------------------------------

                 Key: PDFBOX-979
                 URL: https://issues.apache.org/jira/browse/PDFBOX-979
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 1.6.0
            Reporter: Timo Boehme


The '%%EOF' handling in PDFParser has several errors. The current implementation (starting at line 467):

                String eof = "";
                if(!pdfSource.isEOF())
                    readLine(); // if there's more data to read, get the EOF flag
                
                // verify that EOF exists
                if("%%EOF".equals(eof)) {
                    // PDF does not conform to spec, we should warn someone
                    log.warn("expected='%%EOF' actual='" + eof + "'");
                    // if we're not at the end of a file, just put it back and move on
                    if(!pdfSource.isEOF())
                        pdfSource.unread(eof.getBytes("ISO-8859-1"));
                }

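The problems:
- the eof variable is never assigned: the return value of readLine() is discarded, so eof stays ""
- the comparison if("%%EOF".equals(eof)) must be negated, because the warning belongs to the case where the marker is missing
- unreading must first add a newline or space byte, because readLine() consumed the line terminator (as in bug PDFBOX-978)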

Corrected version:
                String eof = "";
                if(!pdfSource.isEOF())
                    eof = readLine(); // if there's more data to read, get the EOF flag
                
                // verify that EOF exists
                if(!"%%EOF".equals(eof)) {
                    // PDF does not conform to spec, we should warn someone
                    log.warn("expected='%%EOF' actual='" + eof + "'");
                    // if we're not at the end of a file, just put it back and move on
                    if(!pdfSource.isEOF()) {
                        pdfSource.unread( SPACE_BYTE ); // we read a whole line; add space as newline replacement
                        pdfSource.unread(eof.getBytes("ISO-8859-1"));
                    }
                }
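
The order of the two unread() calls matters: bytes pushed back last are read first, so the separator has to be unread before the line itself. The following is a minimal, self-contained sketch of that effect using java.io.PushbackInputStream. It is not the PDFBox code; the class name, the readLine() stand-in, the roundTrip() helper and the value 32 for SPACE_BYTE are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class UnreadLineSketch {
    static final int SPACE_BYTE = 32; // assumed value of the parser's SPACE_BYTE constant

    // Hypothetical stand-in for the parser's readLine(): returns the text up to
    // the next '\n' and consumes the '\n' itself.
    static String readLine(PushbackInputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            line.append((char) c);
        }
        return line.toString();
    }

    // Reads one line, pushes it back, and returns everything the stream yields afterwards.
    static String roundTrip(String input, boolean addSeparator) {
        byte[] bytes = input.getBytes(StandardCharsets.ISO_8859_1);
        try (PushbackInputStream in =
                 new PushbackInputStream(new ByteArrayInputStream(bytes), bytes.length + 1)) {
            String line = readLine(in);
            if (addSeparator) {
                in.unread(SPACE_BYTE); // pushed back first, so it is read after the line
            }
            in.unread(line.getBytes(StandardCharsets.ISO_8859_1));
            StringBuilder rest = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                rest.append((char) c);
            }
            return rest.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Without the separator the two tokens fuse: "302041%%EOF"
        System.out.print(roundTrip("302041\n%%EOF\n", false));
        // With the separator they stay apart: "302041 %%EOF"
        System.out.print(roundTrip("302041\n%%EOF\n", true));
    }
}
```

Without the separator, the line that was read and the line after it run together once they are re-read, which is the lost-newline effect described for PDFBOX-978.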


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: [jira] Commented: (PDFBOX-979) errors in %%EOF handling (fix included)

Posted by Timo Boehme <ti...@ontochem.com>.

15.03.2011 18:16, Adam@swmc.com:
> Does the patch from PDFBOX-908[1] fix this?  I reviewed that patch a while
> ago but didn't have time to test it myself.  I don't normally commit
> things without checking them myself, but if you can confirm that works,
> I'll get it committed to the trunk.
>
> [1] https://issues.apache.org/jira/browse/PDFBOX-908

No, it does not. The problem is in PDFParser, and PDFBOX-908 deals only
with 'endobj' and object start. My bug report (and fix) deals with %%EOF
handling, which is currently broken. In most cases it does no harm,
though, since whether we read %%EOF is only used to decide whether an
exception is thrown.

PDFBOX-908 seems to be applied already (at least partially). It might 
suffer from the same problem as I've reported in PDFBOX-978[1] - a lost 
newline after unreading.

[1] https://issues.apache.org/jira/browse/PDFBOX-978


Timo

-- 

  Timo Boehme
  OntoChem GmbH
  H.-Damerow-Str. 4
  06120 Halle/Saale
  T: +49 345 4780472
  F: +49 345 4780471
  timo.boehme@ontochem.com

_____________________________________________________________________

  OntoChem GmbH
  Managing Director: Dr. Lutz Weber
  Registered office: Halle / Saale
  Register court: Stendal
  Registration number: HRB 215461
_____________________________________________________________________

Re: Replying to Jira notifications (Was: [jira] Commented: (PDFBOX-979) errors in %%EOF handling (fix included))

Posted by Ad...@swmc.com.

Didn't the old version of JIRA integrate with the mailing list?  I 
remember being surprised that e-mails were magically posted as comments 
before...

Either way, now that I know they're not connected, I'll make sure to 
comment via the web instead of e-mail.

---- 
Thanks,
Adam





- FHA 203b; 203k; HECM; VA; USDA; Conventional 
- Warehouse Lines; FHA-Authorized Originators 
- Lending and Servicing in over 45 States 
www.swmc.com   -  www.simplehecmcalculator.com   
Visit  www.swmc.com/resources   for helpful links on Training, Webinars, Lender Alerts and Submitting Conditions  

This email and any content within or attached hereto from Sun West Mortgage Company, Inc. is confidential and/or legally privileged. The information is intended only for the use of the individual or entity named on this email. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or taking any action in reliance on the contents of this email information is strictly prohibited, and that the documents should be returned to this office immediately by email. Receipt by anyone other than the intended recipient is not a waiver of any privilege. Please do not include your social security number, account number, or any other personal or financial information in the content of the email. Should you have any questions, please call (800) 453 7884.

Replying to Jira notifications (Was: [jira] Commented: (PDFBOX-979) errors in %%EOF handling (fix included))

Posted by Jukka Zitting <jz...@adobe.com>.

Hi,

In general I think it's better to comment on issues in the issue tracker 
itself instead of replying to notifications on the mailing list.

Comments in Jira will get posted to the mailing list for all to read, 
and having all comments in Jira makes it easier to later review the full 
communication history related to any particular issue.

BR,

Jukka Zitting

-- 
Jukka Zitting

Re: [jira] Commented: (PDFBOX-979) errors in %%EOF handling (fix included)

Posted by Ad...@swmc.com.

Does the patch from PDFBOX-908[1] fix this?  I reviewed that patch a while 
ago but didn't have time to test it myself.  I don't normally commit 
things without checking them myself, but if you can confirm that works, 
I'll get it committed to the trunk.

[1] https://issues.apache.org/jira/browse/PDFBOX-908

---- 
Thanks,
Adam





From:
"Timo Boehme (JIRA)" <ji...@apache.org>
To:
dev@pdfbox.apache.org
Date:
03/15/2011 02:37
Subject:
[jira] Commented: (PDFBOX-979) errors in %%EOF handling (fix included)




    [ 
https://issues.apache.org/jira/browse/PDFBOX-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006855#comment-13006855 
] 

Timo Boehme commented on PDFBOX-979:
------------------------------------

I have some bogus PDF files where content starts immediately after 
'%%EOF':

startxref
302041
%%EOF333 0 obj<</Length 15/Root

In order to handle it like in the 'endobj' case I test if we start with 
'%%EOF' and unread all following content.
New fixed version:

                String eof = "";
                if(!pdfSource.isEOF())
                    eof = readLine(); // if there's more data to read, get the EOF flag

                // verify that EOF exists
                if(!"%%EOF".equals(eof)) {
                    if( eof.startsWith( "%%EOF" ) ) {
                        // content after marker -> unread with first space byte for read newline
                        pdfSource.unread( SPACE_BYTE ); // we read a whole line; add space as newline replacement
                        pdfSource.unread( eof.substring( 5 ).getBytes("ISO-8859-1") );
                    } else {
                        // PDF does not conform to spec, we should warn someone
                        log.warn("expected='%%EOF' actual='" + eof + "'");
                        // if we're not at the end of a file, just put it back and move on
                        if(!pdfSource.isEOF()) {
                            pdfSource.unread( SPACE_BYTE ); // we read a whole line; add space as newline replacement
                            pdfSource.unread(eof.getBytes("ISO-8859-1"));
                        }
                    }
                }
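
The three cases (exact marker, marker followed by content, no marker) can be tried in isolation with the sketch below. It is a simplified illustration under stated assumptions, not the committed PDFBox code: the class name, the readLine() stand-in, consumeEofMarker(), demo() and the value 32 for SPACE_BYTE are hypothetical, and unlike the code above it neither logs a warning nor checks for end of stream before unreading.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class EofMarkerSketch {
    static final int SPACE_BYTE = 32;     // assumed value of the parser's SPACE_BYTE constant
    static final String MARKER = "%%EOF";

    // Hypothetical stand-in for readLine(): text up to the next '\n', which is consumed.
    static String readLine(PushbackInputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            line.append((char) c);
        }
        return line.toString();
    }

    // Returns true if the line starts with the marker. Anything that still has
    // to be parsed is pushed back, followed by a space replacing the newline.
    static boolean consumeEofMarker(PushbackInputStream in) throws IOException {
        String line = readLine(in);
        if (MARKER.equals(line)) {
            return true;                  // well-formed: nothing to push back
        }
        boolean found = line.startsWith(MARKER);
        String pushBack = found ? line.substring(MARKER.length()) : line;
        in.unread(SPACE_BYTE);            // unread first, so it is read last
        in.unread(pushBack.getBytes(StandardCharsets.ISO_8859_1));
        return found;
    }

    // Runs the check on a string and reports "<found>|<bytes left in the stream>".
    static String demo(String input) {
        byte[] bytes = input.getBytes(StandardCharsets.ISO_8859_1);
        try (PushbackInputStream in =
                 new PushbackInputStream(new ByteArrayInputStream(bytes), bytes.length + 1)) {
            boolean found = consumeEofMarker(in);
            StringBuilder rest = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                rest.append((char) c);
            }
            return found + "|" + rest;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The malformed trailer line from the report above
        System.out.println(demo("%%EOF333 0 obj<</Length 15/Root\n"));
    }
}
```

For the malformed line from the report, the marker is recognized and "333 0 obj<</Length 15/Root" plus a trailing space is left in the stream for the object parser. The pushback buffer here is sized to the whole input; a real parser would need a buffer large enough for the longest line it may unread.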



[jira] [Commented] (PDFBOX-979) errors in %%EOF handling (fix included)

Posted by "Adam Nichols (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PDFBOX-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009196#comment-13009196 ] 

Adam Nichols commented on PDFBOX-979:
-------------------------------------

Committed correct patch in revision 1083857.  Thanks for the heads up.

> errors in %%EOF handling (fix included)
> ---------------------------------------
>
>                 Key: PDFBOX-979
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-979
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.6.0
>            Reporter: Timo Boehme
>            Assignee: Adam Nichols
>             Fix For: 1.6.0
>
>

[jira] Commented: (PDFBOX-979) errors in %%EOF handling (fix included)

Posted by "Timo Boehme (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PDFBOX-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006855#comment-13006855 ] 

Timo Boehme commented on PDFBOX-979:
------------------------------------

I have some bogus PDF files where content starts immediately after '%%EOF':

startxref
302041
%%EOF333 0 obj<</Length 15/Root

To handle this like the 'endobj' case, I test whether the line starts with '%%EOF' and, if so, unread all content that follows the marker.
New fixed version:

                String eof = "";
                if(!pdfSource.isEOF())
                    eof = readLine(); // if there's more data to read, get the EOF flag
                
                // verify that EOF exists
                if(!"%%EOF".equals(eof)) {
                    if( eof.startsWith( "%%EOF" ) ) {
                        // content after marker -> unread with first space byte for read newline
                        pdfSource.unread( SPACE_BYTE ); // we read a whole line; add space as newline replacement
                        pdfSource.unread( eof.substring( 5 ).getBytes("ISO-8859-1") );
                    } else {
                        // PDF does not conform to spec, we should warn someone
                        log.warn("expected='%%EOF' actual='" + eof + "'");
                        // if we're not at the end of a file, just put it back and move on
                        if(!pdfSource.isEOF()) {
                            pdfSource.unread( SPACE_BYTE ); // we read a whole line; add space as newline replacement
                            pdfSource.unread(eof.getBytes("ISO-8859-1"));
                        }
                    }
                }
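
For anyone who wants to try the idea outside of PDFBox, here is a minimal, self-contained sketch (not the PDFParser code itself). It assumes the JDK's java.io.PushbackInputStream as a stand-in for pdfSource; the class EofMarkerDemo and its helpers are hypothetical names, the line reader only treats '\n' as a line end, and the isEOF() guard from the patch is left out for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only java.io is used, no PDFBox types.
public class EofMarkerDemo {
    static final String EOF_MARKER = "%%EOF";

    // Reads up to (and consumes) the next '\n', or to the end of the stream.
    static String readLine(PushbackInputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Consumes the marker line. Anything glued directly after "%%EOF" is
    // pushed back, preceded in the stream order by nothing and followed by a
    // space that stands in for the newline readLine() swallowed.
    static boolean consumeEofMarker(PushbackInputStream in) throws IOException {
        String line = readLine(in);
        if (EOF_MARKER.equals(line)) {
            return true;
        }
        // Unread the space first: pushed-back data is read in reverse push order.
        in.unread(' ');
        if (line.startsWith(EOF_MARKER)) {
            String rest = line.substring(EOF_MARKER.length());
            in.unread(rest.getBytes(StandardCharsets.ISO_8859_1));
            return true;
        }
        // No marker at all: put the whole line back and let the caller move on.
        in.unread(line.getBytes(StandardCharsets.ISO_8859_1));
        return false;
    }

    // Returns "<marker found>|<next line seen by the parser>".
    static String demo(String data) {
        byte[] bytes = data.getBytes(StandardCharsets.ISO_8859_1);
        try {
            PushbackInputStream in = new PushbackInputStream(
                    new ByteArrayInputStream(bytes), bytes.length + 1);
            boolean found = consumeEofMarker(in);
            return found + "|" + readLine(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("%%EOF333 0 obj<</Length 15/Root\nmore"));
    }
}
```

With the bogus input from above, the object data after the marker is still available to whatever reads next, which is the point of the extra branch.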



[jira] Commented: (PDFBOX-979) errors in %%EOF handling (fix included)

Posted by "Adam Nichols (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PDFBOX-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007540#comment-13007540 ] 

Adam Nichols commented on PDFBOX-979:
-------------------------------------

Patch committed in revision 1082193.  Thanks for the contribution :-)


[jira] [Commented] (PDFBOX-979) errors in %%EOF handling (fix included)

Posted by "Timo Boehme (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PDFBOX-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009057#comment-13009057 ] 

Timo Boehme commented on PDFBOX-979:
------------------------------------

Thanks for committing; however, the commit is not correct. First, the SPACE_BYTE must be unread before the other bytes are unread (the committed version unreads the SPACE_BYTE after them). Second, it would be better to apply the patch from my comment, which also handles the problem described there.
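
To illustrate why the order matters, here is a small sketch that assumes the JDK's java.io.PushbackInputStream behaves like pdfSource with respect to unread(); the class UnreadOrderDemo and its replay method are hypothetical names used only for this example. Bytes pushed back last are read first, so the space that replaces the consumed newline has to be pushed before the line itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only java.io is used, no PDFBox types.
public class UnreadOrderDemo {

    // Simulates the state after readLine() returned 'line' and 'remaining'
    // is what is still left in the stream, then pushes the line back in one
    // of the two orders and returns everything the parser would read next.
    static String replay(String line, String remaining, boolean spaceFirst) {
        byte[] lineBytes = line.getBytes(StandardCharsets.ISO_8859_1);
        byte[] restBytes = remaining.getBytes(StandardCharsets.ISO_8859_1);
        try {
            PushbackInputStream in = new PushbackInputStream(
                    new ByteArrayInputStream(restBytes), lineBytes.length + 1);
            if (spaceFirst) {
                in.unread(' ');        // pushed first, so read after the line
                in.unread(lineBytes);
            } else {
                in.unread(lineBytes);
                in.unread(' ');        // pushed last, so read before the line
            }
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + replay("xref", "trailer", true) + "]");  // [xref trailer]
        System.out.println("[" + replay("xref", "trailer", false) + "]"); // [ xreftrailer]
    }
}
```

In the second order the separator ends up in front of the line, and the line runs straight into the following content.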


[jira] Resolved: (PDFBOX-979) errors in %%EOF handling (fix included)

Posted by "Adam Nichols (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PDFBOX-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Nichols resolved PDFBOX-979.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 1.6.0
         Assignee: Adam Nichols
