Posted to dev@pdfbox.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2008/11/28 00:48:44 UTC

[jira] Commented: (PDFBOX-390) org.pdfbox.filter.ASCIIHexFilter does not skip Whitespace

    [ https://issues.apache.org/jira/browse/PDFBOX-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651435#action_12651435 ] 

Jukka Zitting commented on PDFBOX-390:
--------------------------------------

Can you attach a patch containing your changes? That would make them easier for us to review.

If you've made your changes to an svn checkout of the trunk, you can create the patch by running "svn diff" in the root directory of the checkout.

> org.pdfbox.filter.ASCIIHexFilter does not skip Whitespace
> ---------------------------------------------------------
>
>                 Key: PDFBOX-390
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-390
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 0.8.0-incubator
>            Reporter: Mathias Bosch
>             Fix For: 0.8.0-incubator
>
>
> org.pdfbox.filter.ASCIIHexFilter does not skip Whitespace
> According to the specification (pdf_reference_1-7.pdf), all whitespace
> characters between the ASCII hex values have to be skipped (see 3.3.1
> ASCIIHexDecode Filter).
> The 0.8.0-incubator source instead tries to decode those whitespace
> characters, so the resulting byte values are wrong (every character that
> is not [0-9a-f] maps to -1, but processing continues).
> This produces an invalid byte stream.
> The ASCIIHexDecode Filter section also defines '>' as the EOD (end-of-data)
> marker of the byte stream, which might ease the parsing of inline images.
> (The EI operator should follow the EOD in the case of an inline image.)
> Example for ASCII-Hex encoded value, copied from the Spec:
> FF CE A3 7C 5B 3F 28 16 0A 02 00 02 0A 16 28 3F 5B 7C A3 CE FF >
> I fixed the problem so that I could continue with my work.
> I am pasting the changed code here as a hint that might help to fix the bug.
> public class ASCIIHexFilter
>   implements Filter
> {
>  /**
>   * Whitespace
>   *   0  0x00  Null (NUL)
>   *   9  0x09  Tab (HT)
>   *  10  0x0A  Line feed (LF)
>   *  12  0x0C  Form feed (FF)
>   *  13  0x0D  Carriage return (CR)
>   *  32  0x20  Space (SP)  
>   */
>   protected boolean isWhitespace(int c) {
>     return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
>   }
>   
>   protected boolean isEOD(int c) {
>     return (c == 62); // '>' - EOD
>   }
>   /**
>    * {@inheritDoc}
>    */
>   public void decode(InputStream compressedData, OutputStream result, COSDictionary options, int filterIndex) throws IOException {
>     int value = 0;
>     int firstByte = 0;
>     int secondByte = 0;
>     while ((firstByte = compressedData.read()) != -1) {
>       
>       // skip whitespace before the first hex digit
>       while(isWhitespace(firstByte))
>         firstByte = compressedData.read();
>       // stop at EOD, or if the stream ended while skipping whitespace
>       if(firstByte == -1 || isEOD(firstByte))
>         break;
>       
>       if(REVERSE_HEX[firstByte] == -1)
>         System.out.println("Invalid Hex Code; int: " + firstByte + " char: " + (char) firstByte);
>       value = REVERSE_HEX[firstByte] * 16;
>       // whitespace may also separate the two digits of a pair
>       do {
>         secondByte = compressedData.read();
>       } while(isWhitespace(secondByte));
>       
>       if(isEOD(secondByte)) {
>         // second value behaves like 0 in case of EOD
>         result.write(value);
>         break;
>       }
>       if(secondByte >= 0) {
>         if(REVERSE_HEX[secondByte] == -1)
>           System.out.println("Invalid Hex Code; int: " + secondByte + " char: " + (char) secondByte);
>         value += REVERSE_HEX[secondByte];
>       }
>       result.write(value);
>     }
>     
>     result.flush();
>   }
> // .....................................................
> // other code remains unchanged
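
For review purposes, the decoding rules described in the issue (skip whitespace anywhere, stop at '>', treat a missing final digit as 0) can be sketched without any PDFBox types. The class below is a hypothetical stand-alone illustration, not the PDFBox implementation: the names AsciiHexSketch, nextNonWhitespace and hexValue are invented here, and throwing an IOException on a non-hex character is an assumption of this sketch rather than behaviour taken from the pasted code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-alone sketch; not part of PDFBox. */
final class AsciiHexSketch {

    /** Whitespace as listed in the issue: NUL, HT, LF, FF, CR, SP. */
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    /** Returns the next byte that is not whitespace, or -1 at end of stream. */
    static int nextNonWhitespace(InputStream in) throws IOException {
        int c;
        do {
            c = in.read();
        } while (isWhitespace(c));
        return c;
    }

    /** Maps one hex digit to 0..15; rejecting other input is an assumption. */
    static int hexValue(int c) throws IOException {
        int v = Character.digit(c, 16);
        if (v == -1) {
            throw new IOException("Invalid hex character: " + c);
        }
        return v;
    }

    /** Decodes hex digit pairs until '>' (EOD) or end of stream. */
    static void decode(InputStream in, OutputStream out) throws IOException {
        while (true) {
            int first = nextNonWhitespace(in);
            if (first == -1 || first == '>') {
                break;
            }
            int high = hexValue(first);
            int second = nextNonWhitespace(in);
            if (second == -1 || second == '>') {
                // Odd number of digits: the missing digit counts as 0.
                out.write(high << 4);
                break;
            }
            out.write((high << 4) | hexValue(second));
        }
        out.flush();
    }

    /** Convenience wrapper for trying the decoder on a string. */
    static byte[] decode(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] raw = encoded.getBytes(StandardCharsets.US_ASCII);
        try {
            decode(new ByteArrayInputStream(raw), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

For instance, AsciiHexSketch.decode("48 65 6C 6C 6F>") yields the five bytes of "Hello". Reading both digits through the same whitespace-skipping helper keeps the end-of-stream and EOD checks in one place for each digit.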
