Posted to dev@pdfbox.apache.org by "Maruan Sahyoun (JIRA)" <ji...@apache.org> on 2014/03/14 09:13:43 UTC

[jira] [Commented] (PDFBOX-1987) Provide a PDF Lexer as a base for PDF parsing

    [ https://issues.apache.org/jira/browse/PDFBOX-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934744#comment-13934744 ] 

Maruan Sahyoun commented on PDFBOX-1987:
----------------------------------------

I attached a version of a PDF lexer, together with a set of tests and some helper classes which extend RandomAccessRead so that test data can be read from strings for easier testing.

The purpose is that people who are interested - and have a better programming background - can inspect and comment on the code. 

An area which I left out is how to handle malformed tokens, such as strings with unbalanced parentheses. For relaxed processing such errors should be fixed. For strict processing such errors should be reported and potentially fixed, as the process shouldn’t stop at the first error.

The current idea I have in mind is that the lexer fires events in such cases, which a parser could listen for and react to. Again, I am looking for comments and ideas on this.
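
To make the event idea more concrete, here is a minimal sketch of how it could look for the unbalanced-parentheses case. All names in it (LexerEvent, LexerEventListener, LiteralStringLexer) are hypothetical and are not taken from the attached src.zip or from the PDFBox API. It assumes the lexer always repairs the token and leaves it to the listener to decide whether the problem is ignored (relaxed) or recorded (strict). It only handles literal strings and does not decode escape sequences.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical event describing a malformed token (not a PDFBox class). */
final class LexerEvent {
    final String message;
    final int offset;

    LexerEvent(String message, int offset) {
        this.message = message;
        this.offset = offset;
    }
}

/** Hypothetical callback which a parser registers with the lexer. */
interface LexerEventListener {
    void onMalformedToken(LexerEvent event);
}

/** Toy lexer which only reads PDF literal strings such as "(a (nested) string)". */
final class LiteralStringLexer {
    private final LexerEventListener listener;

    LiteralStringLexer(LexerEventListener listener) {
        this.listener = listener;
    }

    /** Reads one literal string; the input must start with the opening '('. */
    String readLiteralString(String input) {
        if (input.isEmpty() || input.charAt(0) != '(') {
            throw new IllegalArgumentException("literal string must start with '('");
        }
        StringBuilder content = new StringBuilder();
        int depth = 1; // the opening parenthesis is already consumed
        int i = 1;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                // Keep the escape sequence verbatim so that an escaped
                // parenthesis does not change the nesting depth.
                content.append(c).append(input.charAt(i + 1));
                i += 2;
                continue;
            }
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth == 0) {
                    return content.toString(); // balanced, normal end of token
                }
            }
            content.append(c);
            i++;
        }
        // End of input reached with parentheses still open: fire an event
        // and repair the token by treating the end of input as its end.
        listener.onMalformedToken(new LexerEvent(
                "unbalanced parentheses in literal string, unclosed: " + depth,
                input.length()));
        return content.toString();
    }
}

/** Small demo: a strict parser collects the events, a relaxed one ignores them. */
final class LexerEventDemo {
    public static void main(String[] args) {
        List<LexerEvent> errors = new ArrayList<>();
        LiteralStringLexer strict = new LiteralStringLexer(errors::add);
        LiteralStringLexer relaxed = new LiteralStringLexer(event -> { });

        System.out.println(strict.readLiteralString("(open (never closed"));
        System.out.println(relaxed.readLiteralString("(a (nested) string)"));
        for (LexerEvent e : errors) {
            System.out.println("reported at " + e.offset + ": " + e.message);
        }
    }
}
```

With this split the lexer itself never stops at the first error. A strict parser can report everything it collected at the end, while a relaxed parser simply passes a listener which does nothing.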

> Provide a PDF Lexer as a base for PDF parsing
> ---------------------------------------------
>
>                 Key: PDFBOX-1987
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1987
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>            Reporter: Maruan Sahyoun
>            Priority: Minor
>             Fix For: 2.0.0
>
>         Attachments: src.zip
>
>
> In order to enhance the parsing process, and as a foundation for combining the different parsers, a PDF lexer should be provided.



--
This message was sent by Atlassian JIRA
(v6.2#6252)