Posted to commits@pdfbox.apache.org by bu...@apache.org on 2013/05/07 21:55:16 UTC
svn commit: r861230 - in /websites/staging/pdfbox/trunk/content: ./ ideas.html

Author: buildbot
Date: Tue May  7 19:55:16 2013
New Revision: 861230

Log:
Staging update by buildbot for pdfbox

Modified:
    websites/staging/pdfbox/trunk/content/   (props changed)
    websites/staging/pdfbox/trunk/content/ideas.html

Propchange: websites/staging/pdfbox/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Tue May  7 19:55:16 2013
@@ -1 +1 @@
-1480050
+1480053

Modified: websites/staging/pdfbox/trunk/content/ideas.html
==============================================================================
--- websites/staging/pdfbox/trunk/content/ideas.html (original)
+++ websites/staging/pdfbox/trunk/content/ideas.html Tue May  7 19:55:16 2013
@@ -143,21 +143,22 @@
                 </ul>
             </div>
             <div class="span9">
-                <p> <p>There are several ideas to enhance PDFBox. These are outlined below together with 
+                <p> <h2 id="ideas">Ideas</h2>
+<p>There are several ideas to enhance PDFBox. These are outlined below together with 
 comments and the releases they are planned for as soon as there is agreement to do the
 implementation.</p>
-<h2 id="enhance-type-safety">Enhance type safety</h2>
+<h3 id="enhance-type-safety">Enhance type safety</h3>
 <p>Enhance the type safety of PDFBox and add more generic collections and code cleanup.</p>
-<h2 id="remove-all-deprecated-methods">Remove all deprecated methods ...</h2>
-<h1 id="handle-large-pdf-files">handle large pdf files</h1>
-<p>in addition to the pdf parsing pdfbox does not always handle large pdf files well as some 
+<h3 id="remove-all-deprecated-methods">Remove all deprecated methods</h3>
+<h4 id="handle-large-pdf-files">handle large pdf files</h4>
+<p>In addition to the pdf parsing pdfbox does not always handle large pdf files well as some 
 of the references are implemented as int instead of long</p>
-<h2 id="switch-to-java-16">Switch to Java 1.6</h2>
-<h2 id="break-pdfbox-into-modules">Break PDFBox into modules</h2>
+<h3 id="switch-to-java-16">Switch to Java 1.6</h3>
+<h3 id="break-pdfbox-into-modules">Break PDFBox into modules</h3>
 <p>In order to support different use cases and provide a minimal toolset PDFBox should be 
 separated into different modules. This goes in line with rearranging some of the code
 e.g. remove awt from PDDocument.</p>
-<h2 id="replaceenhance-pdf-parsing">Replace/enhance PDF parsing</h2>
+<h3 id="replaceenhance-pdf-parsing">Replace/enhance PDF parsing</h3>
 <p>The old "classic" PDF parser in PDFBox is not in line with the PDF specification as it parses
 a PDF from top to bottom instead of respecting the XRef information. The NonSequentialParser
 enhanced that situation but there is a need to have a cleaner foundation broken into several levels</p>
@@ -170,24 +171,26 @@ enhanced that situation but there is a n
 </ul>
 <p>In addition handling documents which are not conforming shouldn't be part of the core parser
 but of an extensible approach, e.g. by adding hooks to allow for handling parsing exceptions.</p>
-<h2 id="rearchitect-the-cos-level-objects">Rearchitect the COS level objects</h2>
+<h3 id="rearchitect-the-cos-level-objects">Rearchitect the COS level objects</h3>
 <p>The COS level objects need to be refactored to be in line with the new parser. In addition
 method signatures, constructing ... should be made similar across the COS objects</p>
-<h2 id="parsing-on-demand">Parsing on demand</h2>
+<h3 id="parsing-on-demand">Parsing on demand</h3>
 <p>Instead of always parsing the complete document PDFs should be parsable on demand making
 objects only available as they are needed to enhance performance and minimize memory footprint.</p>
 <p>This might be achieved by providing a layered approach where a base (non caching) parser provides
 the on demand parsing and a caching parser built on top caches objects for use cases where
 this is beneficial e.g. rendering, debugging ...</p>
-<p>o the lexer would be the low level component delivering tokens to the parser.
+<ul>
+<li>the lexer would be the low level component delivering tokens to the parser.
   A sample implementation exists as part of PDFBOX-1000. The benefit would be a clean low
-  level handling of tokens. The current implementation needs to be (slightly ?) revised though
-o the incremental (non caching) parser would allow for page by page processing moving forward 
+  level handling of tokens. The current implementation needs to be (slightly ?) revised though</li>
+<li>the incremental (non caching) parser would allow for page by page processing moving forward 
   only to support text extraction, merging, splitting … - the benefit would be a lower memory 
-  consumption as well as a potential faster processing
-o the caching parser would support applications such a PDFDebugger or PDFReader </p>
-<h1 id="handling-of-pdf-versions">handling of pdf versions</h1>
-<p>the current implementation is a mix of PDF 1.4 and some adhoc additions without a clear 
+  consumption as well as a potential faster processing</li>
+<li>the caching parser would support applications such as PDFDebugger or PDFReader </li>
+</ul>
+<h3 id="handling-of-pdf-versions">Handling of pdf versions</h3>
+<p>The current implementation is a mix of PDF 1.4 and some adhoc additions without a clear 
 distinction of what is and is not supported. We could add some support for explicitly handling
 versions in PDFBox, e.g. by marking certain methods and properties with the PDF version support
 level. This could in addition be a good basis for PDF/A and other compliance checks. </p> </p>