Posted to commits@pdfbox.apache.org by bu...@apache.org on 2013/05/12 14:16:52 UTC

svn commit: r861710 - in /websites/staging/pdfbox/trunk/content: ./ userguide/faq.html

Author: buildbot
Date: Sun May 12 12:16:52 2013
New Revision: 861710

Log:
Staging update by buildbot for pdfbox

Modified:
    websites/staging/pdfbox/trunk/content/   (props changed)
    websites/staging/pdfbox/trunk/content/userguide/faq.html

Propchange: websites/staging/pdfbox/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sun May 12 12:16:52 2013
@@ -1 +1 @@
-1481535
+1481538

Modified: websites/staging/pdfbox/trunk/content/userguide/faq.html
==============================================================================
--- websites/staging/pdfbox/trunk/content/userguide/faq.html (original)
+++ websites/staging/pdfbox/trunk/content/userguide/faq.html Sun May 12 12:16:52 2013
@@ -143,15 +143,15 @@
                 </ul>
             </div>
             <div class="span9">
-                 <h1 id="faq">FAQ</h1>
-<h2 id="general-questions">General Questions</h2>
+                 <h2 id="frequently-asked-questions">Frequently Asked Questions</h2>
+<h3 id="general-questions">General Questions</h3>
 <ul>
 <li><a href="#releaseplan">When will the next version of PDFBox be released?</a></li>
 <li><a href="#log4j">I am getting the below Log4J warning message, how do I remove it?</a></li>
 <li><a href="#threadsafe">Is PDFBox thread safe?</a></li>
 <li><a href="#notclosed">Why do I get a "Warning: You did not close the PDF Document"?</a></li>
 </ul>
-<h2 id="text-extraction">Text Extraction</h2>
+<h3 id="text-extraction">Text Extraction</h3>
 <ul>
 <li><a href="#notext">How come I am not getting any text from the PDF document?</a></li>
 <li><a href="#gibberish">How come I am getting gibberish(G38G43G36G51G5) when extracting text?</a></li>
@@ -159,13 +159,13 @@
 <li><a href="#permission">Why do I get "You do not have permission to extract text" on some documents?</a></li>
 <li><a href="#partially">Can't we just extract the text without parsing the whole document or extract text as it is parsed?</a></li>
 </ul>
-<h1 id="answers">Answers</h1>
-<h2 id="general-questions_1">General Questions</h2>
-<h3 id="releaseplan">When will the next version of PDFBox be released</h3>
+<h2 id="answers">Answers</h2>
+<h3 id="general-questions_1">General Questions</h3>
+<h4 id="releaseplan">When will the next version of PDFBox be released?</h4>
 <p>As fixes are made and integrated into the repository these changes are documented in the
 <a href="http://pdfbox.apache.org/downloads.html">release notes</a>. An estimate will be given of when the next version will be released.
 Of course, this is only an estimate and could change.</p>
-<h3 id="log4j">I am getting the below Log4J warning message, how do I remove it?</h3>
+<h4 id="log4j">I am getting the below Log4J warning message, how do I remove it?</h4>
 <div class="codehilite"><pre><span class="nl">log4j:</span><span class="n">WARN</span> <span class="n">No</span> <span class="n">appenders</span> <span class="n">could</span> <span class="n">be</span> <span class="n">found</span> <span class="k">for</span> <span class="n">logger</span> <span class="o">(</span><span class="n">org</span><span class="o">.</span><span class="na">apache</span><span class="o">.</span><span class="na">pdfbox</span><span class="o">.</span><span class="na">util</span><span class="o">.</span><span class="na">ResourceLoader</span><span class="o">).</span>
 <span class="nl">log4j:</span><span class="n">WARN</span> <span class="n">Please</span> <span class="n">initialize</span> <span class="n">the</span> <span class="n">log4j</span> <span class="n">system</span> <span class="n">properly</span><span class="o">.</span>
 </pre></div>
@@ -185,10 +185,10 @@ See the <a href="http://logging.apache.o
 
 <p>Please see <a href="https://sourceforge.net/forum/forum.php?thread_id=1254229&amp;amp;forum_id=267205">this</a> forum thread 
 for more information.</p>
-<h3 id="threadsafe">Is PDFBox thread safe</h3>
+<h4 id="threadsafe">Is PDFBox thread safe?</h4>
 <p>No! Only one thread may access a single document at a time. You can have multiple threads
 each accessing their own PDDocument object.</p>
-<h3 id="notclosed">Why do I get a "Warning: You did not close the PDF Document"?</h3>
+<h4 id="notclosed">Why do I get a "Warning: You did not close the PDF Document"?</h4>
 <p>You need to call close() on the PDDocument inside the finally block, if you
 don't then the document will not be closed properly.  Also, you must close all
 PDDocument objects that get created.  The following code creates <strong>two</strong>
@@ -208,8 +208,8 @@ PDDocument objects; one from the "new PD
 </pre></div>
 
 
-<h2 id="text-extraction_1">Text Extraction</h2>
-<h3 id="notext">How come I am not getting any text from the PDF document?</h3>
+<h3 id="text-extraction_1">Text Extraction</h3>
+<h4 id="notext">How come I am not getting any text from the PDF document?</h4>
 <p>Text extraction from a pdf document is a complicated task and there are many factors
 involved that effect the possibility and accuracy of text extraction.  It would be helpful
 to the PDFBox team if you could try a couple things.</p>
@@ -219,22 +219,22 @@ should be able to as well and it is a bu
 <li>It might really be an image instead of text.  Some PDF documents are just images that have been scanned in.
 You can tell by using the selection tool in Acrobat, if you can't select any text then it is probably an image.</li>
 </ul>
-<h3 id="gibberish">How come I am getting gibberish(G38G43G36G51G5) when extracting text?</h3>
+<h4 id="gibberish">How come I am getting gibberish (G38G43G36G51G5) when extracting text?</h4>
 <p>This is because the characters in a PDF document can use a custom encoding
 instead of unicode or ASCII.  When you see gibberish text then it
 probably means that a meaningless internal encoding is being used.  The
 only way to access the text is to use OCR.  This may be a future
 enhancement.</p>
-<h3 id="fontwidth">What does "java.io.IOException: Can't handle font width" mean?</h3>
+<h4 id="fontwidth">What does "java.io.IOException: Can't handle font width" mean?</h4>
 <p>This probably means that the "Resources" directory is not in your classpath. The
 Resources directory is included in the PDFBox jar so this is only a problem if you
 are building PDFBox yourself and not using the binary.</p>
-<h3 id="permission">Why do I get "You do not have permission to extract text" on some documents?</h3>
+<h4 id="permission">Why do I get "You do not have permission to extract text" on some documents?</h4>
 <p>PDF documents have certain security permissions that can be applied to them and two 
 passwords associated with them, a user password and a master password. If the "cannot extract text"
 permission bit is set then you need to decrypt the document with the master password in order
 to extract the text.</p>
-<h2 id="partially">Can't we just extract the text without parsing the whole document or extract text as it is parsed.</h2>
+<h4 id="partially">Can't we just extract the text without parsing the whole document or extract text as it is parsed?</h4>
 <p>Not really, for a couple reasons.</p>
 <ul>
 <li>If the document is encrypted then you need to parse at least until the encryption dictionary before