Posted to commits@pdfbox.apache.org by ju...@apache.org on 2008/03/08 19:43:54 UTC

svn commit: r635036 [3/3] - in /incubator/pdfbox/trunk/migration/pdfbox: bugs.xml features.xml

Modified: incubator/pdfbox/trunk/migration/pdfbox/features.xml
URL: http://svn.apache.org/viewvc/incubator/pdfbox/trunk/migration/pdfbox/features.xml?rev=635036&r1=635035&r2=635036&view=diff
==============================================================================
--- incubator/pdfbox/trunk/migration/pdfbox/features.xml (original)
+++ incubator/pdfbox/trunk/migration/pdfbox/features.xml Sat Mar  8 10:43:50 2008
@@ -1,3 +1,4 @@
+<?xml version="1.0" encoding="ISO-8859-1"?>
 <tracker version="1.0" xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://sourceforge.net/export/sf_tracker_export.xsd">
 	<artifact id="1878543">
 		<submitted_by>nobody</submitted_by>
@@ -49,10 +50,10 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Identify text rotation angle in TextPosition</summary>
-		<detail>Applications that use the TextPosition objects generated by PDFStreamEngine sometimes have to be aware of the text rotation angle so that they can handle text that is not horizontal.  For example, we process many PDFs that have vertically-oriented copyright info that appears either next to an image or at the right/left size of the page; this content doesn't extract correctly because it's vertical rather than horizontal.  Applications that attempt to join TextPosition objects have difficulty because they cannot currently distinguish between vertical and horizontal text.
-
-Thanks
-Chris von See
+		<detail>Applications that use the TextPosition objects generated by PDFStreamEngine sometimes have to be aware of the text rotation angle so that they can handle text that is not horizontal.  For example, we process many PDFs that have vertically-oriented copyright info that appears either next to an image or at the right/left size of the page; this content doesn't extract correctly because it's vertical rather than horizontal.  Applications that attempt to join TextPosition objects have difficulty because they cannot currently distinguish between vertical and horizontal text.
+
+Thanks
+Chris von See
 TechAdapt, Inc.</detail>
 	</artifact>
 	<artifact id="1708294">
@@ -66,31 +67,31 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Updated PDFText2HTML</summary>
-		<detail>Hi Ben,
-
-I was wondering, are you accepting member to the project? 
-
-I'm using the PDFBox for importing PDF documents and would need more formatting information that is currently supported by PDFBox. The attached is what I've done so far: handles line breaks, bold, italics. Also added some comment delimiters for page boundaries.
-
-Two things I'll want to handle next are: 
-
-1. Underline
-2. Subscripts and superscripts
-
-Later on, I'll want to also handle the following:
-
-1. Images
-2. Hyperlinks
-3. Tables (I know this might be hard)
-
-I'll need all the help I can get in the form of pointers and clues.
-
-I look forward to reading from you soon.
-
-Many, many thanks for providing us with a great library.
-
-Regards,
-
+		<detail>Hi Ben,
+
+I was wondering, are you accepting member to the project? 
+
+I'm using the PDFBox for importing PDF documents and would need more formatting information that is currently supported by PDFBox. The attached is what I've done so far: handles line breaks, bold, italics. Also added some comment delimiters for page boundaries.
+
+Two things I'll want to handle next are: 
+
+1. Underline
+2. Subscripts and superscripts
+
+Later on, I'll want to also handle the following:
+
+1. Images
+2. Hyperlinks
+3. Tables (I know this might be hard)
+
+I'll need all the help I can get in the form of pointers and clues.
+
+I look forward to reading from you soon.
+
+Many, many thanks for providing us with a great library.
+
+Regards,
+
 Raimi Rufai</detail>
 		<follow_ups>
 			<item>
@@ -200,8 +201,8 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Import a part of a pdf page</summary>
-		<detail>It will be great to have the possibility to import a part of a pdf page defined by a rectangle.
-Everithing contained in this rectangle (text, immage...) can be copied elsewhere (same or other pdf).
+		<detail>It will be great to have the possibility to import a part of a pdf page defined by a rectangle.
+Everithing contained in this rectangle (text, immage...) can be copied elsewhere (same or other pdf).
 I found a such possibility in the commercial component pdfLib</detail>
 	</artifact>
 	<artifact id="1695597">
@@ -215,20 +216,20 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>colorspace as an array entry</summary>
-		<detail>there is a problem when PDFStripper try to extract text from a pdf file.
-
-stacktrace:
-
-java.io.IOException: Unknown colorspace array type:COSName{DeviceRGB} 
-at org.pdfbox.pdmodel.graphics.color.PDColorSpaceFactory.createColorSpace(PDColorSpaceFactory.java:116) 
-at org.pdfbox.pdmodel.PDResources.getColorSpaces(PDResources.java:264) 
-at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:193) 
-at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174) 
-at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336) 
-at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259) 
-at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
-
-
+		<detail>there is a problem when PDFStripper try to extract text from a pdf file.
+
+stacktrace:
+
+java.io.IOException: Unknown colorspace array type:COSName{DeviceRGB} 
+at org.pdfbox.pdmodel.graphics.color.PDColorSpaceFactory.createColorSpace(PDColorSpaceFactory.java:116) 
+at org.pdfbox.pdmodel.PDResources.getColorSpaces(PDResources.java:264) 
+at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:193) 
+at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174) 
+at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336) 
+at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259) 
+at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
+
+
 in attachment the pdf file that generate the error</detail>
 		<existingfiles>
 			<file>
@@ -261,20 +262,20 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Problem during load pdf</summary>
-		<detail>There is a problem on load of some pdf file.
-
-this is a stacktrace of error:
-java.io.IOException: expected='/' actual='M'-77 org.pdfbox.io.PushBackInputStream@ebd7a9 
-at org.pdfbox.pdfparser.BaseParser.parseCOSName(BaseParser.java:730) 
-at org.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:205) 
-at org.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:858) 
-at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:448) 
-at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:176) 
-at org.pdfbox.pdmodel.PDDocument.load(PDDocument.java:703) 
-at org.pdfbox.pdmodel.PDDocument.load(PDDocument.java:687) 
-
-
-in attachment the "impeached" pdf
+		<detail>There is a problem on load of some pdf file.
+
+this is a stacktrace of error:
+java.io.IOException: expected='/' actual='M'-77 org.pdfbox.io.PushBackInputStream@ebd7a9 
+at org.pdfbox.pdfparser.BaseParser.parseCOSName(BaseParser.java:730) 
+at org.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:205) 
+at org.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:858) 
+at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:448) 
+at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:176) 
+at org.pdfbox.pdmodel.PDDocument.load(PDDocument.java:703) 
+at org.pdfbox.pdmodel.PDDocument.load(PDDocument.java:687) 
+
+
+in attachment the "impeached" pdf
 I use PDFBox 0.7.4</detail>
 		<existingfiles>
 			<file>
@@ -307,8 +308,8 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Need support for PNG images</summary>
-		<detail>I have noticed 1277052, but I can't vote for it or change the priority, so I create a new feature request.
-
+		<detail>I have noticed 1277052, but I can't vote for it or change the priority, so I create a new feature request.
+
 I use PDBBox to extract images from PDF files, it works fine for jpg and tiff, but some customers of ours also need png.</detail>
 		<follow_ups>
 			<item>
@@ -388,16 +389,16 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Pdf Printing with/without Annotations</summary>
-		<detail>Hi all... can i print a PDF File with/without Annotations
-
-I need to give a print to printer by selecting with/ without annotations...
-
-Please give some idea or code..... I want to implement in java...
-
-Thanks in advance...
-
--Rodricks
-
+		<detail>Hi all... can i print a PDF File with/without Annotations
+
+I need to give a print to printer by selecting with/ without annotations...
+
+Please give some idea or code..... I want to implement in java...
+
+Thanks in advance...
+
+-Rodricks
+
 </detail>
 		<change_log>
 			<item>
@@ -431,8 +432,8 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>xml extraction like in adobe professional</summary>
-		<detail>Adobe professional has save pdf as xml plug-in, the extraction can map tables in the pdfs, but this plug-in can olnly be invike since adobe.
-
+		<detail>Adobe professional has save pdf as xml plug-in, the extraction can map tables in the pdfs, but this plug-in can olnly be invike since adobe.
+
 with this type of parse can be parse to enyting you wan.</detail>
 		<existingfiles>
 			<file>
@@ -486,16 +487,16 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Sign .NET DLLs for security</summary>
-		<detail>I`m using your tool at work. It is part of a bigger software solution and it works fine!
-
-Now we have a problem concerning your extracttext.exe – it cannot run from a network drive
-
-because of .net security policy. I wrote a tool that allows application to start from network drives, but
-
-only with one preliminary : it must be strong named.
-
-Could you sign your extracttext.exe and the dll`s used by it ?
-
+		<detail>I`m using your tool at work. It is part of a bigger software solution and it works fine!
+
+Now we have a problem concerning your extracttext.exe – it cannot run from a network drive
+
+because of .net security policy. I wrote a tool that allows application to start from network drives, but
+
+only with one preliminary : it must be strong named.
+
+Could you sign your extracttext.exe and the dll`s used by it ?
+
 You really would save me a lot of work, otherwise I have to replace your tool and so on…</detail>
 		<follow_ups>
 			<item>
@@ -505,26 +506,26 @@
 user_id=1737686
 Originator: NO
 
-I have added a -keyfile line in my build.xml:
-
-&lt;echo&gt;Building PDFBox&lt;/echo&gt; 
-&lt;exec executable="${ikvmc}"&gt; 
-&lt;arg value="-reference:${ikvm.dir}/bin/IKVM.GNU.Classpath.dll" /&gt; 
-&lt;arg value="-reference:${ikvm.dir}/bin/IKVM.AWT.WinForms.dll" /&gt; 
-&lt;arg value="-reference:bin/${fontbox.name}.dll" /&gt; 
-&lt;arg value="-reference:bin/${lucene.name}.dll" /&gt; 
-&lt;arg value="-reference:bin/${lucene-demos.name}.dll" /&gt; 
-&lt;arg value="-reference:bin/${bcprov.name}.dll" /&gt; 
-&lt;arg value="-reference:bin/${bcmail.name}.dll" /&gt; 
-&lt;arg value="-target:library" /&gt; 
-&lt;arg value="-compressresources" /&gt; 
-&lt;arg value="-out:bin\${project.name}.dll" /&gt; 
-&lt;arg value="-keyfile:\PerforceCode\StitchViewer\EmsVwrCtl\Viewer.snk" /&gt; 
-&lt;arg value="lib\${project.name}.jar" /&gt; 
-&lt;/exec&gt; 
-
-Your path &amp; filename will, of course, vary.
-
+I have added a -keyfile line in my build.xml:
+
+&lt;echo&gt;Building PDFBox&lt;/echo&gt; 
+&lt;exec executable="${ikvmc}"&gt; 
+&lt;arg value="-reference:${ikvm.dir}/bin/IKVM.GNU.Classpath.dll" /&gt; 
+&lt;arg value="-reference:${ikvm.dir}/bin/IKVM.AWT.WinForms.dll" /&gt; 
+&lt;arg value="-reference:bin/${fontbox.name}.dll" /&gt; 
+&lt;arg value="-reference:bin/${lucene.name}.dll" /&gt; 
+&lt;arg value="-reference:bin/${lucene-demos.name}.dll" /&gt; 
+&lt;arg value="-reference:bin/${bcprov.name}.dll" /&gt; 
+&lt;arg value="-reference:bin/${bcmail.name}.dll" /&gt; 
+&lt;arg value="-target:library" /&gt; 
+&lt;arg value="-compressresources" /&gt; 
+&lt;arg value="-out:bin\${project.name}.dll" /&gt; 
+&lt;arg value="-keyfile:\PerforceCode\StitchViewer\EmsVwrCtl\Viewer.snk" /&gt; 
+&lt;arg value="lib\${project.name}.jar" /&gt; 
+&lt;/exec&gt; 
+
+Your path &amp; filename will, of course, vary.
+
 The same -keyfile line should work when compiling the executable.</text>
 			</item>
 			<item>
@@ -534,38 +535,38 @@
 user_id=601708
 Originator: YES
 
-Today I found this post; need to try this and verify that this will solve this issue.  If anyone could help verify this that would be great.
-Ben
-
-http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=222499&amp;SiteID=1
-
-
-
-Hey all,
-
-I had a problem, where i strong named my project and then i got the error: Cannot emit assembly as referenced assembly is not strong named.
-
-My referenced assembly is created by IKVM tool compiling Java bytecode into .NET MSIL code. I have got one solution to strong name this dll:
-
-1. Create Messages.il using ildasm Messaged.dll /out:Messages.il
-2. Use the same key pair (used to strong name the project) to create back the Messages.dll: ilasm Messages.il /dll key="D:\sn.key"
-
-This worked for me, is there any better solution out there that i have missed out.
-
-Also, the relative path isn't working for me.
-I am using this: &lt;Assembly: AssemblyKeyFile("D:\sn.key")&gt;  instead of
-
-&lt;Assembly: AssemblyKeyFile("..\\..\\sn.key")&gt; 
-
-Thanks for any help
-Vinay Kant
-
-
-Hi Vinay,
-
-The ildasm / ilasm round trip sounds like your best bet here.  The AssemblyKeyFile path should be the relative path from the Visual Studio output location.
-
--Shawn
+Today I found this post; need to try this and verify that this will solve this issue.  If anyone could help verify this that would be great.
+Ben
+
+http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=222499&amp;SiteID=1
+
+
+
+Hey all,
+
+I had a problem, where i strong named my project and then i got the error: Cannot emit assembly as referenced assembly is not strong named.
+
+My referenced assembly is created by IKVM tool compiling Java bytecode into .NET MSIL code. I have got one solution to strong name this dll:
+
+1. Create Messages.il using ildasm Messaged.dll /out:Messages.il
+2. Use the same key pair (used to strong name the project) to create back the Messages.dll: ilasm Messages.il /dll key="D:\sn.key"
+
+This worked for me, is there any better solution out there that i have missed out.
+
+Also, the relative path isn't working for me.
+I am using this: &lt;Assembly: AssemblyKeyFile("D:\sn.key")&gt;  instead of
+
+&lt;Assembly: AssemblyKeyFile("..\\..\\sn.key")&gt; 
+
+Thanks for any help
+Vinay Kant
+
+
+Hi Vinay,
+
+The ildasm / ilasm round trip sounds like your best bet here.  The AssemblyKeyFile path should be the relative path from the Visual Studio output location.
+
+-Shawn
 </text>
 			</item>
 		</follow_ups>
@@ -581,11 +582,11 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Text Extraction with Formatting</summary>
-		<detail>Is it possible to extract text from a PDF without
-ignoring the formatting?
-
-HTML tags might be used for example. I thought the
-PDFText2Html class would do the trick but it does not.
+		<detail>Is it possible to extract text from a PDF without
+ignoring the formatting?
+
+HTML tags might be used for example. I thought the
+PDFText2Html class would do the trick but it does not.
 Thank you for reading.</detail>
 		<follow_ups>
 			<item>
@@ -604,8 +605,8 @@
 user_id=1776491
 Originator: NO
 
-What email address should I send it to? 
-
+What email address should I send it to? 
+
 </text>
 			</item>
 			<item>
@@ -615,11 +616,11 @@
 user_id=1562185
 Originator: YES
 
-@ rruffai
-
-&gt; You might send a compiled 32-bit windows or linux binary personally to me.
-&gt; (I'm a user of pdftohtml.)
-
+@ rruffai
+
+&gt; You might send a compiled 32-bit windows or linux binary personally to me.
+&gt; (I'm a user of pdftohtml.)
+
 I messed things up. This was also PDFBox. Hehe, sorry.</text>
 			</item>
 			<item>
@@ -629,9 +630,9 @@
 user_id=1562185
 Originator: YES
 
-@ rrufai
-what is the trouble you have with handling underlines?
-
+@ rrufai
+what is the trouble you have with handling underlines?
+
 You might send a compiled 32-bit windows or linux binary personally to me. (I'm a user of pdftohtml.)</text>
 			</item>
 			<item>
@@ -641,19 +642,19 @@
 user_id=1776491
 Originator: NO
 
-Hi Ben,
-&lt;p&gt;
-I've extended PDFText2Html to handle bold, new lines (with &amp;lt;br&amp;gt; tags). However, I'm having trouble figuring out how to handle underlines.
-&lt;/p&gt;
-
-&lt;p&gt;
-Also, I don't know how to post updates. 
-&lt;/p&gt;
-
-Regards,
-
-Raimi
-
+Hi Ben,
+&lt;p&gt;
+I've extended PDFText2Html to handle bold, new lines (with &amp;lt;br&amp;gt; tags). However, I'm having trouble figuring out how to handle underlines.
+&lt;/p&gt;
+
+&lt;p&gt;
+Also, I don't know how to post updates. 
+&lt;/p&gt;
+
+Regards,
+
+Raimi
+
 </text>
 			</item>
 			<item>
@@ -662,29 +663,29 @@
 				<text>Logged In: YES 
 user_id=1562185
 
-Uhmm... well bold, italic, underlined etc... would be a good
-beginning but my ultimate wish would be something like
-quoted below:
-
-&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;
-&lt;!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd"&gt;
-
-&lt;pdf2xml&gt;
-&lt;page number="1" position="absolute" top="0" left="0"
-height="1262" width="892"&gt;
-	&lt;fontspec id="0" size="16" family="Times" color="#000000"/&gt;
-	&lt;fontspec id="1" size="16" family="Times" color="#000000"/&gt;
-	&lt;fontspec id="2" size="16" family="Times" color="#000000"/&gt;
-&lt;text top="110" left="106" width="137" height="18"
-font="0"&gt;&lt;i&gt;She &lt;/i&gt;told &lt;b&gt;me&lt;/b&gt;. äµß &lt;/text&gt;
-&lt;/page&gt;
-&lt;/pdf2xml&gt;
-
-I think I have made a mistake by naming it "Text Extraction
-with Formatting"... I should have put my question under a
-more fitting title, something like "PDF to (HTML/)XML
-Conversion with formatting".
-
+Uhmm... well bold, italic, underlined etc... would be a good
+beginning but my ultimate wish would be something like
+quoted below:
+
+&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;
+&lt;!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd"&gt;
+
+&lt;pdf2xml&gt;
+&lt;page number="1" position="absolute" top="0" left="0"
+height="1262" width="892"&gt;
+	&lt;fontspec id="0" size="16" family="Times" color="#000000"/&gt;
+	&lt;fontspec id="1" size="16" family="Times" color="#000000"/&gt;
+	&lt;fontspec id="2" size="16" family="Times" color="#000000"/&gt;
+&lt;text top="110" left="106" width="137" height="18"
+font="0"&gt;&lt;i&gt;She &lt;/i&gt;told &lt;b&gt;me&lt;/b&gt;. äµß &lt;/text&gt;
+&lt;/page&gt;
+&lt;/pdf2xml&gt;
+
+I think I have made a mistake by naming it "Text Extraction
+with Formatting"... I should have put my question under a
+more fitting title, something like "PDF to (HTML/)XML
+Conversion with formatting".
+
 Thank you very much for your prompt replies. ^_^</text>
 			</item>
 			<item>
@@ -701,14 +702,14 @@
 				<text>Logged In: YES 
 user_id=1562185
 
-That's exactly what I am looking for. But is this not a
-priority issue for the PDFBox package? It would take me
-quite a time to extend the stripper on my own. One of the
-PDFBox developers might do it better I think.
-
-If you insist that it's a user's issue and PDFBox developers
-would not invest their time in such an extension, could you
-at least tell me whether you have any links to any
+That's exactly what I am looking for. But is this not a
+priority issue for the PDFBox package? It would take me
+quite a time to extend the stripper on my own. One of the
+PDFBox developers might do it better I think.
+
+If you insist that it's a user's issue and PDFBox developers
+would not invest their time in such an extension, could you
+at least tell me whether you have any links to any
 information regarding this matter?</text>
 			</item>
 			<item>
@@ -717,10 +718,10 @@
 				<text>Logged In: YES 
 user_id=601708
 
-HTML tags are not used to format a PDF document.  Font information is available but can be tricky to get what you 
-want.  You will need to extend PDFTextStripper and override writeCharacters to get formatting such as bold/italic.  
-Is that what you are looking for?
-
+HTML tags are not used to format a PDF document.  Font information is available but can be tricky to get what you 
+want.  You will need to extend PDFTextStripper and override writeCharacters to get formatting such as bold/italic.  
+Is that what you are looking for?
+
 Ben</text>
 			</item>
 		</follow_ups>
@@ -736,29 +737,29 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Better metadata in conversion to HTML</summary>
-		<detail>It would be great to have better support for metadata 
-in conversion to HTML.
-
-- Being able to create a HTML page with the proper 
-document title in (not one simply guessed from the 
-text of the document).
-
-- Author, keywords, category etc. extracted from the 
-document and placed into metafields in the HTML
-
-- Chosen encoding included in the HTML header.
-
-I am using PDFbox in conjunction with mnoGoSearch to 
-index PDFs on a site. This additional metadata would 
-be extremely handy, since it would form a part of the 
-indexed details for the documents.
-
-Even if a simple tool could be created that would 
-*just* extract the metadata from a document [into 
-some kind of text format], that would be great. 
-External tools could then be built around that, e.g. 
-a templating tool that could create a final format of 
-any form, using the extracted text and the extracted 
+		<detail>It would be great to have better support for metadata 
+in conversion to HTML.
+
+- Being able to create a HTML page with the proper 
+document title in (not one simply guessed from the 
+text of the document).
+
+- Author, keywords, category etc. extracted from the 
+document and placed into metafields in the HTML
+
+- Chosen encoding included in the HTML header.
+
+I am using PDFbox in conjunction with mnoGoSearch to 
+index PDFs on a site. This additional metadata would 
+be extremely handy, since it would form a part of the 
+indexed details for the documents.
+
+Even if a simple tool could be created that would 
+*just* extract the metadata from a document [into 
+some kind of text format], that would be great. 
+External tools could then be built around that, e.g. 
+a templating tool that could create a final format of 
+any form, using the extracted text and the extracted 
 metadata.</detail>
 		<follow_ups>
 			<item>
@@ -766,11 +767,11 @@
 				<sender>nobody</sender>
 				<text>Logged In: NO 
 
-BTW I've not used Java before, so don't have any code to 
-contribute, but if I do come up with anything, I'll post 
-it here.
-
--- Jason
+BTW I've not used Java before, so don't have any code to 
+contribute, but if I do come up with anything, I'll post 
+it here.
+
+-- Jason
 (sorry - mislaid my login too)</text>
 			</item>
 		</follow_ups>
@@ -786,24 +787,24 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary>Use get/set functions for separators in PDFTextStripper</summary>
-		<detail>Instead of directly using the constants that represent
-the page, line and word separators in PDFTextStripper:
-
-    private String lineSeparator =
-System.getProperty("line.separator");
-    private String pageSeparator =
-System.getProperty("line.separator");
-    private String wordSeparator = " ";
-
-call the getLineSeparator(), getPageSeparator() and
-getWordSeparator() functions so that if these are
-overridden in subclasses that the base PDFTextStripper
-logic will pick up the overridden methods.
-
-Thanks for a great tool!
-
-
-Cheers
+		<detail>Instead of directly using the constants that represent
+the page, line and word separators in PDFTextStripper:
+
+    private String lineSeparator =
+System.getProperty("line.separator");
+    private String pageSeparator =
+System.getProperty("line.separator");
+    private String wordSeparator = " ";
+
+call the getLineSeparator(), getPageSeparator() and
+getWordSeparator() functions so that if these are
+overridden in subclasses that the base PDFTextStripper
+logic will pick up the overridden methods.
+
+Thanks for a great tool!
+
+
+Cheers
 Chris</detail>
 		<follow_ups>
 			<item>
@@ -812,8 +813,8 @@
 				<text>Logged In: YES 
 user_id=601708
 
-just added to CVS, check it out.
-
+just added to CVS, check it out.
+
 Ben</text>
 			</item>
 		</follow_ups>
@@ -843,11 +844,11 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary>support inherited attributes at the page level</summary>
-		<detail>The page object has several inherited attributes( 
-MediaBox,CropBox...), the pdmodel needs to expose 
-these attributes and support the the inherited 
-structure.  
-
+		<detail>The page object has several inherited attributes( 
+MediaBox,CropBox...), the pdmodel needs to expose 
+these attributes and support the the inherited 
+structure.  
+
 This request comes from bug 823216.</detail>
 		<follow_ups>
 			<item>
@@ -928,18 +929,18 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>PS TO PDF CONVERSION</summary>
-		<detail>Hi,
-
-There's no postscript to pdf file converter in the 
-open source which is built using Java or .Net. Only 
-one thing that is present is GhostScript which is 
-developed using C++. Can it be possible to use PDFBox 
-to build such a project which converts the PostScript 
-files to PDF format.
-
-Thank You
-
-Regards,
+		<detail>Hi,
+
+There's no postscript to pdf file converter in the 
+open source which is built using Java or .Net. Only 
+one thing that is present is GhostScript which is 
+developed using C++. Can it be possible to use PDFBox 
+to build such a project which converts the PostScript 
+files to PDF format.
+
+Thank You
+
+Regards,
 Govardhana</detail>
 		<follow_ups>
 			<item>
@@ -948,27 +949,27 @@
 				<text>Logged In: YES 
 user_id=601708
 
-Hi Govardhana,
-
-As I mentioned in my earlier comment, this is certainly 
-possible but will be a significant amount of work.
-
-To do this you'll need to be familiar with the internals of 
-both PS and PDF, so review these two documents
-
-Postscript Reference
-http://www.adobe.com/products/postscript/pdfs/PLRM.pdf
-
-PDF Reference
-http://partners.adobe.com/public/developer/pdf/index_referen
-ce.html
-
-Then just get started,
-Ben
-
-"To change the world
-Start with one step.
-However small,
+Hi Govardhana,
+
+As I mentioned in my earlier comment, this is certainly 
+possible but will be a significant amount of work.
+
+To do this you'll need to be familiar with the internals of 
+both PS and PDF, so review these two documents
+
+Postscript Reference
+http://www.adobe.com/products/postscript/pdfs/PLRM.pdf
+
+PDF Reference
+http://partners.adobe.com/public/developer/pdf/index_referen
+ce.html
+
+Then just get started,
+Ben
+
+"To change the world
+Start with one step.
+However small,
 The first step is hardest of all." -DMB</text>
 			</item>
 			<item>
@@ -977,11 +978,11 @@
 				<text>Logged In: YES 
 user_id=1452645
 
-Hi,
-
-I would be very much interested in doing so. But i am a 
-begginer and i would like to know all the information 
-which would be helpful in achieveing this kind of feature. 
+Hi,
+
+I would be very much interested in doing so. But i am a 
+begginer and i would like to know all the information 
+which would be helpful in achieveing this kind of feature. 
 I am waiting for ur response</text>
 			</item>
 			<item>
@@ -990,11 +991,11 @@
 				<text>Logged In: YES 
 user_id=601708
 
-FYI, this is possible but PDFBox is far away from being 
-able to support this.  This is a great feature and will 
-stay as a request but if you want to see this in the near 
-future then I'll need some implementation help.
-
+FYI, this is possible but PDFBox is far away from being 
+able to support this.  This is a great feature and will 
+stay as a request but if you want to see this in the near 
+future then I'll need some implementation help.
+
 Ben</text>
 			</item>
 		</follow_ups>
@@ -1018,13 +1019,13 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Import/Export of XML Data Package files (XDP)   </summary>
-		<detail>Please, add support for import and export of XDP forms 
-data. Attached is a sample PDF form and the exported 
-XDP file. The PDF file was created using Adobe Designer 
-6.0
-
---
-Tomas
+		<detail>Please, add support for import and export of XDP forms 
+data. Attached is a sample PDF form and the exported 
+XDP file. The PDF file was created using Adobe Designer 
+6.0
+
+--
+Tomas
 </detail>
 		<follow_ups>
 			<item>
@@ -1033,22 +1034,22 @@
 				<text>Logged In: YES 
 user_id=73363
 
-I just started playing with Adobe Designer (60-day trial). 
-What we need (but will probably run up against Adobe patents
-trying to implement) is an XDP processor that does some or
-all of what Adobe Form Server (or LiveCycle Forms) does:
-take the form specified in the XDP file and generate client
-side HTML/Javascript via servlets and JSPs.  The
-HTML/Javascript (I'm guessing, since I don't have $30000
-required to license Form Server) will include all the
-validation provided in the PDF form (through Adobe Reader),
-and the backend servlet will also be responsible for
-connecting to data sources (ODBC, XML w/ schema, etc),
-posting the updated data, providing pre-filled PDF and or
-XML versions of the data to be downloaded, emailed, etc.
-
-I suppose we could start by transforming the XDP using XSLT
-to generate simple text, checkbox, radio, select and submit
+I just started playing with Adobe Designer (60-day trial). 
+What we need (but will probably run up against Adobe patents
+trying to implement) is an XDP processor that does some or
+all of what Adobe Form Server (or LiveCycle Forms) does:
+take the form specified in the XDP file and generate client
+side HTML/Javascript via servlets and JSPs.  The
+HTML/Javascript (I'm guessing, since I don't have $30000
+required to license Form Server) will include all the
+validation provided in the PDF form (through Adobe Reader),
+and the backend servlet will also be responsible for
+connecting to data sources (ODBC, XML w/ schema, etc),
+posting the updated data, providing pre-filled PDF and or
+XML versions of the data to be downloaded, emailed, etc.
+
+I suppose we could start by transforming the XDP using XSLT
+to generate simple text, checkbox, radio, select and submit
 inputs...</text>
 			</item>
 		</follow_ups>
@@ -1089,7 +1090,7 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>extract the text of certain page at certain line</summary>
-		<detail>sometimes,it is unneeded to extraction all the text in a 
+		<detail>sometimes,it is unneeded to extraction all the text in a 
 pdf file :-)</detail>
 	</artifact>
 	<artifact id="1333383">
@@ -1103,24 +1104,24 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary> In memory COSDocument  </summary>
-		<detail>This is an RFE submitted as a result of a post by user
-cannonbeach on 2005-19 called 'In memory COSDocument'.
-
-It involves the creation of an interface with two
-implementations, one for RAF and one for in memory.
-
-Here is Ben's response to the post:
-
-Yes, the correct implementation is to create an
-interface with two implementations, one for RAF and one
-for in memory. 
- 
-This should be a straightforward implementation, can
-one of you create an RFE for it and I will get it in. 
- 
-An interesting third implementation might be one that
-starts with in memory and then switches over to RAF
-after some threshold. I'd be curious if there were any
+		<detail>This is an RFE submitted as a result of a post by user
+cannonbeach on 2005-19 called 'In memory COSDocument'.
+
+It involves the creation of an interface with two
+implementations, one for RAF and one for in memory.
+
+Here is Ben's response to the post:
+
+Yes, the correct implementation is to create an
+interface with two implementations, one for RAF and one
+for in memory. 
+ 
+This should be a straightforward implementation, can
+one of you create an RFE for it and I will get it in. 
+ 
+An interesting third implementation might be one that
+starts with in memory and then switches over to RAF
+after some threshold. I'd be curious if there were any
 performance gains. </detail>
 		<follow_ups>
 			<item>
@@ -1129,14 +1130,14 @@
 				<text>Logged In: YES 
 user_id=601708
 
-Sorry for the delay in doing this, but it is implemented 
-and will be available in tonights build.
-
-See org.pdfbox.pdfparserPDFParser#PDFParser( InputStream, 
-RandomAccess )
-and
-org.pdfbox.io.RandomAccessBuffer
-
+Sorry for the delay in doing this, but it is implemented 
+and will be available in tonight's build.
+
+See org.pdfbox.pdfparser.PDFParser#PDFParser( InputStream, 
+RandomAccess )
+and
+org.pdfbox.io.RandomAccessBuffer
+
 Ben</text>
 			</item>
 		</follow_ups>
@@ -1172,9 +1173,9 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary>Support PDF Functions</summary>
-		<detail>Add support for PDF functions.  Fix 
-pdmodel.graphics.color.PDSeparation.getTintTransform
-
+		<detail>Add support for PDF functions.  Fix 
+pdmodel.graphics.color.PDSeparation.getTintTransform
+
 Ben</detail>
 		<follow_ups>
 			<item>
@@ -1183,8 +1184,8 @@
 				<text>Logged In: YES 
 user_id=601708
 
-implemented in CVS!
-
+implemented in CVS!
+
 Woo hoo</text>
 			</item>
 		</follow_ups>
@@ -1220,12 +1221,12 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary>Output extracted text to console</summary>
-		<detail>I want to see the output from PDFBox on shell (console) 
-rather than dumping it to a text file. This is similar to 
-what XPDF does when you give - as an
-argument at shell instead of text file name.
-
-Maybe have the same command line arguments as 
+		<detail>I want to see the output from PDFBox on shell (console) 
+rather than dumping it to a text file. This is similar to 
+what XPDF does when you give - as an
+argument at shell instead of text file name.
+
+Maybe have the same command line arguments as 
 xpdf, but I am not sure.</detail>
 		<follow_ups>
 			<item>
@@ -1263,7 +1264,7 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary>Split PDF's</summary>
-		<detail>Allow for the ability to split up a pdf.  Options might 
+		<detail>Allow for the ability to split up a pdf.  Options might 
 include splitting every x number of pages.</detail>
 		<follow_ups>
 			<item>
@@ -1301,7 +1302,7 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Set label on PDPushButton</summary>
-		<detail>Allow for ability to change the label on a push button.
+		<detail>Allow for ability to change the label on a push button.
 </detail>
 		<change_log>
 			<item>
@@ -1323,19 +1324,19 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Add new form fields to an existing pdf file</summary>
-		<detail>To add some new form fields into an existing pdf file.
-
-There is a shareware named bfop(The Big Faceless PDF
-Library) can do it in an easy way, for example: 
-1. PDF pdf = new PDF(new PDFReader(new
-FileInputStream("example.pdf")));
-2.PDFPage page = pdf.getLastPage();
-3. Form form = pdf.getForm();
-4. FormText address = new FormText(page, 400, 660, 550,
-720);
-5. form.addElement("address", address);
-6. pdf.render(new FileOutputStream("FormCreation.pdf"));
-
+		<detail>To add some new form fields into an existing pdf file.
+
+There is a shareware library named bfop (The Big Faceless PDF
+Library) that can do it in an easy way, for example: 
+1. PDF pdf = new PDF(new PDFReader(new
+FileInputStream("example.pdf")));
+2. PDFPage page = pdf.getLastPage();
+3. Form form = pdf.getForm();
+4. FormText address = new FormText(page, 400, 660, 550,
+720);
+5. form.addElement("address", address);
+6. pdf.render(new FileOutputStream("FormCreation.pdf"));
+
 </detail>
 		<change_log>
 			<item>
@@ -1363,7 +1364,7 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>support interactive actions</summary>
-		<detail>See PDF Reference 1.5 section 8.5 about actions that 
+		<detail>See PDF Reference 1.5 section 8.5 about actions that 
 exist.  Package has been created but need to implement.</detail>
 	</artifact>
 	<artifact id="1309441">
@@ -1377,16 +1378,16 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Print PDF</summary>
-		<detail>Hi Ben,
-
-I send you two PDF documents. 
-
-First-(V2135-2.pdf) Where some parts of the code are 
-printed badly. 
-
-Second-(1_1.pdf) This pdf, I cant not print it. 
-
-Best regards,
+		<detail>Hi Ben,
+
+I send you two PDF documents. 
+
+First-(V2135-2.pdf) Where some parts of the code are 
+printed badly. 
+
+Second-(1_1.pdf) This pdf, I cannot print it. 
+
+Best regards,
 Nayra.</detail>
 		<existingfiles>
 			<file>
@@ -1425,7 +1426,7 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary>Encrypt PDF/Decrypt PDF</summary>
-		<detail>Add high level API for setting security settings, provide 
+		<detail>Add high level API for setting security settings, provide 
 command line utility for encrypting/decrypting a PDF</detail>
 		<follow_ups>
 			<item>
@@ -1434,7 +1435,7 @@
 				<text>Logged In: YES 
 user_id=601708
 
-Created command line utilities org.pdfbox.Encrypt and 
+Created command line utilities org.pdfbox.Encrypt and 
 org.pdfbox.Decrypt </text>
 			</item>
 		</follow_ups>
@@ -1464,18 +1465,18 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary>Provide nightly builds</summary>
-		<detail>Hi ben,
-
-For people that doesn't rebuild PDFBox from the CVS
-source but that can wait until the next release, it
-will be convenient to have access to nightly builds.
-
-Is it possible to have access to such builds ?
-
-
-Thanks
-
-
+		<detail>Hi ben,
+
+For people that don't rebuild PDFBox from the CVS
+source but that can wait until the next release, it
+will be convenient to have access to nightly builds.
+
+Is it possible to have access to such builds ?
+
+
+Thanks
+
+
 Julien</detail>
 		<follow_ups>
 			<item>
@@ -1484,12 +1485,12 @@
 				<text>Logged In: YES 
 user_id=601708
 
-builds are available off of
-
-http://www.csh.rit.edu/~ben/projects/pdfbox/nightly-release/
-
-also link to from the http://www.pdfbox.org main page.
-
+builds are available off of
+
+http://www.csh.rit.edu/~ben/projects/pdfbox/nightly-release/
+
+also linked to from the http://www.pdfbox.org main page.
+
 </text>
 			</item>
 			<item>
@@ -1498,9 +1499,9 @@
 				<text>Logged In: YES 
 user_id=601708
 
-This was actually on my todo list, I will do it sooner rather 
-than later now.
-
+This was actually on my todo list, I will do it sooner rather 
+than later now.
+
 Ben</text>
 			</item>
 		</follow_ups>
@@ -1536,28 +1537,28 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Add ability to digitally sign a PDF</summary>
-		<detail>
-Implementation Notes:
-What about the info on this site:
-http://www.codeproject.com/useritems/PdfDigiPad.asp
-
-Adobe PDF Public-Key Digital Signature and Encryption 
-Specification
-http://partners.adobe.com/asn/developer/pdfs/tn/ppk_pd
-fspec.pdf
-
-Adobe Acrobat Digital Signature API Reference
-http://partners.adobe.com/asn/acrobat/docs/digsig.pdf
-
-http://www.mail-archive.com/itext-
-questions@lists.sourceforge.net/msg11084.html
-
-http://groups.google.de/groups?
-q=sign+openssl+group:comp.text.pdf&amp;hl=de&amp;lr=lang_de|l
-ang_en&amp;ie=UTF-
-8&amp;group=comp.text.pdf&amp;selm=f55510dc.0403111256.f0a6
-513%40posting.google.com&amp;rnum=1
-
+		<detail>
+Implementation Notes:
+What about the info on this site:
+http://www.codeproject.com/useritems/PdfDigiPad.asp
+
+Adobe PDF Public-Key Digital Signature and Encryption 
+Specification
+http://partners.adobe.com/asn/developer/pdfs/tn/ppk_pd
+fspec.pdf
+
+Adobe Acrobat Digital Signature API Reference
+http://partners.adobe.com/asn/acrobat/docs/digsig.pdf
+
+http://www.mail-archive.com/itext-
+questions@lists.sourceforge.net/msg11084.html
+
+http://groups.google.de/groups?
+q=sign+openssl+group:comp.text.pdf&amp;hl=de&amp;lr=lang_de|l
+ang_en&amp;ie=UTF-
+8&amp;group=comp.text.pdf&amp;selm=f55510dc.0403111256.f0a6
+513%40posting.google.com&amp;rnum=1
+
 </detail>
 		<follow_ups>
 			<item>
@@ -1574,9 +1575,9 @@
 				<text>Logged In: YES 
 user_id=601708
 
-The org.pdfbox.pdmodel.interactive.digitalsignature package 
-has been created but needs to be implemented.
-
+The org.pdfbox.pdmodel.interactive.digitalsignature package 
+has been created but needs to be implemented.
+
 Ben</text>
 			</item>
 			<item>
@@ -1585,7 +1586,7 @@
 				<text>Logged In: YES 
 user_id=601708
 
-java example of signing a pdf using an x509 certificate from a 
+java example of signing a pdf using an x509 certificate from a 
 file or a X509Certificate instance </text>
 			</item>
 		</follow_ups>
@@ -1609,9 +1610,9 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Add ability to extract comments</summary>
-		<detail>Create command line app to extract comments from a 
-document.
-
+		<detail>Create command line app to extract comments from a 
+document.
+
 Ben</detail>
 		<change_log>
 			<item>
@@ -1633,20 +1634,20 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Replace existing font with new font</summary>
-		<detail>Create a class, maybe an example that allows a user to 
-replace a font with a font in the filesystem or an existing 
-font in the PDF.  This is useful when a collection of PDF 
-documents have been appended and contain many 
-embedded fonts like 
-'BDED50A3 Times New Roman'
-'GH34DFH Times New Roman'
-
-In order to reduce file size it would be nice to be able to 
-reduce these to one embedded font.
-
-Where the font is embedded multiple times but is really 
-the same font over and over.
-
+		<detail>Create a class, maybe an example that allows a user to 
+replace a font with a font in the filesystem or an existing 
+font in the PDF.  This is useful when a collection of PDF 
+documents have been appended and contain many 
+embedded fonts like 
+'BDED50A3 Times New Roman'
+'GH34DFH Times New Roman'
+
+In order to reduce file size it would be nice to be able to 
+reduce these to one embedded font.
+
+Where the font is embedded multiple times but is really 
+the same font over and over.
+
 Ben</detail>
 		<change_log>
 			<item>
@@ -1668,7 +1669,7 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary>FDF export</summary>
-		<detail>Add ability to create a .fdf file from a PDF file.  Maybe 
+		<detail>Add ability to create a .fdf file from a PDF file.  Maybe 
 give the ability to do batch exporting.</detail>
 		<follow_ups>
 			<item>
@@ -1677,8 +1678,8 @@
 				<text>Logged In: YES 
 user_id=601708
 
-Single export now available in CVS.
-
+Single export now available in CVS.
+
 Ben</text>
 			</item>
 			<item>
@@ -1687,10 +1688,10 @@
 				<text>Logged In: YES 
 user_id=601708
 
-The export really should be 
-ExportFormsToFDF
-ExportFormsToXFDF
-ExportCommentsToFDF
+The export really should be 
+ExportFormsToFDF
+ExportFormsToXFDF
+ExportCommentsToFDF
 ExportCommentsToXFDF</text>
 			</item>
 		</follow_ups>
@@ -1738,135 +1739,135 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>text from box</summary>
-		<detail>I am attaching a file with this message. 
-
-problem:
-
-The text from the rectangles are not read sequentially 
-i.e not extract from a single rectangle at a time. it is 
-extracting randomly from different rectangles. I want to 
-get the text rectangle wise.
-
-
-for example - PDF page no - 89
-
-The text is to be extracted in this way
-
-A
-767
-FAULT ISOLATION/MAINT MANUAL
-
-PASSENGER ADDRESS
-AMPLIFIER BITE
-PROCEDURE
-
-PREREQUISITES
-MAKE SURE THIS CIRCUIT BREAKER IS CLOSED:
-11C22
-MAKE SURE THE AIRPLANE IS IN THIS CONFIGURATION:
-  ELECTRICAL POWER IS ON (AMM 24-22-00/201)
-
-1   SET THE FUNCTION SELECTOR NO
-SWITCH TO THE "LEVEL" POSITION
-ON THE PA AMPLIFIER FRONT
-PANEL AT E2-5.
-    DOES THE PA AMPLIFIER
-FRONT PANEL SHOW 69 TO 71
-VRMS?
-
-10  ADJUST THE "MASTER GAIN"
-FOR 69 TO 71 VRMS.
-    DOES THE PA AMPLIFIER
-FRONT PANEL SHOW 69 TO 71
-VRMS?
-YES
-NO
- 20  REPLACE THE PA AMPLIFIER,
-M177 (AMM 23-31-01/401).
-YES
-2   SET THE FUNCTION SELECTOR
-SWITCH TO THE "LOAD" POSITION.
-    DOES THE PA AMPLIFIER NO
-FRONT PANEL SHOW 30 OHMS OR
-MORE?
-YES
-21  EXAMINE THE SPEAKER WIRING
-FOR SHORT CIRCUITS FROM
-PIN A13 TO B13 (IF USED) AND
-PIN A15 TO B15, OF CONNECTOR
-D455B, AT E2-5 (WDM 23-31-14
-THRU 23-31-17).
-    REPAIR THE PROBLEMS THAT
-YOU FIND.
-
-3   SET THE FUNCTION SELECTOR NO
-SWITCH TO THE "TONE" POSITION.
-    DO YOU HEAR SOUND FROM ALL
-THE PA SPEAKERS?
-YES
-4   SET THE FUNCTION SELECTOR
-SWITCH TO THE "OPERATE" POSI-
-TION.
-    THE SYSTEM IS OK.
-11  DO YOU HEAR NO SOUND AT
-ONE OF THE SPEAKERS?
-NO
-12  DO YOU HEAR NO SOUND FROM
-ALL OF THE SPEAKERS?
-NO
-YES
-YES
-22  REPLACE THE BAD SPEAKER.
-REFER TO TABLE 101.
-23  REPLACE THE PA AMPLIFIER,
-M177 (AMM 23-31-01/401).
-24  EXAMINE THE SPEAKER WIRING
-FOR OPEN CIRCUITS FROM A
-SPEAKER WITH THE SOUND TO A
-SPEAKER WITHOUT (WDM 23-31-14
-THRU 23-31-17).
-    REPAIR THE PROBLEMS THAT
-YOU FIND.
-
-NOTE:
-BITE DOES A TEST OF THESE SYSTEM COMPONENTS:
-  PA AMPLIFIER 
-  SPEAKERS
-  SPEAKER WIRING. 
-BITE DOES NOT DO A TEST OF THESE SYSTEM 
-COMPONENTS:
-  AUDIO ACCESSORY UNIT 
-  ZONE MULTIPLEXER. 
-SPEAKER
-LOCATION
-PSU
-GALLEY
-LAVATORY
-CEILING
-AMM
-REFERENCE
-23-31-02/401
-23-31-04/401
-23-31-05/401
-23-31-08/401
-TABLE 101
-Passenger Address Amplifier BITE Procedure
-Figure 103
-       
-
-its just the outline of how i need the information. The 
-text should  be read completely from one rectanble and 
-then switched to next rectangle etc.,
-
-Similar pages are also in this PDF document. pls test it 
-with that also.
-
-it will be useful for me if u give the details of the images 
-in this file(how it is stored and which format)
-
-pls give importance to this message.
-Thanks in advance. Waiting for ur reply.
-
+		<detail>I am attaching a file with this message. 
+
+problem:
+
+The text from the rectangles are not read sequentially 
+i.e not extract from a single rectangle at a time. it is 
+extracting randomly from different rectangles. I want to 
+get the text rectangle wise.
+
+
+for example - PDF page no - 89
+
+The text is to be extracted in this way
+
+A
+767
+FAULT ISOLATION/MAINT MANUAL
+
+PASSENGER ADDRESS
+AMPLIFIER BITE
+PROCEDURE
+
+PREREQUISITES
+MAKE SURE THIS CIRCUIT BREAKER IS CLOSED:
+11C22
+MAKE SURE THE AIRPLANE IS IN THIS CONFIGURATION:
+  ELECTRICAL POWER IS ON (AMM 24-22-00/201)
+
+1   SET THE FUNCTION SELECTOR NO
+SWITCH TO THE "LEVEL" POSITION
+ON THE PA AMPLIFIER FRONT
+PANEL AT E2-5.
+    DOES THE PA AMPLIFIER
+FRONT PANEL SHOW 69 TO 71
+VRMS?
+
+10  ADJUST THE "MASTER GAIN"
+FOR 69 TO 71 VRMS.
+    DOES THE PA AMPLIFIER
+FRONT PANEL SHOW 69 TO 71
+VRMS?
+YES
+NO
+ 20  REPLACE THE PA AMPLIFIER,
+M177 (AMM 23-31-01/401).
+YES
+2   SET THE FUNCTION SELECTOR
+SWITCH TO THE "LOAD" POSITION.
+    DOES THE PA AMPLIFIER NO
+FRONT PANEL SHOW 30 OHMS OR
+MORE?
+YES
+21  EXAMINE THE SPEAKER WIRING
+FOR SHORT CIRCUITS FROM
+PIN A13 TO B13 (IF USED) AND
+PIN A15 TO B15, OF CONNECTOR
+D455B, AT E2-5 (WDM 23-31-14
+THRU 23-31-17).
+    REPAIR THE PROBLEMS THAT
+YOU FIND.
+
+3   SET THE FUNCTION SELECTOR NO
+SWITCH TO THE "TONE" POSITION.
+    DO YOU HEAR SOUND FROM ALL
+THE PA SPEAKERS?
+YES
+4   SET THE FUNCTION SELECTOR
+SWITCH TO THE "OPERATE" POSI-
+TION.
+    THE SYSTEM IS OK.
+11  DO YOU HEAR NO SOUND AT
+ONE OF THE SPEAKERS?
+NO
+12  DO YOU HEAR NO SOUND FROM
+ALL OF THE SPEAKERS?
+NO
+YES
+YES
+22  REPLACE THE BAD SPEAKER.
+REFER TO TABLE 101.
+23  REPLACE THE PA AMPLIFIER,
+M177 (AMM 23-31-01/401).
+24  EXAMINE THE SPEAKER WIRING
+FOR OPEN CIRCUITS FROM A
+SPEAKER WITH THE SOUND TO A
+SPEAKER WITHOUT (WDM 23-31-14
+THRU 23-31-17).
+    REPAIR THE PROBLEMS THAT
+YOU FIND.
+
+NOTE:
+BITE DOES A TEST OF THESE SYSTEM COMPONENTS:
+  PA AMPLIFIER 
+  SPEAKERS
+  SPEAKER WIRING. 
+BITE DOES NOT DO A TEST OF THESE SYSTEM 
+COMPONENTS:
+  AUDIO ACCESSORY UNIT 
+  ZONE MULTIPLEXER. 
+SPEAKER
+LOCATION
+PSU
+GALLEY
+LAVATORY
+CEILING
+AMM
+REFERENCE
+23-31-02/401
+23-31-04/401
+23-31-05/401
+23-31-08/401
+TABLE 101
+Passenger Address Amplifier BITE Procedure
+Figure 103
+       
+
+It's just the outline of how I need the information. The 
+text should be read completely from one rectangle and 
+then switched to the next rectangle, etc.
+
+Similar pages are also in this PDF document. pls test it 
+with that also.
+
+it will be useful for me if u give the details of the images 
+in this file(how it is stored and which format)
+
+pls give importance to this message.
+Thanks in advance. Waiting for ur reply.
+
 </detail>
 		<existingfiles>
 			<file>
@@ -1905,17 +1906,17 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>CID to Unicode mapping</summary>
-		<detail>For extracting CJK text it would be usefull to map CID-
-keyed cheracters to Unicode. 
-
-For example, "90ms-RKSJ-UCS2" cmap file can be use 
-for retrieving unicodes for "90ms-RKSJ-H" and "90ms-
-RKSJ-V" encoding of CID-fonts.
-
-Now CMapParser parse "bfrange" and "bfchar". If is 
-enough for parsing ToUnicode CMap files. 
-
-So, as I understand, "encoding name to ToUnicode 
+		<detail>For extracting CJK text it would be useful to map CID-
+keyed characters to Unicode. 
+
+For example, the "90ms-RKSJ-UCS2" cmap file can be used 
+for retrieving Unicode values for the "90ms-RKSJ-H" and "90ms-
+RKSJ-V" encodings of CID fonts.
+
+Currently CMapParser parses "bfrange" and "bfchar". That is 
+enough for parsing ToUnicode CMap files. 
+
+So, as I understand, "encoding name to ToUnicode 
 CMap file name" mapping is needed only.</detail>
 		<follow_ups>
 			<item>
@@ -1924,14 +1925,14 @@
 				<text>Logged In: YES 
 user_id=958555
 
-Some additional information...
-
-There is better way to retrieve unicode symbol:
-
-We should get CID, using natural CMap file and map it to 
-Unicode, using appropriate Uni* CMap file (backward 
-mapping)...
-
+Some additional information...
+
+There is a better way to retrieve the Unicode symbol:
+
+We should get the CID using the natural CMap file and map it to 
+Unicode using the appropriate Uni* CMap file (backward 
+mapping)...
+
 So, "cidrange" and "cidchar" should be parsed too...</text>
 			</item>
 		</follow_ups>
@@ -1955,7 +1956,7 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>HTML -&gt; PDF</summary>
-		<detail>It would be really nice to take a html and create a PDF 
+		<detail>It would be really nice to take a html and create a PDF 
 from it.</detail>
 		<change_log>
 			<item>
@@ -1977,13 +1978,13 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Add ability to set document information using PDFViewer</summary>
-		<detail>Enhance PDFViewer to show a dialog with the document 
-information, allow the user to edit this information and 
-save it back to the file system.  
-
-Also give the ability to change a set of PDF documents, 
-such as a subdirectory or something.
-
+		<detail>Enhance PDFViewer to show a dialog with the document 
+information, allow the user to edit this information and 
+save it back to the file system.  
+
+Also give the ability to change a set of PDF documents, 
+such as a subdirectory or something.
+
 </detail>
 		<change_log>
 			<item>
@@ -2011,9 +2012,9 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>alternate header form</summary>
-		<detail>Support PDF Reference 1.5 Impl Note 14 
-
-14. Acrobat viewers will also accept a header of the form
+		<detail>Support PDF Reference 1.5 Impl Note 14 
+
+14. Acrobat viewers will also accept a header of the form
 %!PS-Adobe-N.n PDF-M.m</detail>
 		<change_log>
 			<item>
@@ -2037,7 +2038,7 @@
 		<summary>Notification about output progress</summary>
 		<detail>Does your programming interface provide hooks or callback 
 services to notify an application or an other service about the PDF 
-file creation progress?
+file creation progress?
 I would like that a function or an object 
 gets automatically informed if a new page will be started or 
 finished. An user interface can display this change in a progress 
@@ -2049,10 +2050,10 @@
 				<text>Logged In: YES 
 user_id=601708
 
-output progress of text extraction can be achieved by 
-extending PDFTextStripper, PDF creation progress depends on 
-how PDFBox is being used, so the client should track that.
-
+output progress of text extraction can be achieved by 
+extending PDFTextStripper, PDF creation progress depends on 
+how PDFBox is being used, so the client should track that.
+
 Ben</text>
 			</item>
 			<item>
@@ -2061,7 +2062,7 @@
 				<text>Logged In: YES 
 user_id=572001
 
-Does your output process offer page events during the PDF 
+Does your output process offer page events during the PDF 
 creation?</text>
 			</item>
 			<item>
@@ -2113,8 +2114,8 @@
 				<text>Logged In: YES 
 user_id=601708
 
-ColorSpaces are not supported in the CVS version of PDFBox.
-
+ColorSpaces are not supported in the CVS version of PDFBox.
+
 Ben</text>
 			</item>
 		</follow_ups>
@@ -2150,7 +2151,7 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>extract information from tagged PDF</summary>
-		<detail>Add the ability to extract information from a tagged PDF 
+		<detail>Add the ability to extract information from a tagged PDF 
 document.  See taggedPDF.pdf for an example.</detail>
 		<follow_ups>
 			<item>
@@ -2159,18 +2160,18 @@
 				<text>Logged In: YES 
 user_id=1468838
 
-Hi,
-we have to parse the PDF object structure tree; all
-structural elements are inside the object tree (see e.g.
-PDFReference 1.4 chapter 9.6 "Logical Structure").
-- parse the PDF page streams to extract drawing and text
-operations;these contain the actual content of the
-structural elements. This content is surrounded by BMC/EMC
-tags which contain information to which element object the
-contained content belongs.This is what i got from pdf reference.
-
-Regards,
-Qumar.
+Hi,
+we have to parse the PDF object structure tree; all
+structural elements are inside the object tree (see e.g.
+PDFReference 1.4 chapter 9.6 "Logical Structure").
+- parse the PDF page streams to extract drawing and text
+operations;these contain the actual content of the
+structural elements. This content is surrounded by BMC/EMC
+tags which contain information to which element object the
+contained content belongs.This is what i got from pdf reference.
+
+Regards,
+Qumar.
 </text>
 			</item>
 			<item>
@@ -2179,17 +2180,17 @@
 				<text>Logged In: YES 
 user_id=601708
 
-http://www.irs.gov/pub/irs-access/f1040ez_accessible.pdf
-would be a good form to start with.
-
-If you notice they are putting labels on the form fields.  
-these labels contain meta data critical to building tax 
-software in rapid fashion.  Without this meta data, the 
-name of the form field is meaningless. It would be nice to 
-extract this information so I can combine it with other 
-data about the field (name, type, location, etc).  I 
-already know PDFBox can extract the other information about 
-the fields.  I haven't done it with PDFBox, but I did it 
+http://www.irs.gov/pub/irs-access/f1040ez_accessible.pdf
+would be a good form to start with.
+
+If you notice they are putting labels on the form fields.  
+these labels contain meta data critical to building tax 
+software in rapid fashion.  Without this meta data, the 
+name of the form field is meaningless. It would be nice to 
+extract this information so I can combine it with other 
+data about the field (name, type, location, etc).  I 
+already know PDFBox can extract the other information about 
+the fields.  I haven't done it with PDFBox, but I did it 
 with iText.</text>
 			</item>
 			<item>
@@ -2198,19 +2199,19 @@
 				<text>Logged In: YES 
 user_id=601708
 
-More comments from users
-
-Tagged PDF will be a big thing in government because 
-federal government procurement of Acrobat publishing 
-technology falls under Section 508.  States will likely 
-follow.
-
- see:
-www.section508.gov
-
-http://www.irs.gov/pub/irs-access/
-or
-ftp://ftp.irs.gov/pub/irs-access/
+More comments from users
+
+Tagged PDF will be a big thing in government because 
+federal government procurement of Acrobat publishing 
+technology falls under Section 508.  States will likely 
+follow.
+
+ see:
+www.section508.gov
+
+http://www.irs.gov/pub/irs-access/
+or
+ftp://ftp.irs.gov/pub/irs-access/
 </text>
 			</item>
 			<item>
@@ -2219,12 +2220,12 @@
 				<text>Logged In: YES 
 user_id=1468838
 
-Hi,
-
- i was seeing the specification of pdf and came to know the
-structure information of pdf will be in PDSEdit
-layer,PDSEdit Layer gives access to structure tree with in a
-pdf and methods methods and objects are prefixed by PDS.So
+Hi,
+
+ I was reading the PDF specification and learned that the
+structure information of a pdf is in the PDSEdit
+layer. The PDSEdit layer gives access to the structure tree within a
+pdf, and its methods and objects are prefixed by PDS. So
 how can we get access to PDSEdit layer of pdf.</text>
 			</item>
 			<item>
@@ -2233,61 +2234,61 @@
 				<text>Logged In: YES 
 user_id=1468838
 
-It would be nice if pdfbox can provide the ability to
-extract information from tagged PDF.As Adobre Acrobat Reader
-provides the tags for the pdf, pdfbox should also try to get
-the tagged pdfs.
-
-for example if iwe have a pdf file with a para1 under
-header1 and para2 under header 2 and a table with rows and
-columns.something like 
- 
-Header1 
-This is a para 1 ,it describes about a disease.  
-Header2 
-This is a para2,describes remedies of disease. 
-Table 
-A B  
-C D 
- 
- 
-Now the tagged pdf looks like below in adobe acrobat reader
- 
-&lt;Heading 1&gt; 
-Header1 
-&lt;Normal&gt;  
-This is a para 1 ,it describes about a disease. 
-&lt;Heading 1&gt; 
-Header1 
-&lt;Normal&gt;  
-This is a para2,describes remedies of disease. 
-&lt;Heading 1&gt; 
-Table 
-&lt;Table&gt; 
-&lt;TBody&gt; 
-&lt;TR&gt; 
-&lt;TD&gt; 
-&lt;Normal&gt; 
-A 
-&lt;TD&gt; 
-&lt;Normal&gt; 
-B 
-&lt;TR&gt; 
-&lt;TD&gt; 
-&lt;Normal&gt; 
-C 
-&lt;TD&gt; 
-&lt;Normal&gt; 
-D 
-
-how can we extract the Heading1 ,Heading 2 and tabular data
-using pdfbox.
-
-This is a good feature which should be added to the armory
-pdfbox.
-
-Please provide this feature.
-
+It would be nice if pdfbox could provide the ability to
+extract information from tagged PDF. As Adobe Acrobat Reader
+provides the tags for the pdf, pdfbox should also try to get
+the tagged pdfs.
+
+For example, if we have a pdf file with a para1 under
+header1 and para2 under header 2 and a table with rows and
+columns, something like 
+ 
+Header1 
+This is a para 1 ,it describes about a disease.  
+Header2 
+This is a para2,describes remedies of disease. 
+Table 
+A B  
+C D 
+ 
+ 
+Now the tagged pdf looks like below in adobe acrobat reader
+ 
+&lt;Heading 1&gt; 
+Header1 
+&lt;Normal&gt;  
+This is a para 1 ,it describes about a disease. 
+&lt;Heading 1&gt; 
+Header1 
+&lt;Normal&gt;  
+This is a para2,describes remedies of disease. 
+&lt;Heading 1&gt; 
+Table 
+&lt;Table&gt; 
+&lt;TBody&gt; 
+&lt;TR&gt; 
+&lt;TD&gt; 
+&lt;Normal&gt; 
+A 
+&lt;TD&gt; 
+&lt;Normal&gt; 
+B 
+&lt;TR&gt; 
+&lt;TD&gt; 
+&lt;Normal&gt; 
+C 
+&lt;TD&gt; 
+&lt;Normal&gt; 
+D 
+
+how can we extract the Heading1 ,Heading 2 and tabular data
+using pdfbox.
+
+This is a good feature which should be added to the armory
+pdfbox.
+
+Please provide this feature.
+
  </text>
 			</item>
 		</follow_ups>
@@ -2320,14 +2321,14 @@
 				<text>Logged In: YES 
 user_id=601708
 
-This is a nice feature, but way out of the bounds of PDFBox.  
-I would gladly take any contributions that you have.  If there 
-is a need for this, then maybe a customer could fund the 
-development.
-
-Closing this issue for now, contribute a patch if you would like 
-it to be part of PDFBox.
-
+This is a nice feature, but way out of the bounds of PDFBox.  
+I would gladly take any contributions that you have.  If there 
+is a need for this, then maybe a customer could fund the 
+development.
+
+Closing this issue for now, contribute a patch if you would like 
+it to be part of PDFBox.
+
 Ben</text>
 			</item>
 			<item>
@@ -2336,12 +2337,12 @@
 				<text>Logged In: YES 
 user_id=572001
 
-What is the result if you check your PDF files with the 
-following tools?
-- Free PDF/X Test Patch
-  http://pdfx3.org/pdfxtestpatch.html
-
-- PDF/X-3 Inspector
+What is the result if you check your PDF files with the 
+following tools?
+- Free PDF/X Test Patch
+  http://pdfx3.org/pdfxtestpatch.html
+
+- PDF/X-3 Inspector
   http://pdfx3.org/download.html</text>
 			</item>
 			<item>
@@ -2350,8 +2351,8 @@
 				<text>Logged In: YES 
 user_id=601708
 
-Is this offer?  I have quite a bit of work with just the PDF 
-specification, not sure if my time would be best spent 
+Is this an offer?  I have quite a bit of work with just the PDF 
+specification, not sure if my time would be best spent 
 supporting PDF/X specification as well.</text>
 			</item>
 		</follow_ups>
@@ -2395,9 +2396,9 @@
 				<text>Logged In: YES 
 user_id=109405
 
-I'm interested too in such a feature. 
-Ben, do you think is easy to modify PDFTextStripper in order 
-to aquire that ?
+I'm interested too in such a feature. 
+Ben, do you think it is easy to modify PDFTextStripper in order 
+to acquire that?
 </text>
 			</item>
 			<item>
@@ -2405,26 +2406,26 @@
 				<sender>nobody</sender>
 				<text>Logged In: NO 
 
-I have a pdf containing text and images on the same page.
-
-I need a posibility to know the text and the place of images. 
-For example i need a output like that:
-
-Text text text
-&lt;image /&gt;
-
-text another text
-&lt;image /&gt;
-...
-text text 
-
-
-Is there any posibility to do that using pdfbox ?
-
-(I'm using TextStripper right now, but this in not giving me 
-any image info).
-
-Thanx in advance
+I have a pdf containing text and images on the same page.
+
+I need a possibility to know the text and the place of images. 
+For example I need an output like that:
+
+Text text text
+&lt;image /&gt;
+
+text another text
+&lt;image /&gt;
+...
+text text 
+
+
+Is there any possibility to do that using pdfbox?
+
+(I'm using TextStripper right now, but this is not giving me 
+any image info).
+
+Thanx in advance
 </text>
 			</item>
 			<item>
@@ -2433,34 +2434,34 @@
 				<text>Logged In: YES 
 user_id=370177
 
-Ben, I did some testing and I'm getting the following problems:
-1. PDDeviceGray.createColorModel is not implemented. I need
-this to read PDFs produced by our scanner (black and white
-only).
-Exception in thread "main" java.io.IOException: Not implemented
-        at
-org.pdfbox.pdmodel.graphics.color.PDDeviceGray.createColorModel(PDDeviceGray.java:101)
-
-2. CCITTFaxDecode.decode is not implemented.
-
-3. DCTFilter.decode is not implemented
-
-4. Running ExtractImages on one of my PDFs, I got the
-following  error:
-Exception in thread "main" java.lang.ClassCastException:
-org.pdfbox.cos.COSArray
-        at
-org.pdfbox.pdmodel.graphics.image.PDXObjectImageFactory.getImage(PDXObjectImageFactory.ja
-va:38)
-        at
-org.pdfbox.pdmodel.PDResources.getImages(PDResources.java:155)
-        at
-org.pdfbox.ExtractImages.extractImages(ExtractImages.java:159)
-
-5. I had partial success on one PDF, and produced an image!
-However, the image was upside down, and I used Paint to flip
-it vertically to correct this.
-
+Ben, I did some testing and I'm getting the following problems:
+1. PDDeviceGray.createColorModel is not implemented. I need
+this to read PDFs produced by our scanner (black and white
+only).
+Exception in thread "main" java.io.IOException: Not implemented
+        at
+org.pdfbox.pdmodel.graphics.color.PDDeviceGray.createColorModel(PDDeviceGray.java:101)
+
+2. CCITTFaxDecode.decode is not implemented.
+
+3. DCTFilter.decode is not implemented
+
+4. Running ExtractImages on one of my PDFs, I got the
+following  error:
+Exception in thread "main" java.lang.ClassCastException:
+org.pdfbox.cos.COSArray
+        at
+org.pdfbox.pdmodel.graphics.image.PDXObjectImageFactory.getImage(PDXObjectImageFactory.ja
+va:38)
+        at
+org.pdfbox.pdmodel.PDResources.getImages(PDResources.java:155)
+        at
+org.pdfbox.ExtractImages.extractImages(ExtractImages.java:159)
+
+5. I had partial success on one PDF, and produced an image!
+However, the image was upside down, and I used Paint to flip
+it vertically to correct this.
+
 Finally, I will send you the 3 test PDFs I tried this on.</text>
 			</item>
 			<item>
@@ -2469,7 +2470,7 @@
 				<text>Logged In: YES 
 user_id=601708
 
-pdfbox now contains a command line app called 
+pdfbox now contains a command line app called 
 org.pdfbox.ExtractImages.</text>
 			</item>
 		</follow_ups>
@@ -2517,12 +2518,12 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>PDF to HTML conversion </summary>
-		<detail>PDF to HTML conversion. 
-Preserve formatting.
-
-check out www.sourceforge.net/projects/pdftohtml 
-for a hack of this process.
-
+		<detail>PDF to HTML conversion. 
+Preserve formatting.
+
+check out www.sourceforge.net/projects/pdftohtml 
+for a hack of this process.
+
 </detail>
 		<follow_ups>
 			<item>
@@ -2531,7 +2532,7 @@
 				<text>Logged In: YES 
 user_id=747013
 
-Also conversion to xml or word etc would be amazing.
+Also conversion to xml or word etc would be amazing.
 </text>
 			</item>
 		</follow_ups>
@@ -2555,17 +2556,17 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>CJK decoding</summary>
-		<detail>Another feature I need a lot is the correct interpretation 
-of CJK encoding.
-
-Yes, I know PDF can be a pain when it comes to 
-correctly interpreting CJK charsets, as many factors are 
-involved, including whether a font (or its subset) is 
-embedded or not.
-
-Attached is a simple Korean PDF that so far has not 
-been correctly interpreted by any Java-based 
-open source libraries.  Though it could be rendered 
+		<detail>Another feature I need a lot is the correct interpretation 
+of CJK encoding.
+
+Yes, I know PDF can be a pain when it comes to 
+correctly interpreting CJK charsets, as many factors are 
+involved, including whether a font (or its subset) is 
+embedded or not.
+
+Attached is a simple Korean PDF that so far has not 
+been correctly interpreted by any Java-based 
+open source libraries.  Though it could be rendered 
 correctly by XPDF on linux and also Windows.</detail>
 		<follow_ups>
 			<item>
@@ -2574,19 +2575,19 @@
 				<text>Logged In: YES 
 user_id=815589
 
-Hello Ben,
-
-Thanks for the response.  I just downloaded PDFBox 0.6.5 and 
-wrote a little sample program to test it against 3 CJK PDF files 
-I have, and the output is still no good.  I have attached my 
-sample program, the 3 PDFs and the output in the attached 
-zip file.
-
-Can you tell me what I am doing wrong?
-
-The PDF files were generated by using Adobe Acrobat 5.0 
-using embedded fonts I believe.
-
+Hello Ben,
+
+Thanks for the response.  I just downloaded PDFBox 0.6.5 and 
+wrote a little sample program to test it against 3 CJK PDF files 
+I have, and the output is still no good.  I have attached my 
+sample program, the 3 PDFs and the output in the attached 
+zip file.
+
+Can you tell me what I am doing wrong?
+
+The PDF files were generated by using Adobe Acrobat 5.0 
+using embedded fonts I believe.
+
 Thank you.</text>
 			</item>
 			<item>
@@ -2595,11 +2596,11 @@
 				<text>Logged In: YES 
 user_id=601708
 
-There was no attachment with this.  I have done some CJK 
-work in the 0.6.5 release.  Please attach the document and I 
-can take a look at it.(Make sure you check the 'attach file' 
-checkbox)
-
+There was no attachment with this.  I have done some CJK 
+work in the 0.6.5 release.  Please attach the document and I 
+can take a look at it.(Make sure you check the 'attach file' 
+checkbox)
+
 Ben</text>
 			</item>
 		</follow_ups>
@@ -2648,8 +2649,8 @@
 				<text>Logged In: YES 
 user_id=601708
 
-Now available in CVS.
-
+Now available in CVS.
+
 Ben</text>
 			</item>
 		</follow_ups>
@@ -2697,9 +2698,9 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Linearize command line tool</summary>
-		<detail>PDFBox should come with a utility to convert a pdf to a 
-linearized pdf document.
-
+		<detail>PDFBox should come with a utility to convert a pdf to a 
+linearized pdf document.
+
 </detail>
 		<follow_ups>
 			<item>
@@ -2709,7 +2710,7 @@
 user_id=609291
 Originator: NO
 
-I wish to voice my support for this feature request. It would be very useful to us too. Thanks for the great work!
+I wish to voice my support for this feature request. It would be very useful to us too. Thanks for the great work!
 </text>
 			</item>
 			<item>
@@ -2718,13 +2719,13 @@
 				<text>Logged In: YES 
 user_id=989425
 
-I am involved in a free project where we are digitizing a very 
-large number of books and turning them into PDF using open 
-source software to serve on the Net. I thought I'd post a 
-comment to say that linearization would be a really attractive 
-feature for us in PDFBox if implemented.
-
-Youssef Eldakar
+I am involved in a free project where we are digitizing a very 
+large number of books and turning them into PDF using open 
+source software to serve on the Net. I thought I'd post a 
+comment to say that linearization would be a really attractive 
+feature for us in PDFBox if implemented.
+
+Youssef Eldakar
 Bibliotheca Alexandrina</text>
 			</item>
 			<item>
@@ -2733,34 +2734,34 @@
 				<text>Logged In: YES 
 user_id=601708
 
-Some example pdfs (linearized and not):
-	The linearized version is created by an evaluation 
-version of
-PdfLib. Don't worry about the blank page.
-	The pdf is also being validated by our in-house pdf 
-expert. I've
-tried it today.
-	Sadly, it is urgent for us to deliver a correct version 
-of the pdf
-to our customer, I think we will buy a version of 
-	PdfLib (we control
-it via a JNI Bridge)..
-	Anyhow, if you find a way of implementing the 
-linearization in
-PDFBox, I will be happy to throw away PdfLib.
-	I think a constructor like 
-org.pdfbox.pdmodel.PDDocument(COSDocument
-doc, boolean linearize) would be nice, and sorry 	I 
-don't have time to
-help you in enhancing PdfBox now. (Maybe I'll write some 
-examples of basic
-usage pattern of your 	library)
-
-
-See the following for examples.
-Linearized_c_14720040602en00010001.pdf
-Not_Linearized_c_14720040602en00010001.pdf
-
+Some example pdfs (linearized and not):
+	The linearized version is created by an evaluation 
+version of
+PdfLib. Don't worry about the blank page.
+	The pdf is also being validated by our in-house pdf 
+expert. I've
+tried it today.
+	Sadly, it is urgent for us to deliver a correct version 
+of the pdf
+to our customer, I think we will buy a version of 
+	PdfLib (we control
+it via a JNI Bridge)..
+	Anyhow, if you find a way of implementing the 
+linearization in
+PDFBox, I will be happy to throw away PdfLib.
+	I think a constructor like 
+org.pdfbox.pdmodel.PDDocument(COSDocument
+doc, boolean linearize) would be nice, and sorry 	I 
+don't have time to
+help you in enhancing PdfBox now. (Maybe I'll write some 
+examples of basic
+usage pattern of your 	library)
+
+
+See the following for examples.
+Linearized_c_14720040602en00010001.pdf
+Not_Linearized_c_14720040602en00010001.pdf
+
 </text>
 			</item>
 		</follow_ups>
@@ -2784,12 +2785,12 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Implement File Specification</summary>
-		<detail>org.pdfbox.pdmodel.common.filespecification has been 
-created but not implemented.
-
-See PDF Reference 1.5 section 3.10 for File Specification 
-Details.
-
+		<detail>org.pdfbox.pdmodel.common.filespecification has been 
+created but not implemented.
+
+See PDF Reference 1.5 section 3.10 for File Specification 
+Details.
+
 </detail>
 	</artifact>
 	<artifact id="830508">
@@ -2811,9 +2812,9 @@
 				<text>Logged In: YES 
 user_id=601708
 
-supports reading them, does not support writing cross 
-reference streams.
-
+supports reading them, does not support writing cross 
+reference streams.
+
 Ben</text>
 			</item>
 		</follow_ups>
@@ -2869,8 +2870,8 @@
 				<text>Logged In: YES 
 user_id=601708
 
-Now available in CVS.
-
+Now available in CVS.
+
 Ben</text>
 			</item>
 		</follow_ups>
@@ -2932,8 +2933,8 @@
 				<text>Logged In: YES 
 user_id=601708
 
-Now supported in tonight's nightly build.
-
+Now supported in tonight's nightly build.
+
 Ben</text>
 			</item>
 			<item>
@@ -2942,15 +2943,15 @@
 				<text>Logged In: YES 
 user_id=904851
 
-This is essential since you can't tell a 1.5 pdf from
-earlier versions when you're doing automated parsing. 
-
-For example I attempted to parse &amp;quot;PDFReference15_v6.pdf&amp;quot; from
-http://partners.adobe.com/asn/tech/pdf/specifications.jsp.
-Adobe warns that this is &amp;quot;compressed using Acrobat 6
-compression&amp;quot;.
-
-0.6.3 PDFparser.parse() throws an IOException &amp;quot;error:
+This is essential since you can't tell a 1.5 pdf from
+earlier versions when you're doing automated parsing. 
+
+For example I attempted to parse &amp;quot;PDFReference15_v6.pdf&amp;quot; from
+http://partners.adobe.com/asn/tech/pdf/specifications.jsp.
+Adobe warns that this is &amp;quot;compressed using Acrobat 6
+compression&amp;quot;.
+
+0.6.3 PDFparser.parse() throws an IOException &amp;quot;error:
 Expected integer type, actual=&amp;quot;&amp;quot;.</text>
 			</item>
 		</follow_ups>
@@ -2998,9 +2999,9 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Highlight a text string</summary>
-		<detail>Add the ability to highlight a piece of text on a page.  
-This is used when returning results from a search engine.
-
+		<detail>Add the ability to highlight a piece of text on a page.  
+This is used when returning results from a search engine.
+
 </detail>
 		<change_log>
 			<item>
@@ -3022,12 +3023,12 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary>Allow access to XMP data</summary>
-		<detail>What I am looking for are metadata terms about
-the document that will help classify it. I am currently 
-building a metadata
-database and populating it with dublin core labels. An 
-InputStream would be
-perfect as I'll be able to sax parse it.
+		<detail>What I am looking for are metadata terms about
+the document that will help classify it. I am currently 
+building a metadata
+database and populating it with dublin core labels. An 
+InputStream would be
+perfect as I'll be able to sax parse it.
 </detail>
 		<follow_ups>
 			<item>
@@ -3036,20 +3037,20 @@
 				<text>Logged In: YES 
 user_id=601708
 
-PDDocumentCatalog and PDPage now have 
-
-getMetadata()
-setMetadata()
-
-functions, please let me know if there are other PDF objects 
-that should have access to the metadata.  
-
-See PDStream.createInputStream() to read the metadata and
-PDStream.createOutputStream() to write metadata.
-
-This will be available in the next nightly build available off of 
-www.pdfbox.org
-
+PDDocumentCatalog and PDPage now have 
+
+getMetadata()
+setMetadata()
+
+functions, please let me know if there are other PDF objects 
+that should have access to the metadata.  
+
+See PDStream.createInputStream() to read the metadata and
+PDStream.createOutputStream() to write metadata.
+
+This will be available in the next nightly build available off of 
+www.pdfbox.org
+
 Ben </text>
 			</item>
 			<item>
@@ -3058,9 +3059,9 @@
 				<text>Logged In: YES 
 user_id=1156767
 
-I'd like to broaden the request. I'd like full I/O access to XMP 
-metadata in the PDF. Stream access is ok. I think passing 
-Strings is reasonable as well. No need to parse it as I can do 
+I'd like to broaden the request. I'd like full I/O access to XMP 
+metadata in the PDF. Stream access is ok. I think passing 
+Strings is reasonable as well. No need to parse it as I can do 
 that. Thanks.</text>
 			</item>
 		</follow_ups>
@@ -3090,7 +3091,7 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>Allow compressing a document</summary>
-		<detail>Iterate through all streams and compress them if they 
+		<detail>Iterate through all streams and compress them if they 
 are not already compressed.</detail>
 	</artifact>
 	<artifact id="1227072">
@@ -3104,9 +3105,9 @@
 		<status>Open</status>
 		<resolution>None</resolution>
 		<summary>PDF/X</summary>
-		<detail>Create a PDF that is PDF/X conformant.
-If I use a non-conforming object/element/... (e.g. RGB colors), I
-would prefer a NotPdfXEception().
+		<detail>Create a PDF that is PDF/X conformant.
+If I use a non-conforming object/element/... (e.g. RGB colors), I
+would prefer a NotPdfXEception().
 </detail>
 		<change_log>
 			<item>
@@ -3128,7 +3129,7 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary>Add support for attachments/Embedded files</summary>
-		<detail>Need to support embedded files and attachments.  see 
+		<detail>Need to support embedded files and attachments.  see 
 form_example_with_attachements.pdf</detail>
 		<follow_ups>
 			<item>
@@ -3137,10 +3138,10 @@
 				<text>Logged In: YES 
 user_id=601708
 
-Done!
-See http://www.pdfbox.org/userguide/file_references.html (will 
-appear when site gets regenerated tonight)
-
+Done!
+See http://www.pdfbox.org/userguide/file_references.html (will 
+appear when site gets regenerated tonight)
+
 Ben Litchfield</text>
 			</item>
 		</follow_ups>
@@ -3170,7 +3171,7 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary>Font Embedding</summary>
-		<detail>Need to add the ability to embed a new font into a PDF 
+		<detail>Need to add the ability to embed a new font into a PDF 
 document.</detail>
 		<follow_ups>
 			<item>
@@ -3179,9 +3180,9 @@
 				<text>Logged In: YES 
 user_id=601708
 
-It is now possible to embed a TTF font, currently only WinAnsi 
-encoding is supported.
-
+It is now possible to embed a TTF font, currently only WinAnsi 
+encoding is supported.
+
 Ben</text>
 			</item>
 		</follow_ups>
@@ -3229,7 +3230,7 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary>extract the text of certain page at certain line</summary>
-		<detail>Sometimes it is unnecessary to extract all the text in a 
+		<detail>Sometimes it is unnecessary to extract all the text in a 
 pdf file :-)</detail>
 		<follow_ups>
 			<item>
@@ -3267,7 +3268,7 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary>Add Bookmarks to existing PDF</summary>
-		<detail>Allow the creation of a bookmarks/TOC for an existing 
+		<detail>Allow the creation of a bookmarks/TOC for an existing 
 PDF.</detail>
 		<follow_ups>
 			<item>
@@ -3276,9 +3277,9 @@
 				<text>Logged In: YES 
 user_id=601708
 
-implemented in CVS, see http://www.pdfbox.org  Developers
-Guide-&gt;bookmarks
-
+implemented in CVS, see http://www.pdfbox.org  Developers
+Guide-&gt;bookmarks
+
 Ben</text>
 			</item>
 		</follow_ups>
@@ -3308,10 +3309,10 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary>Document Level Navigation</summary>
-		<detail>org.pdfbox.pdmodel.interactive.documentnavigation has 
-been created but needs to be implemented.  See PDF 
-Reference 1.5 Section 8.2
-
+		<detail>org.pdfbox.pdmodel.interactive.documentnavigation has 
+been created but needs to be implemented.  See PDF 
+Reference 1.5 Section 8.2
+
 </detail>
 		<follow_ups>
 			<item>
@@ -3320,9 +3321,9 @@
 				<text>Logged In: YES 
 user_id=601708
 
-implemented in CVS, see http://www.pdfbox.org  Developers
-Guide-&gt;bookmarks
-
+implemented in CVS, see http://www.pdfbox.org  Developers
+Guide-&gt;bookmarks
+
 Ben</text>
 			</item>
 		</follow_ups>
@@ -3352,8 +3353,8 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary>extract TOC/bookmarks</summary>
-		<detail>Add the ability to extract data from the optional Table of 
-Contents (=outlines) or retrieve a list of the bookmarks 
+		<detail>Add the ability to extract data from the optional Table of 
+Contents (=outlines) or retrieve a list of the bookmarks 
 in a specific PDF as well as the page number it points to</detail>
 		<follow_ups>
 			<item>
@@ -3362,9 +3363,9 @@
 				<text>Logged In: YES 
 user_id=601708
 
-implemented in CVS, see http://www.pdfbox.org  Developers
-Guide-&gt;bookmarks
-
+implemented in CVS, see http://www.pdfbox.org  Developers
+Guide-&gt;bookmarks
+
 Ben</text>
 			</item>
 		</follow_ups>
@@ -3408,8 +3409,8 @@
 				<text>Logged In: YES 
 user_id=601708
 
-Created org.pdfbox.examples.pdmodel.ImageToPDF
-
+Created org.pdfbox.examples.pdmodel.ImageToPDF
+
 Ben</text>
 			</item>
 		</follow_ups>
@@ -3445,13 +3446,13 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary>Decrypt/Encrypt should write to different file</summary>
-		<detail>Currently these utilities can only write to the same file that is 
-passed in; they should be able to write to a new file.  For 
-example
-
-java org.pdfbox.Decrypt inputPDF.pdf password 
-output.pdf
-
+		<detail>Currently these utilities can only write to the same file that is 
+passed in; they should be able to write to a new file.  For 
+example
+
+java org.pdfbox.Decrypt inputPDF.pdf password 
+output.pdf
+
 </detail>
 		<follow_ups>
 			<item>
@@ -3489,11 +3490,11 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary>Add option to disable duplicate text</summary>
-		<detail>The PDFTextStripper will suppress text that appears at 
-the same location as other text, as that is how Word 
-does some bolding.  There should be an option to disable 
-this functionality.
-
+		<detail>The PDFTextStripper will suppress text that appears at 
+the same location as other text, as that is how Word 
+does some bolding.  There should be an option to disable 
+this functionality.
+
 Ben</detail>
 		<follow_ups>
 			<item>
@@ -3502,13 +3503,13 @@
 				<text>Logged In: YES 
 user_id=601708
 
-Now implemented, see
-
-PDFTextStripper.SuppressDuplicateOverlappingText property
-
-By default this is true; setting it to false may result in higher 
-performance.
-
+Now implemented, see
+
+PDFTextStripper.SuppressDuplicateOverlappingText property
+
+By default this is true; setting it to false may result in higher 
+performance.
+
 Ben</text>
 			</item>
 		</follow_ups>
@@ -3538,26 +3539,26 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary>Text extraction should follow PDF article divisions</summary>
-		<detail>Hi Ben,
-
-There is a feature in PDF documents called
-"article division", which is used to ensure that the
-text structure is read in blocks. For example, for a
-full page with two columns, there are two article
-divisions (one for each column).
-
-In multi-column pages, the PDF is sometimes
-generated with all the column rows merged (the i-th rows of
-the columns are appended to form the full line). In
-that case, the extracted text makes no sense because
-the real text lines are interlaced. 
-
-In that latter case, PDFBox text extraction methods
-should follow the "article division" marks to get the
-true text, which is not the case in version 0.6.6.
-
-I don't know if this is possible, but it would be great.
-
+		<detail>Hi Ben,
+
+There is a feature in PDF documents called
+"article division", which is used to ensure that the
+text structure is read in blocks. For example, for a
+full page with two columns, there are two article
+divisions (one for each column).
+
+In multi-column pages, the PDF is sometimes
+generated with all the column rows merged (the i-th rows of
+the columns are appended to form the full line). In
+that case, the extracted text makes no sense because
+the real text lines are interlaced. 
+
+In that latter case, PDFBox text extraction methods
+should follow the "article division" marks to get the
+true text, which is not the case in version 0.6.6.
+
+I don't know if this is possible, but it would be great.
+
 Julien</detail>
 		<follow_ups>
 			<item>
@@ -3574,33 +3575,33 @@
 				<text>Logged In: YES 
 user_id=820653
 
-Works well with the attached "pdf.zip" but not with the new
-"test.zip", so this ticket should be removed.
-
-By the way, it is difficult to identify the files that have
-the "multiple columns mixed text" problem, so I wrote a
-simple PDF detector (files with a high score, say greater
-than 2e-6, can potentially have multiple-column mixed text):
-
-  // "multiple column text mixed" detector
-  Pattern p_mctm = Pattern.compile(
-                    ".*[a-z]\\- (?!and|or|to).*");
-                    // not " - ", "X- and Y-ZZZ"
-  Matcher m_mctm = p_mctm.matcher(file_content);
-  if (m_mctm.find()) {
-    int nbr=1; // count the match already consumed by the if above
-    while (m_mctm.find()) {
-      nbr++;
-    }//end while
-    System.err.println("Warning: possible \"multiple "+
-                             "column text mixed\" ("+
-                             nbr/(float)file_content.length()+
-                             ") in "+
-                             current_file.getAbsolutePath());
-  }//end if
-
-I used it to find the newly submitted "test.zip" file.
-
+Works well with the attached "pdf.zip" but not with the new
+"test.zip", so this ticket should be removed.
+
+By the way, it is difficult to identify the files that have
+the "multiple columns mixed text" problem, so I wrote a
+simple PDF detector (files with a high score, say greater
+than 2e-6, can potentially have multiple-column mixed text):
+
+  // "multiple column text mixed" detector
+  Pattern p_mctm = Pattern.compile(
+                    ".*[a-z]\\- (?!and|or|to).*");
+                    // not " - ", "X- and Y-ZZZ"
+  Matcher m_mctm = p_mctm.matcher(file_content);
+  if (m_mctm.find()) {
+    int nbr=1; // count the match already consumed by the if above
+    while (m_mctm.find()) {
+      nbr++;
+    }//end while
+    System.err.println("Warning: possible \"multiple "+
+                             "column text mixed\" ("+
+                             nbr/(float)file_content.length()+
+                             ") in "+
+                             current_file.getAbsolutePath());
+  }//end if
+
+I used it to find the newly submitted "test.zip" file.
+
 Julien</text>
 			</item>
 			<item>
@@ -3609,11 +3610,11 @@
 				<text>Logged In: YES 
 user_id=601708
 
-This was a great suggestion that really improves text 
-extraction on a lot of PDF documents.  It works on the 
-attached PDF, please give it a try and reopen this ticket if 
-you notice any problems.
-
+This was a great suggestion that really improves text 
+extraction on a lot of PDF documents.  It works on the 
+attached PDF, please give it a try and reopen this ticket if 
+you notice any problems.
+
 Ben</text>
 			</item>
 			<item>
@@ -3622,17 +3623,17 @@
 				<text>Logged In: YES 
 user_id=820653
 
-Sorry, I forgot to mention that the text was extracted with
-nightly build  PDFBox-0.6.7-dev-20040905.
-Then, after extraction, the following modifications have
-been made:
-  text=text.replaceAll("-\\x0D\\x0A","");  // word caesura
-(could be wrong sometimes, like in "non-\nstationary" or for
-multiple columns text
-  text=text.replaceAll("\\xAE","fi");  // strange
-replacement for "fi"
-  text=text.replaceAll("\\xAF","fl");  // strange
-replacement for "fl"
+Sorry, I forgot to mention that the text was extracted with
+nightly build  PDFBox-0.6.7-dev-20040905.
+Then, after extraction, the following modifications have
+been made:
+  text=text.replaceAll("-\\x0D\\x0A","");  // word caesura
+(could be wrong sometimes, like in "non-\nstationary" or for
+multiple columns text
+  text=text.replaceAll("\\xAE","fi");  // strange
+replacement for "fi"
+  text=text.replaceAll("\\xAF","fl");  // strange
+replacement for "fl"
 </text>
 			</item>
 			<item>
@@ -3641,11 +3642,11 @@
 				<text>Logged In: YES 
 user_id=820653
 
-Yep. Please look at the attached Zip file. It contains a
-two-column PDF and the corresponding extracted text
-(interlaced), and the same PDF with the article division
-feature.
-
+Yep. Please look at the attached Zip file. It contains a
+two-column PDF and the corresponding extracted text
+(interlaced), and the same PDF with the article division
+feature.
+
 Julien</text>
 			</item>
 			<item>
@@ -3730,10 +3731,10 @@
 		<status>Closed</status>
 		<resolution>None</resolution>
 		<summary>Support named target for Bookmark extraction</summary>

[... 3382 lines stripped ...]