Posted to commits@pdfbox.apache.org by ms...@apache.org on 2015/10/30 16:28:56 UTC

[1/3] pdfbox-docs git commit: PDFBOX-3040: use .md for markdown files

Repository: pdfbox-docs
Updated Branches:
  refs/heads/master 442881561 -> c68c6530d


http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/support.mdtext
----------------------------------------------------------------------
diff --git a/content/support.mdtext b/content/support.mdtext
deleted file mode 100644
index 0969b5e..0000000
--- a/content/support.mdtext
+++ /dev/null
@@ -1,53 +0,0 @@
----
-layout: default
-title:  Support
----
-
-# Support
-
-## Questions about how to use PDFBox
-
-If you have questions about how to use PDFBox, please ask on the [Users Mailing List](/mailinglists.html "Subscribe to Mailing List"). This will get you help from the entire community.
-
-The PDFBox examples and the test code in the sources also provide additional information.
-
-Further help is available on sites such as [Stack Overflow](http://stackoverflow.com/search?q=pdfbox "Stack Overflow").
-
-
-## Filing a bug report or enhancement request
-
-<p class="alert alert-info">Please refrain from immediately opening a ticket in the issue tracker unless
-you are certain that the problem is in the PDFBox software. Try the mailing lists
-first.</p>
-
-If you are sure you have found a bug, then please report the problem in our
-[Issue Tracker](https://issues.apache.org/jira/browse/PDFBOX).
-
-**Before you submit a bug, there are several things you can try first**
-
- - for text extraction issues, check whether Adobe Reader can extract the text
- - try the latest SNAPSHOT to see if it's fixed in the pre-release
- - search the mailing list to see if it has been discussed before
- - check the issue tracker to see if the issue has already been reported
-
-**To help us resolve a bug more quickly**
-
- - attach the problematic PDF by using "More", "Attach files" in the issue tracker
- - if your file is too large, upload it to a file-sharing service, or use the PDFSplit application to isolate the troublesome page
- - mention the PDFBox version you are using
- - attach the shortest possible code that reproduces the problem. Insert Java code between {code}...{code}. Or try to reproduce the problem with the command line applications.
- - mention what you were doing, what was the expected behaviour, and what happened instead
- - provide a stack trace of an exception if there is one
- - try using the non-sequential parser (loadNonSeq() instead of load(), and "-nonSeq" with the command line applications)
- - search JIRA if your problem has been mentioned before.
- - Be patient: all the people here are unpaid volunteers who work for you in their free time
-
-**And please DON'T**
-
- - upload files to a file host that requires registration to read the file.
- - create an issue in JIRA and then go on vacation so that you can't respond to our questions / suggestions.
- - ask "how to" questions in JIRA. Ask such questions on the mailing lists, on stackoverflow.com, and look at the sample and the test code in the sources.
- - attach PDF files with confidential and/or personal data (name, DoB, bank data, health data, SSN) without getting permission from the client and/or the people mentioned on the PDF
- - create issues about obsolete PDFBox versions
-
-<p class="alert alert-info">We can sometimes solve problems without having the PDF, but it is difficult.</p>

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/team.md
----------------------------------------------------------------------
diff --git a/content/team.md b/content/team.md
new file mode 100644
index 0000000..2988bde
--- /dev/null
+++ b/content/team.md
@@ -0,0 +1,45 @@
+---
+layout: default
+title:  Project Team
+---
+
+# Project Team
+
+A successful project requires many people to play many roles. Some members write code or documentation, while others are valuable as testers,
+submitting patches and suggestions.
+
+The team consists of Members and Contributors. Members have direct access to the source of a project and actively evolve the codebase.
+Contributors improve the project through submission of patches and suggestions to the Members. The number of Contributors to the project is unbounded.
+Get involved today. All contributions to the project are greatly appreciated.
+
+## Members
+
+The following is a list of developers with commit privileges who have directly contributed to the project in one way or another.
+
+| Id            | Name                  | Roles      | 
+| ------------- | --------------------- | ---------- |
+| adam          | Adam Nichols          | PMC Member |
+| lehmi         | Andreas Lehmkühler    | PMC Chair  |
+| blitchfield   | Ben Litchfield        | PMC Member |
+| carrier       | Brian Carrier         | PMC Member |
+| danielwilson  | Daniel Wilson         | PMC Member |
+| gbailleul     | Guillaume Bailleul    | PMC Member |
+| jeremias      | Jeremias Maerki       | PMC Member |
+| koch          | Johannes Koch         | PMC Member |
+| jahewson      | John Hewson           | PMC Member |
+| kjackson      | Kevin Jackson         | PMC Member |
+| msahyoun      | Maruan Sahyoun        | PMC Member |
+| pkoch         | Phillipp Koch         | PMC Member |
+| tchojecki     | Thomas Chojecki       | PMC Member |
+| tboehme       | Timo Boehme           | PMC Member |
+| tilman        | Tilman Hausherr       | PMC Member |
+| vfed          | Villu Ruusmann        | PMC Member |
+
+## Emeritus members
+
+The following is a list of former developers who have gone emeritus from the PDFBox PMC.
+
+| Id            | Name                  | Roles      | 
+| ------------- | -------------         | ---------- |
+| leleueri      | Eric Leleu            |            |
+| jukka         | Jukka Zitting         |            |

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/team.mdtext
----------------------------------------------------------------------
diff --git a/content/team.mdtext b/content/team.mdtext
deleted file mode 100644
index 2988bde..0000000
--- a/content/team.mdtext
+++ /dev/null
@@ -1,45 +0,0 @@
----
-layout: default
-title:  Project Team
----
-
-# Project Team
-
-A successful project requires many people to play many roles. Some members write code or documentation, while others are valuable as testers,
-submitting patches and suggestions.
-
-The team consists of Members and Contributors. Members have direct access to the source of a project and actively evolve the codebase.
-Contributors improve the project through submission of patches and suggestions to the Members. The number of Contributors to the project is unbounded.
-Get involved today. All contributions to the project are greatly appreciated.
-
-## Members
-
-The following is a list of developers with commit privileges who have directly contributed to the project in one way or another.
-
-| Id            | Name                  | Roles      | 
-| ------------- | --------------------- | ---------- |
-| adam          | Adam Nichols          | PMC Member |
-| lehmi         | Andreas Lehmkühler    | PMC Chair  |
-| blitchfield   | Ben Litchfield        | PMC Member |
-| carrier       | Brian Carrier         | PMC Member |
-| danielwilson  | Daniel Wilson         | PMC Member |
-| gbailleul     | Guillaume Bailleul    | PMC Member |
-| jeremias      | Jeremias Maerki       | PMC Member |
-| koch          | Johannes Koch         | PMC Member |
-| jahewson      | John Hewson           | PMC Member |
-| kjackson      | Kevin Jackson         | PMC Member |
-| msahyoun      | Maruan Sahyoun        | PMC Member |
-| pkoch         | Phillipp Koch         | PMC Member |
-| tchojecki     | Thomas Chojecki       | PMC Member |
-| tboehme       | Timo Boehme           | PMC Member |
-| tilman        | Tilman Hausherr       | PMC Member |
-| vfed          | Villu Ruusmann        | PMC Member |
-
-## Emeritus members
-
-The following is a list of former developers who have gone emeritus from the PDFBox PMC.
-
-| Id            | Name                  | Roles      | 
-| ------------- | -------------         | ---------- |
-| leleueri      | Eric Leleu            |            |
-| jukka         | Jukka Zitting         |            |

[2/3] pdfbox-docs git commit: PDFBOX-3040: use .md for markdown files

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/dependencies.md
----------------------------------------------------------------------
diff --git a/content/1.8/dependencies.md b/content/1.8/dependencies.md
new file mode 100644
index 0000000..da3174f
--- /dev/null
+++ b/content/1.8/dependencies.md
@@ -0,0 +1,96 @@
+---
+layout: default
+title:  Dependencies
+---
+
+# Dependencies
+
+PDFBox consists of three related components and depends on a few external libraries. This page describes what these libraries are and how to include them in your application.
+
+## Core components
+
+<p class="alert alert-info">These components are needed at runtime, during development and for testing, as detailed below.</p>
+
+The three PDFBox components are named ```pdfbox```, ```fontbox``` and ```jempbox```. The Maven groupId of all PDFBox components is org.apache.pdfbox.
+
+### Minimum Requirement
+
+- Java 1.5
+- [commons-logging](http://commons.apache.org/logging/)
+
+The main PDFBox component, pdfbox, has a hard dependency on the [commons-logging](http://commons.apache.org/logging/) library.
+Commons Logging is a generic wrapper around different logging frameworks, so you'll either need to also use a logging library like [log4j](http://logging.apache.org/log4j/)
+or let commons-logging fall back to the standard [java.util.logging API](http://java.sun.com/j2se/1.4.2/docs/guide/util/logging/overview.html)
+included in the Java platform.
+
+### Font Handling
+For font handling the fontbox component is needed.
+
+### XMP Metadata 
+To support XMP metadata the jempbox component is needed.
+
+To add the pdfbox, fontbox, jempbox and commons-logging jars to your application, the easiest thing is to declare the Maven dependency shown below. This gives you the main
+pdfbox library directly and the other required jars as transitive dependencies.
+
+    <dependency>
+      <groupId>org.apache.pdfbox</groupId>
+      <artifactId>pdfbox</artifactId>
+      <version>...</version>
+    </dependency>
+
+Set the version field to the latest stable PDFBox version.
+
+## Optional dependencies
+
+Some features in PDFBox depend on optional external libraries. You can enable these features simply by including the required libraries in the classpath of your application.
+
+### Extended Image Format Support
+
+Reading JBIG2 images and writing TIFF images require additional libraries.
+
+<p class="alert alert-warning">The image plugins described below are not part of the PDFBox distribution because of incompatible licensing terms. Please make sure to check whether the licensing terms are compatible with your usage.</p>
+
+For **JBIG2** support, a Java ImageIO plugin such as the [Levigo Plugin](https://github.com/levigo/jbig2-imageio) or
+[JBIG2-Image-Decoder](https://github.com/Borisvl/JBIG2-Image-Decoder) is needed.
+
+To write **TIFF** images, the JAI Image I/O Core library is needed.
+
+### PDF Encryption and Signing
+The most notable such optional feature is support for PDF encryption. Instead of implementing its own encryption algorithms, PDFBox uses libraries from the 
+[Legion of the Bouncy Castle](http://www.bouncycastle.org/). Both the bcprov and bcmail libraries are needed and can be included using the Maven dependencies shown below.
+
+    <dependency>
+      <groupId>org.bouncycastle</groupId>
+      <artifactId>bcprov-jdk15</artifactId>
+      <version>1.44</version>
+    </dependency>
+    <dependency>
+      <groupId>org.bouncycastle</groupId>
+      <artifactId>bcmail-jdk15</artifactId>
+      <version>1.44</version>
+    </dependency>
+ 
+<br/>
+
+### Support for bidirectional languages
+Another important optional feature is support for bidirectional languages like Arabic. PDFBox uses the ICU4J library from the 
+[International Components for Unicode](http://site.icu-project.org/) (ICU) project to support such languages in PDF documents. To add the ICU4J jar to your project, 
+use the following Maven dependency.
+
+    <dependency>
+      <groupId>com.ibm.icu</groupId>
+      <artifactId>icu4j</artifactId>
+      <version>3.8</version>
+    </dependency>
+
+PDFBox also contains extra support for use with the [Lucene](http://lucene.apache.org/) and [Ant](http://ant.apache.org/) projects. Since in these cases PDFBox is just an
+add-on feature to these projects, you should first set up your application to use Lucene or Ant and then add PDFBox support as described on this page.
+
+## Dependencies for Ant builds
+
+The above instructions assume that you're using [Maven](http://maven.apache.org/) or another build tool like [Ivy](http://ant.apache.org/ivy/) that supports Maven dependencies.
+If you instead use tools like [Ant](http://ant.apache.org/) where you need to explicitly include all the required library jars in your application, you'll need to
+collect those jars yourself.
+
+The easiest approach is to run ``mvn dependency:copy-dependencies`` inside the pdfbox directory of the latest PDFBox source release. This will copy all the required and optional
+libraries discussed above into the pdfbox/target/dependencies directory. You can then simply copy all the libraries you need from this directory to your application.

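The dependencies page above says optional features are enabled simply by putting the right jars on the classpath. A minimal sketch for checking at runtime which of those libraries are actually present follows. The `OptionalDependencyCheck` class is hypothetical and not part of PDFBox, and the marker classes (`LogFactory` for commons-logging, `BouncyCastleProvider` for bcprov, `com.ibm.icu.text.Bidi` for ICU4J) are assumptions chosen as representative classes from each jar.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of PDFBox): reports which optional jars are on the classpath. */
public class OptionalDependencyCheck {

    /** Returns true if the named class can be found; the class is loaded but not initialized. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, OptionalDependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing jar, or a jar whose own dependencies are missing
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed marker classes, one per library discussed on the dependencies page
        Map<String, String> probes = new LinkedHashMap<>();
        probes.put("commons-logging (required)", "org.apache.commons.logging.LogFactory");
        probes.put("bcprov (encryption)", "org.bouncycastle.jce.provider.BouncyCastleProvider");
        probes.put("icu4j (bidirectional text)", "com.ibm.icu.text.Bidi");

        for (Map.Entry<String, String> probe : probes.entrySet()) {
            String status = isOnClasspath(probe.getValue()) ? "available" : "missing";
            System.out.println(probe.getKey() + ": " + status);
        }
    }
}
```

Run it with the same classpath as your application; a "missing" line points to the jar you still need. Passing `false` as the second argument to `Class.forName` avoids running static initializers, so the probe has no side effects beyond class loading.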
http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/dependencies.mdtext
----------------------------------------------------------------------
diff --git a/content/1.8/dependencies.mdtext b/content/1.8/dependencies.mdtext
deleted file mode 100644
index da3174f..0000000
--- a/content/1.8/dependencies.mdtext
+++ /dev/null
@@ -1,96 +0,0 @@
----
-layout: default
-title:  Dependencies
----
-
-# Dependencies
-
-PDFBox consists of three related components and depends on a few external libraries. This page describes what these libraries are and how to include them in your application.
-
-## Core components
-
-<p class="alert alert-info">These components are needed at runtime, during development and for testing, as detailed below.</p>
-
-The three PDFBox components are named ```pdfbox```, ```fontbox``` and ```jempbox```. The Maven groupId of all PDFBox components is org.apache.pdfbox.
-
-### Minimum Requirement
-
-- Java 1.5
-- [commons-logging](http://commons.apache.org/logging/)
-
-The main PDFBox component, pdfbox, has a hard dependency on the [commons-logging](http://commons.apache.org/logging/) library.
-Commons Logging is a generic wrapper around different logging frameworks, so you'll either need to also use a logging library like [log4j](http://logging.apache.org/log4j/)
-or let commons-logging fall back to the standard [java.util.logging API](http://java.sun.com/j2se/1.4.2/docs/guide/util/logging/overview.html)
-included in the Java platform.
-
-### Font Handling
-For font handling the fontbox component is needed.
-
-### XMP Metadata 
-To support XMP metadata the jempbox component is needed.
-
-To add the pdfbox, fontbox, jempbox and commons-logging jars to your application, the easiest thing is to declare the Maven dependency shown below. This gives you the main
-pdfbox library directly and the other required jars as transitive dependencies.
-
-    <dependency>
-      <groupId>org.apache.pdfbox</groupId>
-      <artifactId>pdfbox</artifactId>
-      <version>...</version>
-    </dependency>
-
-Set the version field to the latest stable PDFBox version.
-
-## Optional dependencies
-
-Some features in PDFBox depend on optional external libraries. You can enable these features simply by including the required libraries in the classpath of your application.
-
-### Extended Image Format Support
-
-Reading JBIG2 images and writing TIFF images require additional libraries.
-
-<p class="alert alert-warning">The image plugins described below are not part of the PDFBox distribution because of incompatible licensing terms. Please make sure to check whether the licensing terms are compatible with your usage.</p>
-
-For **JBIG2** support, a Java ImageIO plugin such as the [Levigo Plugin](https://github.com/levigo/jbig2-imageio) or
-[JBIG2-Image-Decoder](https://github.com/Borisvl/JBIG2-Image-Decoder) is needed.
-
-To write **TIFF** images, the JAI Image I/O Core library is needed.
-
-### PDF Encryption and Signing
-The most notable such optional feature is support for PDF encryption. Instead of implementing its own encryption algorithms, PDFBox uses libraries from the 
-[Legion of the Bouncy Castle](http://www.bouncycastle.org/). Both the bcprov and bcmail libraries are needed and can be included using the Maven dependencies shown below.
-
-    <dependency>
-      <groupId>org.bouncycastle</groupId>
-      <artifactId>bcprov-jdk15</artifactId>
-      <version>1.44</version>
-    </dependency>
-    <dependency>
-      <groupId>org.bouncycastle</groupId>
-      <artifactId>bcmail-jdk15</artifactId>
-      <version>1.44</version>
-    </dependency>
- 
-<br/>
-
-### Support for bidirectional languages
-Another important optional feature is support for bidirectional languages like Arabic. PDFBox uses the ICU4J library from the 
-[International Components for Unicode](http://site.icu-project.org/) (ICU) project to support such languages in PDF documents. To add the ICU4J jar to your project, 
-use the following Maven dependency.
-
-    <dependency>
-      <groupId>com.ibm.icu</groupId>
-      <artifactId>icu4j</artifactId>
-      <version>3.8</version>
-    </dependency>
-
-PDFBox also contains extra support for use with the [Lucene](http://lucene.apache.org/) and [Ant](http://ant.apache.org/) projects. Since in these cases PDFBox is just an
-add-on feature to these projects, you should first set up your application to use Lucene or Ant and then add PDFBox support as described on this page.
-
-## Dependencies for Ant builds
-
-The above instructions assume that you're using [Maven](http://maven.apache.org/) or another build tool like [Ivy](http://ant.apache.org/ivy/) that supports Maven dependencies.
-If you instead use tools like [Ant](http://ant.apache.org/) where you need to explicitly include all the required library jars in your application, you'll need to
-collect those jars yourself.
-
-The easiest approach is to run ``mvn dependency:copy-dependencies`` inside the pdfbox directory of the latest PDFBox source release. This will copy all the required and optional
-libraries discussed above into the pdfbox/target/dependencies directory. You can then simply copy all the libraries you need from this directory to your application.

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/faq.md
----------------------------------------------------------------------
diff --git a/content/1.8/faq.md b/content/1.8/faq.md
new file mode 100644
index 0000000..018af6d
--- /dev/null
+++ b/content/1.8/faq.md
@@ -0,0 +1,143 @@
+---
+layout: default
+title:  Frequently Asked Questions (FAQ)
+Notice:    Licensed to the Apache Software Foundation (ASF) under one
+           or more contributor license agreements.  See the NOTICE file
+           distributed with this work for additional information
+           regarding copyright ownership.  The ASF licenses this file
+           to you under the Apache License, Version 2.0 (the
+           "License"); you may not use this file except in compliance
+           with the License.  You may obtain a copy of the License at
+           .
+             http://www.apache.org/licenses/LICENSE-2.0
+           .
+           Unless required by applicable law or agreed to in writing,
+           software distributed under the License is distributed on an
+           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+           KIND, either express or implied.  See the License for the
+           specific language governing permissions and limitations
+           under the License.
+---
+
+# Frequently asked questions
+
+### General Questions
+
+ - [I am getting the below Log4J warning message, how do I remove it?](#log4j)
+ - [Is PDFBox thread safe?](#threadsafe)
+ - [Why do I get a "Warning: You did not close the PDF Document"?](#notclosed)
+
+### Text Extraction
+
+ - [How come I am not getting any text from the PDF document?](#notext)
+ - [How come I am getting gibberish (G38G43G36G51G5) when extracting text?](#gibberish)
+ - [What does "java.io.IOException: Can't handle font width" mean?](#fontwidth)
+ - [Why do I get "You do not have permission to extract text" on some documents?](#permission)
+ - [Can't we just extract the text without parsing the whole document or extract text as it is parsed?](#partially)
+
+## General Questions
+
+<a name="log4j"></a>
+### I am getting the below Log4J warning message, how do I remove it? ###
+
+```
+log4j:WARN No appenders could be found for logger (org.apache.pdfbox.util.ResourceLoader).
+log4j:WARN Please initialize the log4j system properly.
+```
+
+This message means that you need to configure the log4j logging system.
+See the [log4j documentation](http://logging.apache.org/log4j/1.2/manual.html) for more information.
+
+PDFBox comes with a sample log4j configuration file. To use it, set a system property like this:
+
+```
+java -Dlog4j.configuration=log4j.xml org.apache.pdfbox.ExtractText <PDF-file> <output-text-file>
+```
+
+If this does not work for you, you may have to specify the log4j config file as a URL, like this:
+
+```
+-Dlog4j.configuration=file:///<path to config file>
+```
+
+Please see [this](https://sourceforge.net/forum/forum.php?thread_id=1254229&amp;forum_id=267205) forum thread 
+for more information.
+
+<a name="threadsafe"></a>
+### Is PDFBox thread safe? ###
+
+No! Only one thread may access a single document at a time. You can have multiple threads
+each accessing their own PDDocument object.
+
+<a name="notclosed"></a>
+### Why do I get a "Warning: You did not close the PDF Document"? ###
+
+You need to call close() on the PDDocument inside a finally block; if you
+don't, the document will not be closed properly.  Also, you must close every
+PDDocument object that you create.  The following code creates **two**
+PDDocument objects, one from "new PDDocument()" and a second from the load method, but only closes the second one. Initialize doc to null instead.
+
+```java
+PDDocument doc = new PDDocument(); // this first instance is never closed
+try
+{
+   doc = PDDocument.load( "my.pdf" );
+}
+finally
+{
+   if( doc != null )
+   {
+      doc.close();
+   }
+}
+```
+
+## Text Extraction
+
+<a name="notext"></a>
+### How come I am not getting any text from the PDF document? ###
+
+Text extraction from a PDF document is a complicated task and there are many factors
+involved that affect the possibility and accuracy of text extraction.  It would be helpful
+to the PDFBox team if you could try a couple of things.
+
+ - Open the PDF in Acrobat and try to extract text from there.  If Acrobat can extract text then PDFBox 
+should be able to as well and it is a bug if it cannot.  If Acrobat cannot extract text then PDFBox 'probably' cannot either.
+ - It might really be an image instead of text.  Some PDF documents are just images that have been scanned in.
+You can tell by using the selection tool in Acrobat; if you can't select any text, then it is probably an image.
+
+<a name="gibberish"></a>
+### How come I am getting gibberish (G38G43G36G51G5) when extracting text? ###
+
+This is because the characters in a PDF document can use a custom encoding
+instead of Unicode or ASCII.  When you see gibberish text, it
+probably means that a meaningless internal encoding is being used.  The
+only way to access the text is to use OCR.  This may be a future
+enhancement.
+
+<a name="fontwidth"></a>
+### What does "java.io.IOException: Can't handle font width" mean? ###
+
+This probably means that the "Resources" directory is not in your classpath. The
+Resources directory is included in the PDFBox jar so this is only a problem if you
+are building PDFBox yourself and not using the binary.
+
+<a name="permission"></a>
+### Why do I get "You do not have permission to extract text" on some documents? ###
+
+PDF documents have certain security permissions that can be applied to them and two
+passwords associated with them, a user password and an owner (master) password. If the "cannot extract text"
+permission bit is set, then you need to decrypt the document with the owner password in order
+to extract the text.
+
+<a name="partially"></a>
+### Can't we just extract the text without parsing the whole document or extract text as it is parsed? ###
+
+Not really, for a couple of reasons.
+
+ - If the document is encrypted then you need to parse at least until the encryption dictionary before 
+you can decrypt.
+ - Sometimes the font (PDFont) contains vital information needed for text extraction, and font objects may be stored anywhere in the file.
+ - Text on a page does not have to be drawn in reading order. For example: if the page said "Hello World",
+the PDF could have been written such that "World" gets drawn and then the cursor moves to the left and
+the word "Hello" is drawn.

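The FAQ answer above about closing documents can be illustrated without PDFBox itself. In this sketch, `TrackedDocument` is a hypothetical stand-in for PDDocument that counts instances not yet closed; `leakyPattern` mirrors the FAQ snippet and `fixedPattern` initializes the variable to null, so the only instance created is the one that gets closed.

```java
import java.io.Closeable;

public class CloseExample {

    /** Hypothetical stand-in for PDDocument that counts instances not yet closed. */
    static final class TrackedDocument implements Closeable {
        static int openCount = 0;

        TrackedDocument() {
            openCount++;
        }

        /** Stand-in for PDDocument.load(...); the name is ignored and a new open document is returned. */
        static TrackedDocument load(String name) {
            return new TrackedDocument();
        }

        /** Stand-in for PDDocument.close(). */
        @Override
        public void close() {
            openCount--;
        }
    }

    /** Mirrors the FAQ snippet: the first instance is overwritten and never closed. */
    static void leakyPattern() {
        TrackedDocument doc = new TrackedDocument();
        try {
            doc = TrackedDocument.load("my.pdf");
        } finally {
            if (doc != null) {
                doc.close();
            }
        }
    }

    /** Starts from null, so the finally block closes the only instance that was created. */
    static void fixedPattern() {
        TrackedDocument doc = null;
        try {
            doc = TrackedDocument.load("my.pdf");
        } finally {
            if (doc != null) {
                doc.close();
            }
        }
    }

    public static void main(String[] args) {
        TrackedDocument.openCount = 0;
        leakyPattern();
        System.out.println("open after leaky pattern: " + TrackedDocument.openCount);
        TrackedDocument.openCount = 0;
        fixedPattern();
        System.out.println("open after fixed pattern: " + TrackedDocument.openCount);
    }
}
```

Running `main` reports 1 open document after the leaky pattern and 0 after the fixed one. If the PDDocument class in your PDFBox version implements `java.io.Closeable` and you are on Java 7 or later, try-with-resources is an alternative; check your version's API before relying on it.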
http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/faq.mdtext
----------------------------------------------------------------------
diff --git a/content/1.8/faq.mdtext b/content/1.8/faq.mdtext
deleted file mode 100644
index 018af6d..0000000
--- a/content/1.8/faq.mdtext
+++ /dev/null
@@ -1,143 +0,0 @@
----
-layout: default
-title:  Frequently Asked Questions (FAQ)
-Notice:    Licensed to the Apache Software Foundation (ASF) under one
-           or more contributor license agreements.  See the NOTICE file
-           distributed with this work for additional information
-           regarding copyright ownership.  The ASF licenses this file
-           to you under the Apache License, Version 2.0 (the
-           "License"); you may not use this file except in compliance
-           with the License.  You may obtain a copy of the License at
-           .
-             http://www.apache.org/licenses/LICENSE-2.0
-           .
-           Unless required by applicable law or agreed to in writing,
-           software distributed under the License is distributed on an
-           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-           KIND, either express or implied.  See the License for the
-           specific language governing permissions and limitations
-           under the License.
----
-
-# Frequently asked questions
-
-### General Questions
-
- - [I am getting the below Log4J warning message, how do I remove it?](#log4j)
- - [Is PDFBox thread safe?](#threadsafe)
- - [Why do I get a "Warning: You did not close the PDF Document"?](#notclosed)
-
-### Text Extraction
-
- - [How come I am not getting any text from the PDF document?](#notext)
- - [How come I am getting gibberish (G38G43G36G51G5) when extracting text?](#gibberish)
- - [What does "java.io.IOException: Can't handle font width" mean?](#fontwidth)
- - [Why do I get "You do not have permission to extract text" on some documents?](#permission)
- - [Can't we just extract the text without parsing the whole document or extract text as it is parsed?](#partially)
-
-## General Questions
-
-<a name="log4j"></a>
-### I am getting the below Log4J warning message, how do I remove it? ###
-
-```
-log4j:WARN No appenders could be found for logger (org.apache.pdfbox.util.ResourceLoader).
-log4j:WARN Please initialize the log4j system properly.
-```
-
-This message means that you need to configure the log4j logging system.
-See the [log4j documentation](http://logging.apache.org/log4j/1.2/manual.html) for more information.
-
-PDFBox comes with a sample log4j configuration file. To use it, set a system property like this:
-
-```
-java -Dlog4j.configuration=log4j.xml org.apache.pdfbox.ExtractText <PDF-file> <output-text-file>
-```
-
-If this does not work for you, you may have to specify the log4j config file as a URL, like this:
-
-```
--Dlog4j.configuration=file:///<path to config file>
-```
-
-Please see [this](https://sourceforge.net/forum/forum.php?thread_id=1254229&amp;forum_id=267205) forum thread 
-for more information.
-
-<a name="threadsafe"></a>
-### Is PDFBox thread safe? ###
-
-No! Only one thread may access a single document at a time. You can have multiple threads
-each accessing their own PDDocument object.
-
-<a name="notclosed"></a>
-### Why do I get a "Warning: You did not close the PDF Document"? ###
-
-You need to call close() on the PDDocument inside a finally block; if you
-don't, the document will not be closed properly.  Also, you must close every
-PDDocument object that you create.  The following code creates **two**
-PDDocument objects, one from "new PDDocument()" and a second from the load method, but only closes the second one. Initialize doc to null instead.
-
-```java
-PDDocument doc = new PDDocument(); // this first instance is never closed
-try
-{
-   doc = PDDocument.load( "my.pdf" );
-}
-finally
-{
-   if( doc != null )
-   {
-      doc.close();
-   }
-}
-```
-
-## Text Extraction
-
-<a name="notext"></a>
-### How come I am not getting any text from the PDF document? ###
-
-Text extraction from a PDF document is a complicated task and there are many factors
-involved that affect the possibility and accuracy of text extraction.  It would be helpful
-to the PDFBox team if you could try a couple of things.
-
- - Open the PDF in Acrobat and try to extract text from there.  If Acrobat can extract text then PDFBox 
-should be able to as well and it is a bug if it cannot.  If Acrobat cannot extract text then PDFBox 'probably' cannot either.
- - It might really be an image instead of text.  Some PDF documents are just images that have been scanned in.
-You can tell by using the selection tool in Acrobat; if you can't select any text, then it is probably an image.
-
-<a name="gibberish"></a>
-### How come I am getting gibberish (G38G43G36G51G5) when extracting text? ###
-
-This is because the characters in a PDF document can use a custom encoding
-instead of Unicode or ASCII.  When you see gibberish text, it
-probably means that a meaningless internal encoding is being used.  The
-only way to access the text is to use OCR.  This may be a future
-enhancement.
-
-<a name="fontwidth"></a>
-### What does "java.io.IOException: Can't handle font width" mean? ###
-
-This probably means that the "Resources" directory is not in your classpath. The
-Resources directory is included in the PDFBox jar so this is only a problem if you
-are building PDFBox yourself and not using the binary.
-
-<a name="permission"></a>
-### Why do I get "You do not have permission to extract text" on some documents? ###
-
-PDF documents have certain security permissions that can be applied to them and two
-passwords associated with them, a user password and an owner (master) password. If the "cannot extract text"
-permission bit is set, then you need to decrypt the document with the owner password in order
-to extract the text.
-
-<a name="partially"></a>
-### Can't we just extract the text without parsing the whole document or extract text as it is parsed? ###
-
-Not really, for a couple of reasons.
-
- - If the document is encrypted then you need to parse at least until the encryption dictionary before 
-you can decrypt.
- - Sometimes the font (PDFont) contains vital information needed for text extraction, and font objects may be stored anywhere in the file.
- - Text on a page does not have to be drawn in reading order. For example: if the page said "Hello World",
-the PDF could have been written such that "World" gets drawn and then the cursor moves to the left and
-the word "Hello" is drawn.

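The last FAQ answer notes that text on a page need not be drawn in reading order. A simplified sketch of recovering reading order by sorting fragments by position follows. `TextRun` is a hypothetical type (not a PDFBox class); y grows downward here for simplicity, whereas PDF user space has y growing upward, and fragments on one line are assumed to share an exact y value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ReadingOrderExample {

    /** Hypothetical text fragment drawn at (x, y); here y grows downward, unlike PDF user space. */
    static final class TextRun {
        final String text;
        final float x;
        final float y;

        TextRun(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Sorts fragments top-to-bottom, then left-to-right, and joins them with spaces. */
    static String inReadingOrder(List<TextRun> runsInDrawOrder) {
        List<TextRun> sorted = new ArrayList<>(runsInDrawOrder);
        sorted.sort(Comparator.comparingDouble((TextRun r) -> r.y)
                              .thenComparingDouble(r -> r.x));
        StringBuilder result = new StringBuilder();
        for (TextRun run : sorted) {
            if (result.length() > 0) {
                result.append(' ');
            }
            result.append(run.text);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Draw order from the FAQ example: "World" first, then "Hello" to its left
        List<TextRun> drawOrder = new ArrayList<>();
        drawOrder.add(new TextRun("World", 200f, 100f));
        drawOrder.add(new TextRun("Hello", 100f, 100f));
        System.out.println(inReadingOrder(drawOrder)); // prints "Hello World"
    }
}
```

Real extractors must tolerate small baseline differences between fragments on the same line, which this sketch ignores. PDFBox's PDFTextStripper offers a setSortByPosition option aimed at this situation; it is off by default.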
http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/2.0/dependencies.md
----------------------------------------------------------------------
diff --git a/content/2.0/dependencies.md b/content/2.0/dependencies.md
new file mode 100644
index 0000000..3a212d3
--- /dev/null
+++ b/content/2.0/dependencies.md
@@ -0,0 +1,56 @@
+---
+layout: default
+title:  Dependencies
+---
+
+<p class="alert alert-warning">This is an unreleased development preview and may change without notice.</p>
+
+# Dependencies
+
+PDFBox has the following basic dependencies:
+
+- Java 6
+- [commons-logging](http://commons.apache.org/logging/)
+
+Commons Logging is a generic wrapper around different logging frameworks, so you'll need to either use a logging library such as [log4j](http://logging.apache.org/log4j/)
+or let commons-logging fall back to the standard [java.util.logging API](http://java.sun.com/j2se/1.4.2/docs/guide/util/logging/overview.html)
+included in the Java platform.
+
+## Optional components
+
+PDFBox does not ship with all features enabled. Third-party components are necessary to get full support for certain functionality.
+
+### JAI Image I/O
+
+PDF supports embedded image files; however, support for some formats requires third-party libraries which are distributed under terms incompatible with the Apache 2.0 license:
+
+- Reading **JBIG2** images: [JBIG2 ImageIO](https://github.com/levigo/jbig2-imageio) or [JBIG2-Image-Decoder](https://github.com/Borisvl/JBIG2-Image-Decoder)
+- Reading **JPEG 2000 (JPX)** images: [JAI Image I/O Tools Core](https://java.net/projects/jai-imageio-core)
+- Writing **TIFF** images also requires *JAI Image I/O Tools Core*.
+
+These libraries are optional and will be loaded if present on the classpath. Otherwise, support for these image formats will be disabled and a warning will be logged when an unsupported image is encountered.
+
+Maven dependencies for these components can be found in [parent/pom.xml](https://svn.apache.org/viewvc/pdfbox/trunk/parent/pom.xml?view=markup). Please make sure that any third-party licenses are suitable for your project.
+
+### Encryption and Signing
+
+Encrypting and signing PDFs requires the *bcprov* and *bcmail* libraries from the [Legion of the Bouncy Castle](http://www.bouncycastle.org/). These can be included in your Maven project using the following dependencies:
+
+    <dependency>
+        <groupId>org.bouncycastle</groupId>
+        <artifactId>bcprov-jdk15on</artifactId>
+        <version>1.53</version>
+    </dependency>
+    
+    <dependency>
+        <groupId>org.bouncycastle</groupId>
+        <artifactId>bcmail-jdk15on</artifactId>
+        <version>1.53</version>
+    </dependency>
+
+### Java Cryptography Extension (JCE)
+
+256-bit AES encryption requires a JDK with "unlimited strength" cryptography, which requires extra files to be installed. For JDK 7, see [Java Cryptography Extension (JCE)](http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html). If these files are not installed, building PDFBox will throw an exception with the following message:
+
+    JCE unlimited strength jurisdiction policy files are not installed
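If you are unsure whether the optional pieces described above are actually available at runtime, a quick check along these lines may help. This is a minimal sketch, not part of PDFBox: the class name `EnvironmentCheck` is hypothetical, and the ImageIO format names `"JBIG2"` and `"JPEG2000"` are assumptions about what the plugins listed above register. `Cipher.getMaxAllowedKeyLength` is the standard JDK call for inspecting the installed JCE policy; it returns `Integer.MAX_VALUE` when no limit applies.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;
import javax.imageio.ImageIO;

/**
 * Hypothetical helper that reports which optional features the current JVM can support.
 */
public final class EnvironmentCheck
{
    /**
     * Returns true if an ImageIO reader is registered for the given format name.
     */
    public static boolean hasReader(String formatName)
    {
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    /**
     * Returns true if an ImageIO writer is registered for the given format name.
     */
    public static boolean hasWriter(String formatName)
    {
        return ImageIO.getImageWritersByFormatName(formatName).hasNext();
    }

    /**
     * Returns the maximum AES key length allowed by the installed policy, or 0 if AES is missing.
     */
    public static int getMaxAesKeyLength()
    {
        try
        {
            // Integer.MAX_VALUE means the jurisdiction policy imposes no limit
            return Cipher.getMaxAllowedKeyLength("AES");
        }
        catch (NoSuchAlgorithmException e)
        {
            return 0;
        }
    }

    /**
     * Prints a short report about the optional features.
     *
     * @param args not used
     */
    public static void main(String[] args)
    {
        // the format names are assumptions; check the documentation of each plugin
        System.out.println("JBIG2 reader:     " + hasReader("JBIG2"));
        System.out.println("JPEG 2000 reader: " + hasReader("JPEG2000"));
        System.out.println("TIFF writer:      " + hasWriter("TIFF"));
        System.out.println("256-bit AES:      " + (getMaxAesKeyLength() >= 256));
    }
}
```

Run it with the same classpath as your application; `false` for a reader or writer only means that no plugin registered that format name on this JVM.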

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/2.0/dependencies.mdtext
----------------------------------------------------------------------
diff --git a/content/2.0/dependencies.mdtext b/content/2.0/dependencies.mdtext
deleted file mode 100644
index 3a212d3..0000000
--- a/content/2.0/dependencies.mdtext
+++ /dev/null
@@ -1,56 +0,0 @@
----
-layout: default
-title:  Dependencies
----
-
-<p class="alert alert-warning">This is an unreleased development preview and may change without notice.</p>
-
-# Dependencies
-
-PDFBox has the following basic dependencies:
-
-- Java 6
-- [commons-logging](http://commons.apache.org/logging/)
-
-Commons Logging is a generic wrapper around different logging frameworks, so you'll either need to also use a logging library like [log4j](http://logging.apache.org/log4j/)
-or let commons-logging fall back to the standard [java.util.logging API](http://java.sun.com/j2se/1.4.2/docs/guide/util/logging/overview.html)
-included in the Java platform.
-
-## Optional components
-
-PDFBox does not ship with all features enabled. Third party compoenets are necessary to get full support for certain functionality.
-
-### JAI Image I/O
-
-PDF supports embedded image files, however support for some formats require third party libraries which are distributed under terms incompatible with the Apache 2.0 license:
-
-- Reading **JBIG2** images: [JBIG2 ImageIO](https://github.com/levigo/jbig2-imageio) or [JBIG2-Image-Decoder
-](https://github.com/Borisvl/JBIG2-Image-Decoder)
-- Reading **JPEG 2000 (JPX)** images: [JAI Image I/O Tools Core](https://java.net/projects/jai-imageio-core)
-- Writing **TIFF** images requires *JAI Image I/O Tools Core* also.
-
-These libraries are optional and will be loaded if present on the classpath, otherwise support for these image formats will be disable and a warning will be logged when an unsupported image is encountered.
-
-Maven dependencies for these components can be found in [parent/pom.xml](https://svn.apache.org/viewvc/pdfbox/trunk/parent/pom.xml?view=markup). Please make sure that any third party licenses are suitable for your project.
-
-### Encryption and Signing
-
-Encrypting and sigining PDFs requires the *bcprov* and *bcmail* libraries from the [Legion of the Bouncy Castle](http://www.bouncycastle.org/). These can be included in your Maven project using the following dependencies:
-
-    <dependency>
-        <groupId>org.bouncycastle</groupId>
-        <artifactId>bcprov-jdk15on</artifactId>
-        <version>1.53</version>
-    </dependency>
-    
-    <dependency>
-        <groupId>org.bouncycastle</groupId>
-        <artifactId>bcmail-jdk15on</artifactId>
-        <version>1.53</version>
-    </dependency>
-
-### Java Cryptography Extension (JCE)
-
-256-bit AES encryption requires a JDK with "unlimited strength" cryptography, which requires extra files to be installed. For JDK 7, see [Java Cryptography Extension (JCE)](http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html). If these files are not installed, building PDFBox will throw an exception with the following message:
-
-    JCE unlimited strength jurisdiction policy files are not installed

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/2.0/examples.md
----------------------------------------------------------------------
diff --git a/content/2.0/examples.md b/content/2.0/examples.md
new file mode 100644
index 0000000..cdd8be8
--- /dev/null
+++ b/content/2.0/examples.md
@@ -0,0 +1,9 @@
+---
+layout: default
+title:  Examples
+---
+<p class="alert alert-warning">This is an unreleased development preview and may change without notice.</p>
+
+# Examples
+
+This content is under construction. Please look at our [examples](https://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/) directory in SVN.

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/2.0/examples.mdtext
----------------------------------------------------------------------
diff --git a/content/2.0/examples.mdtext b/content/2.0/examples.mdtext
deleted file mode 100644
index cdd8be8..0000000
--- a/content/2.0/examples.mdtext
+++ /dev/null
@@ -1,9 +0,0 @@
----
-layout: default
-title:  Examples
----
-<p class="alert alert-warning">This is an unreleased development preview and may change without notice.</p>
-
-# Examples
-
-This content is under construction. Please look at our [examples](https://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/) directory in SVN.

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/2.0/getting-started.md
----------------------------------------------------------------------
diff --git a/content/2.0/getting-started.md b/content/2.0/getting-started.md
new file mode 100644
index 0000000..a4ecc14
--- /dev/null
+++ b/content/2.0/getting-started.md
@@ -0,0 +1,33 @@
+---
+layout: default
+title:  Getting Started
+---
+
+<p class="alert alert-warning">This is an unreleased development preview and may change without notice.</p>
+
+# Getting Started
+
+This content is under construction.
+
+## Maven
+
+To use the latest 2.0 snapshot release from the SVN trunk, you'll need to add the following dependency:
+
+    <dependency>
+      <groupId>org.apache.pdfbox</groupId>
+      <artifactId>pdfbox</artifactId>
+      <version>2.0.0-SNAPSHOT</version>
+    </dependency>
+
+You'll also need to add the following repository:
+
+    <repository>
+      <id>ApacheSnapshot</id>
+      <name>Apache Repository</name>
+      <url>https://repository.apache.org/content/groups/snapshots/</url>
+      <snapshots>
+        <enabled>true</enabled>
+      </snapshots>
+    </repository>
+
+Please note that this will use the latest **unstable** development snapshot.

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/2.0/getting-started.mdtext
----------------------------------------------------------------------
diff --git a/content/2.0/getting-started.mdtext b/content/2.0/getting-started.mdtext
deleted file mode 100644
index a4ecc14..0000000
--- a/content/2.0/getting-started.mdtext
+++ /dev/null
@@ -1,33 +0,0 @@
----
-layout: default
-title:  Getting Started
----
-
-<p class="alert alert-warning">This is an unreleased development preview and may change without notice.</p>
-
-# Getting Started
-
-This content is under construction.
-
-## Maven
-
-To use the latest 2.0 snapshot release from the SVN trunk, you'll need to add the following dependency:
-
-    <dependency>
-      <groupId>org.apache.pdfbox</groupId>
-      <artifactId>pdfbox</artifactId>
-      <version>2.0.0-SNAPSHOT</version>
-    </dependency>
-
-You'll also need to add the following repository:
-
-    <repository>
-      <id>ApacheSnapshot</id>
-      <name>Apache Repository</name>
-      <url>https://repository.apache.org/content/groups/snapshots/</url>
-      <snapshots>
-        <enabled>true</enabled>
-      </snapshots>
-    </repository>
-
-Please note that this will use the latest **unstable** development snapshot.

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/building.md
----------------------------------------------------------------------
diff --git a/content/building.md b/content/building.md
new file mode 100644
index 0000000..bf1e4ef
--- /dev/null
+++ b/content/building.md
@@ -0,0 +1,70 @@
+---
+layout: default
+title:  Building PDFBox
+---
+
+# Building from Source
+
+Building PDFBox from source is only necessary if you want to contribute code to the PDFBox project. Most users should use the [binary releases](http://pdfbox.apache.org/download.cgi) instead.
+
+## Obtaining the Source
+
+You can obtain the latest source of PDFBox from our [SVN repo](http://pdfbox.apache.org/download.cgi). The current trunk is v2.0.0-SNAPSHOT. There is a separate branch for the 1.8.x series. You can fetch the latest 2.0 trunk using Subversion:
+
+    svn checkout http://svn.apache.org/repos/asf/pdfbox/trunk/
+    cd trunk
+
+## Build dependencies
+
+### PDFBox 1.8
+
+- JDK 5 or 6
+-  [Maven 2](http://maven.apache.org/)
+
+### PDFBox 2.0
+
+- JDK 6+
+- Java Cryptography Extension (JCE) [see below]
+-  [Maven 2](http://maven.apache.org/)
+
+### Java Cryptography Extension (JCE)
+
+Building PDFBox 2.0 requires a JDK with "unlimited strength" cryptography, which requires extra files to be installed. For JDK 7, see [Java Cryptography Extension (JCE)](http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html). If these files are not installed, building PDFBox will fail the following test:
+
+    TestPublicKeyEncryption.setUp:70 JCE unlimited strength jurisdiction policy files are not installed
+    
+## Building with Maven
+
+In the root directory of PDFBox:
+
+    mvn clean install
+
+---
+
+## Building with Ant (Deprecated, removed in 2.0.0)
+
+The old Ant build is still available in the 1.8 series and can be used, in particular, for
+building .NET binaries with IKVM:
+
+1.  Install [Ant](http://ant.apache.org/). PDFBox currently uses 1.6.2,
+    but other versions probably work as well.
+2.  (optional) Set up IKVM if you want to build the .NET DLL version of
+    PDFBox.
+    1.  [IKVM](http://www.ikvm.net/) binaries
+    2.  In the build.properties, set the ikvm.dir property:\
+         `ikvm.dir=C:\\javalib\\ikvm-12-07-2004\\ikvm`
+
+3.  Run `ant` from the root PDFBox directory. This will create the
+    .zip package distribution. See the build file for other Ant targets.
+
+NOTE: If you want to run PDFBox from an IDE, then you will need to add
+the 'Resources' directory to the project classpath in your IDE.
+
+### Dependencies for Ant Builds
+
+The above instructions assume that you're using [Maven](http://maven.apache.org/) or another build tool like [Ivy](http://ant.apache.org/ivy/) that supports Maven dependencies.
+If you instead use tools like [Ant](http://ant.apache.org/) where you need to explicitly include all the required library jars in your application, you'll need to collect
+those jars yourself.
+
+The easiest approach is to run ``mvn dependency:copy-dependencies`` inside the pdfbox directory of the latest PDFBox source release. This will copy all the required and optional
+libraries discussed above into the pdfbox/target/dependency directory. You can then copy the libraries you need from this directory to your application.

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/building.mdtext
----------------------------------------------------------------------
diff --git a/content/building.mdtext b/content/building.mdtext
deleted file mode 100644
index bf1e4ef..0000000
--- a/content/building.mdtext
+++ /dev/null
@@ -1,70 +0,0 @@
----
-layout: default
-title:  Building PDFBox
----
-
-# Building from Source
-
-Building PDFBox from source is only necessary if you're wanting to contribute code to the PDFBox project. Most users should use the [binary releases](http://pdfbox.apache.org/download.cgi) instead.
-
-## Obtaining the Source
-
-You can obtain the latest source of PDFBox from our [SVN repo](http://pdfbox.apache.org/download.cgi) The current trunk is v2.0.0-SNAPSHOT. There is a seperate branch for the 1.8.x series. You can fetch the latest 2.0 trunk using Subversion:
-
-    svn checkout http://svn.apache.org/repos/asf/pdfbox/trunk/
-    cd trunk
-
-## Build dependencies
-
-### PDFBox 1.8
-
-- JDK 5 or 6
--  [Maven 2](http://maven.apache.org/)
-
-### PDFBox 2.0
-
-- JDK 6+
-- Java Cryptography Extension (JCE) [see below]
--  [Maven 2](http://maven.apache.org/)
-
-### Java Cryptography Extension (JCE)
-
-Building PDFBox 2.0 requires a JDK with "unlimited strength" cryptography, which requires extra files to be installed. For JDK 7, see [Java Cryptography Extension (JCE)](http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html). If these files are not installed, building PDFBox will fail the following test:
-
-    TestPublicKeyEncryption.setUp:70 JCE unlimited strength jurisdiction policy files are not installed
-    
-## Building with Maven
-
-In the root directory of PDFBox:
-
-    mvn clean install
-
----
-
-## Building with Ant (Deprecated, removed in 2.0.0)
-
-The old Ant build is still available, and can be used especially for
-building .NET binaries with IKVM:
-
-1.  Install [ANT](http://ant.apache.org/). PDFBox currently uses 1.6.2
-    but other versions probably work as well.
-2.  (optional) Setup IKVM, if you want to build the .NET DLL version of
-    PDFBox.
-    1.  [IKVM](http://www.ikvm.net/) binaries
-    2.  In the build.properties, set the ikvm.dir property:\
-         `ikvm.dir=C:\\javalib\\ikvm-12-07-2004\\ikvm`
-
-3.  Run "`ant`" from the root PDFBox directory. This will create the
-    .zip package distribution. See the build file for other ant targets.
-
-NOTE: If you want to run PDFBox from an IDE them you will need to add
-the 'Resources' directory to the project classpath in your IDE.
-
-### Dependencies for Ant Builds
-
-The above instructions expect that you're using [Maven](http://maven.apache.org/) or another build tool like [Ivy](http://ant.apache.org/ivy/) that supports Maven dependencies.
-If you instead use tools like [Ant](http://ant.apache.org/) where you need to explicitly include all the required library jars in your application, you'll need to do
-something different.
-
-The easiest approach is to run ``mvn dependency:copy-dependencies`` inside the pdfbox directory of the latest PDFBox source release. This will copy all the required and optional
-libraries discussed above into the pdfbox/target/dependencies directory. You can then simply copy all the libraries you need from this directory to your application.

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/codingconventions.md
----------------------------------------------------------------------
diff --git a/content/codingconventions.md b/content/codingconventions.md
new file mode 100644
index 0000000..280b571
--- /dev/null
+++ b/content/codingconventions.md
@@ -0,0 +1,128 @@
+---
+layout: default
+title:  Coding Conventions
+---
+
+# Coding Conventions
+
+Over the years the PDFBox project has come to adopt a number of coding conventions. These are not always followed in old code but new code should try to follow these rules where possible.
+
+### Formatting
+
+- Braces go on their own line.
+
+- Always use braces with control flow statements.
+
+- No lines longer than 100 characters, including JavaDoc.
+
+- Wrapped lines should use either an indent of 4 or 8 characters or align with the expression at the same level on the previous line.
+
+- Wrapped lines should be broken after operators, not before.
+
+- Prefer aligned wrapped lines.
+
+- Prefer aligned wrapped parameter lists.
+
+### Whitespace
+
+- Four spaces for indents, no tabs.
+
+- Do not use spaces around parentheses.
+
+- Use spaces after control flow keywords.
+
+- Prefer using blank lines to separate logical blocks of code, but do not be excessive.
+
+- Prefer not following casts with a blank space.
+
+### Structure
+
+- Do not use wildcard package imports (e.g. `import java.util.*`).
+
+- Static fields and methods must appear at the top of a class, before any other code.
+
+- Within a class, definitions should be ordered as follows:
+
+    Class (static) variables  
+    Instance variables  
+    Constructors  
+    Methods  
+
+### JavaDoc
+
+- Public and protected methods and fields must have JavaDoc.
+
+- Don't use `@version` tags.
+
+- Don't use `@since` tags.
+
+- Don't include your e-mail address in `@author` tags.
+
+- You may omit `@return` tags for getters as long as you include a summary which begins with the word "Returns".
+
+- Private methods do not require JavaDoc but may have partial JavaDoc if it adds valuable information.
+
+### Comments
+
+- Only use line comments within code, never block comments.
+
+- Prefer comments on their own line, rather than trailing, unless the latter is more readable.
+
+- Separate the `//` from the comment text with a space: `// like this`.
+
+### Variables
+
+- Prefer initializing variables when they are declared, rather than C-style declaration before use.
+
+- Always use final fields when possible.
+
+### Control Flow
+
+- Prefer multiple return statements over additional control flow logic.
+
+- Prefer switch statements over multi-clause if-then statements.
+
+### API Design
+
+- Give variables and methods meaningful names. Keep these short but don't use abbreviations. Prefer using the same terminology as the PDF spec.
+
+- Prefer final classes, and final protected methods in non-final public classes; this reduces the surface area of the public API.
+
+- Avoid non-final protected variables in public classes. Prefer protected getters over protected variables when protected fields are necessary in public classes.
+
+- Minimize the API. Don't make everything public just because you can.
+
+- Don't expose implementation details unless there is a clear need: allowing subclassing means that the behaviour of protected methods becomes part of the contract of the public API.
+
+- Avoid unnecessary abstraction. While you're encouraged to avoid brittle designs, it's unlikely that an API designed for "future use" will have the correct API without any code which actually uses it.
+ 
+### Example
+
+Here's an example of PDFBox's formatting style:
+
+    public class Foo extends Bar
+    {
+        public static void main(String[] args)
+        {
+            try
+            {
+                for (int i = 0; i < args.length; i++)
+                {
+                    System.out.println(Integer.parseInt(args[i]));
+                }
+            }
+            catch (NumberFormatException e)
+            {
+                e.printStackTrace();
+            }
+        }
+    }
+
+## Eclipse Formatter
+
+Eclipse users may download the preferences file pdfbox-eclipse-formatter.xml and import it into Eclipse
+(Window -> Preferences, then Java -> Code Style -> Formatter, and click "Import...").
+Once you have done this, you can reformat your code with Source -> Format (Ctrl+Shift+F).
+
+Also note that Eclipse will automatically format your import statements appropriately when 
+you invoke Source -> Organize Imports (Ctrl+Shift+O).
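The following class is a sketch written for illustration, not code from the PDFBox sources; the name `PageRotation` and the orientation descriptions are hypothetical. It combines several of the conventions above in one place: static members at the top, final fields, JavaDoc on public members (with a "Returns" summary instead of `@return` on the getter), an early return instead of nested control flow, and a switch instead of an if-else chain. Only the 90-degree steps follow the PDF `/Rotate` rule.

```java
/**
 * Hypothetical value class illustrating the PDFBox coding conventions.
 */
public final class PageRotation
{
    private static final int FULL_TURN = 360;

    /**
     * Normalizes an angle to the range 0 to 359 degrees.
     *
     * @param degrees any angle in degrees, possibly negative
     * @return the equivalent angle in the range 0 to 359
     */
    public static int normalize(int degrees)
    {
        return ((degrees % FULL_TURN) + FULL_TURN) % FULL_TURN;
    }

    private final int degrees;

    /**
     * Creates a rotation from an angle in degrees.
     *
     * @param degrees the angle, which is normalized to the range 0 to 359
     */
    public PageRotation(int degrees)
    {
        this.degrees = normalize(degrees);
    }

    /**
     * Returns the normalized angle in degrees.
     */
    public int getDegrees()
    {
        return degrees;
    }

    /**
     * Describes the orientation that results from this clockwise rotation.
     *
     * @return a short description, or "invalid" if the angle is not a multiple of 90
     */
    public String describe()
    {
        // prefer an early return over nesting the rest of the method in an else block
        if (degrees % 90 != 0)
        {
            return "invalid";
        }
        // prefer a switch over a chain of if-else clauses
        switch (degrees)
        {
            case 0:
                return "upright";
            case 90:
                return "rotated clockwise";
            case 180:
                return "upside down";
            default:
                return "rotated counter-clockwise";
        }
    }
}
```

For example, `new PageRotation(450).describe()` normalizes 450 to 90 and returns "rotated clockwise".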

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/codingconventions.mdtext
----------------------------------------------------------------------
diff --git a/content/codingconventions.mdtext b/content/codingconventions.mdtext
deleted file mode 100644
index 280b571..0000000
--- a/content/codingconventions.mdtext
+++ /dev/null
@@ -1,128 +0,0 @@
----
-layout: default
-title:  Coding Conventions
----
-
-# Coding Conventions
-
-Over the years the PDFBox project has come to adopt a number of coding conventions. These are not always followed in old code but new code should try to follow these rules where possible.
-
-### Formatting
-
-- Braces go on their own line.
-
-- Always use braces with control flow statements.
-
-- No lines longer than 100 characters, including JavaDoc.
-
-- Wrapped lines should use either an indent of 4 or 8 characters or align with the expression at the same level on the previous line.
-
-- Wrapped lines should be broken after operators, not before.
-
-- Prefer aligned wrapped lines.
-
-- Prefer aligned wrapped parameter lists.
-
-### Whitespace
-
-- Four spaces for indents, no tabs.
-
-- Do not use spaces around parenthesis.
-
-- Use spaces after control flow keywords.
-
-- Prefer using blank lines to separate logical blocks of code, but do not be excessive.
-
-- Prefer not following casts with a blank space.
-
-### Structure
-
-- Do not use package imports (e.g. `import java.util.*`)
-
-- Static fields and methods must appear at the top of a class, before any other code.
-
-- Within a class, definitions should be ordered as follows:
-
-    Class (static) variables  
-    Instance variables  
-    Constructors  
-    Methods  
-
-### JavaDoc
-
-- Public and protected methods and fields must have JavaDoc.
-
-- Don't use `@version` tags.
-
-- Don't use `@since` tags.
-
-- Don't include your e-mail address in `@author` tags.
-
-- You may omit `@return` tags for getters as long as you include a summary which begins with the word "Returns".
-
-- Private methods do not require JavaDoc but may have partial JavaDoc if it adds valuable information.
-
-### Comments
-
-- Only use line comments within code, never block comments.
-
-- Prefer comments on their own line, rather than trailing, unless the latter is more readable.
-
-- Prefix line comments by a space `// like this`.
-
-### Variables
-
-- Prefer initializing variables when they are declared, rather than C-style declaration before use.
-
-- Always use final fields when possible.
-
-### Control Flow
-
-- Prefer multiple return statements over additional control flow logic.
-
-- Prefer switch statements over multi-clause if-then statements.
-
-### API Design
-
-- Give variables and methods meaningful names. Keep these short but don't use abbreviations. Prefer using the same terminology as the PDF spec.
-
-- Prefer final classes and final protected methods for non-final public classes, this reduces the surface area of the public API.
-
-- Avoid non-final protected variables in public classes. Prefer protected getters over protected variables when protected fields are necessery in public classes.
-
-- Minimize the API. Don't make everything public just because you can.
-
-- Don't expose implementation details unless there is a clear need: allowing subclassing means that the behaviour of protected methods becomes part of the contract of the public AP.
-
-- Avoid unnecesary abstraction. While you're encouraged to avoid brittle designs, it's unlikey that an API designed for "future use" will have the correct API without any code which actually uses it.
- 
-### Example
-
-Here's an example of PDFBox's formatting style:
-
-    public class Foo extends Bar
-    {
-        public static void main(String args[])
-        {
-            try
-            {
-                for (int i = 0; i < args.length; i++)
-                {
-                    System.out.println(Integer.parseInt(args[i]));
-                }
-            }
-            catch (NumberFormatException e)
-            {
-                e.printStackTrace();
-            }
-        }
-    }
-
-## Eclipse Formatter
-
-Eclipse users may download this preferences file: pdfbox-eclipse-formatter.xml and import this into Eclipse. 
-(Window->Preferences, go to Java->Code Style->Formatter and click "Import...").
-Once you have done this you can reformat your code by using Source->Format (Ctrl+Shift+F).
-
-Also note that Eclipse will automatically format your import statements appropriately when 
-you invoke Source -> Organize Imports (Ctrl+Shift+O).

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/errors/403.md
----------------------------------------------------------------------
diff --git a/content/errors/403.md b/content/errors/403.md
new file mode 100644
index 0000000..888f7b8
--- /dev/null
+++ b/content/errors/403.md
@@ -0,0 +1,15 @@
+---
+layout: default
+title:  Forbidden (403)
+---
+# 403
+
+We're sorry, but the page you requested cannot be accessed. 
+
+Maybe you 
+
+* typed the address incorrectly
+* followed a link from another site that pointed to this page.
+
+
+If you followed a broken link to get here, please report it in our [issue tracker](https://issues.apache.org/jira/browse/pdfbox).
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/errors/403.mdtext
----------------------------------------------------------------------
diff --git a/content/errors/403.mdtext b/content/errors/403.mdtext
deleted file mode 100644
index 888f7b8..0000000
--- a/content/errors/403.mdtext
+++ /dev/null
@@ -1,15 +0,0 @@
----
-layout: default
-title:  Forbidden (403)
----
-# 403
-
-We're sorry, but the page you requested cannot be accessed. 
-
-Maybe you 
-
-* typed the address incorrectly
-* followed a link from another site that pointed to this page.
-
-
-If you came by following a broken link, please report the [issue](https://issues.apache.org/jira/browse/pdfbox).
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/errors/404.md
----------------------------------------------------------------------
diff --git a/content/errors/404.md b/content/errors/404.md
new file mode 100644
index 0000000..e83602d
--- /dev/null
+++ b/content/errors/404.md
@@ -0,0 +1,15 @@
+---
+layout: default
+title:  Page Not Found
+---
+# 404
+
+We're sorry, but the page you requested cannot be found. 
+
+Maybe you 
+
+* typed the address incorrectly
+* followed a link from another site that pointed to this page.
+
+
+If you followed a broken link to get here, please report it in our [issue tracker](https://issues.apache.org/jira/browse/pdfbox).
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/errors/404.mdtext
----------------------------------------------------------------------
diff --git a/content/errors/404.mdtext b/content/errors/404.mdtext
deleted file mode 100644
index e83602d..0000000
--- a/content/errors/404.mdtext
+++ /dev/null
@@ -1,15 +0,0 @@
----
-layout: default
-title:  Page Not Found
----
-# 404
-
-We're sorry, but the page you requested cannot be found. 
-
-Maybe you 
-
-* typed the address incorrectly
-* followed a link from another site that pointed to this page.
-
-
-If you came by following a broken link, please report the [issue](https://issues.apache.org/jira/browse/pdfbox).
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/ideas.md
----------------------------------------------------------------------
diff --git a/content/ideas.md b/content/ideas.md
new file mode 100644
index 0000000..3090f23
--- /dev/null
+++ b/content/ideas.md
@@ -0,0 +1,88 @@
+---
+layout: default
+title:  Ideas
+---
+
+# Ideas
+
+There are several ideas to enhance PDFBox. They are outlined below together with
+comments and, once there is agreement to implement them, the releases they are planned for.
+
+## Enhance type safety
+
+Enhance the type safety of PDFBox and add more generic collections and code cleanup.
+
+## Remove all deprecated methods
+
+This is an ongoing effort; most, if not all, deprecated methods will be removed in PDFBox 2.0.0.
+
+## Handle large PDF files
+
+Apart from the parsing itself, PDFBox does not always handle large PDF files well, because some
+of the references are implemented as int instead of long.
+
+
+## <span class="complete">Switch to Java 1.6</span>
+
+<span class="complete">PDFBox 2.0.0 has Java 6 as minimum requirement.</span>
+
+## <span class="complete">Break PDFBox into modules</span>
+
+<span class="complete">In order to support different use cases and provide a minimal toolset PDFBox 2.0.0 should be 
+separated into different modules. This goes hand in hand with rearranging some of the code,
+e.g. removing AWT from PDDocument.
+</span>
+
+## <span class="complete">Enhance the font rendering</span>
+
+<span class="complete">PDFBox 2.0.0 will render most of the fonts without using AWT.</span>
+ 
+## Replace/enhance PDF parsing
+
+<span class="complete">The old "classic" PDF parser in PDFBox is not in line with the PDF specification, as it parses
+a PDF from top to bottom instead of respecting the XRef information.</span> The NonSequentialParser
+improved that situation, but a cleaner foundation is needed, broken into several levels:
+
+- io
+- tokenization
+- parsing according to structure
+- COS level document
+- PD level document
+- add a self-healing mechanism to process corrupt files
+
+In addition, handling non-conforming documents shouldn't be part of the core parser
+but of an extensible approach, e.g. by adding hooks to allow for handling parsing exceptions.
+
+## <span class="complete">Add the ability to create PDFs using Unicode encoded text</span>
+
+<span class="complete">The current PDFBox release is limited to WinAnsi encoded text; 2.0.0 should support Unicode as well.</span>
+
+## Rearchitect the COS level objects
+
+The COS level objects need to be refactored to be in line with the new parser. In addition,
+method signatures, object construction ... should be made consistent across the COS objects.
+
+## Parsing on demand
+
+Instead of always parsing the complete document, PDFs should be parsable on demand, making
+objects available only as they are needed in order to enhance performance and minimize memory footprint.
+
+This might be achieved by a layered approach, where a base (non-caching) parser provides
+the on-demand parsing and a caching parser built on top of it caches objects for use cases where
+this is beneficial, e.g. rendering, debugging ...
+
+- the lexer would be the low-level component delivering tokens to the parser.
+  A sample implementation exists as part of PDFBOX-1000. The benefit would be a clean low-level
+  handling of tokens. The current implementation needs to be (slightly?) revised, though
+- the incremental (non-caching) parser would allow for page-by-page, forward-only processing
+  to support text extraction, merging, splitting … - the benefit would be lower memory
+  consumption as well as potentially faster processing
+- the caching parser would support applications such as PDFDebugger or PDFReader
+
+## Handling of PDF versions
+The current implementation is a mix of PDF 1.4 and some ad hoc additions without a clear
+distinction of what is and is not supported. We could add support for explicitly handling
+versions in PDFBox, e.g. by marking certain methods and properties with the PDF version they
+require. This could also be a good basis for PDF/A and other compliance checks.
+

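The "Handling of PDF versions" idea above (marking methods with the PDF version they rely on) could be sketched with a runtime annotation. This is a minimal, hypothetical illustration: `PdfVersion`, `ExampleFeatures` and `VersionCheck` are invented names, not PDFBox API, and the methods stand in for real PD-level accessors.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical annotation: the minimum PDF version a method relies on.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface PdfVersion {
    int major() default 1;
    int minor();
}

// Hypothetical class; the method names are illustrative, not PDFBox API.
class ExampleFeatures {
    @PdfVersion(minor = 0)
    public String basicFeature() { return "available since PDF 1.0"; }

    @PdfVersion(minor = 5) // object streams were introduced in PDF 1.5
    public String objectStreamFeature() { return "requires PDF 1.5"; }

    public String unmarkedFeature() { return "no version information"; }
}

final class VersionCheck {
    private VersionCheck() {}

    // True if the named public no-arg method may be used with the target version.
    // Methods without @PdfVersion are treated as version-independent.
    static boolean isSupported(Class<?> type, String methodName, int major, int minor) {
        Method method;
        try {
            method = type.getMethod(methodName);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such method: " + methodName, e);
        }
        PdfVersion required = method.getAnnotation(PdfVersion.class);
        if (required == null) {
            return true;
        }
        return required.major() < major
            || (required.major() == major && required.minor() <= minor);
    }
}
```

With this sketch, `VersionCheck.isSupported(ExampleFeatures.class, "objectStreamFeature", 1, 4)` returns `false`, while the same call with `1, 7` returns `true`. A compliance checker (e.g. for PDF/A) could, under the same assumptions, walk these annotations to report which features a document may use.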
http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/ideas.mdtext
----------------------------------------------------------------------
diff --git a/content/ideas.mdtext b/content/ideas.mdtext
deleted file mode 100644
index 3090f23..0000000
--- a/content/ideas.mdtext
+++ /dev/null
@@ -1,88 +0,0 @@
----
-layout: default
-title:  Ideas
----
-
-# Ideas
-
-There are several ideas to enhance PDFBox. These are outlined below together with 
-comments and the releases they are planned for as soon as there is agreement to do the
-implementation.
-
-## Enhance type safety
-
-Enhance the type safety of PDFBox and add more generic collections and code cleanup.
-
-## Remove all deprecated methods
-
-This is an ongoing effort and most/all deprecated methods will be removed in PDFBox 2.0.0
-
-## Handle large PDF files
-
-In addition to the PDF parsing pdfbox does not always handle large PDF files well as some 
-of the references are implemented as int instead of long
-
-
-## <span class="complete">Switch to Java 1.6</span>
-
-<span class="complete">PDFBox 2.0.0 has Java 6 as minimum requirement.</span>
-
-## <span class="complete">Break PDFBox into modules</span>
-
-<span class="complete">In order to support different use cases and provide a minimal toolset PDFBox 2.0.0 should be 
-separated into different modules. This goes inline with rearranging some of the code
-e.g. remove AWT from PDDocument.
-</span>
-
-## <span class="complete">Enhance the font rendering</span>
-
-<span class="complete">PDFBox 2.0.0 will render most of the fonts without using AWT.</span>
- 
-## Replace/enhance PDF parsing
-
-<span class="complete">The old "classic" PDF parser in PDFBox is not in line with the PDF specification as it parses
-a PDF from top to bottom instead of respecting the XRef information.</span> The NonSequentialParser
-enhanced that situation but there is a need to have a cleaner foundation broken into several levels
-
-- io
-- tokenization
-- parsing according to structure
-- COS level document
-- PD level document
-- add some self healing mechanism to process corrupt files
-
-In addition handling documents which are not conforming shouldn't be part of the core parser
-but of a extentable approach e.g. by adding hooks to allow for handling parsing exceptions.
-
-## <span class="complete">Add the ability to create PDFs using unicode encoded text</span>
-
-<span class="complete">The recent PDFBox version is limited to WinANSI encoded text. 2.0.0 should have unicode support as well.</span>
-
-## Rearchitect the COS level objects
-
-The COS level objects need to be refactored to be in line with the new parser. In addition
-method signatures, constructing ... should be made similar across the COS objects
-
-## Parsing on demand
-
-Instead of always parsing the complete document PDFs should be parsable on demand making
-objects only available as they are needed to enhance performance and minimize memory footprint.
-
-This might be achieved by providing a layered approach where a base (non caching) parser provides
-the on demand parsing and a caching parser built on top caches objects for use cases where
-this is beneficial e.g. rendering, debugging ...
-
-- the lexer would be the low level component delivering tokens to the parser.
-  A sample implementation exists as part of PDFBOX-1000. The benefit would be a clean low
-  level handling of tokens. The current implementation needs to be (slightly ?) revised though
-- the incremental (non caching) parser would allow for page by page processing moving forward 
-  only to support text extraction, merging, splitting … - the benefit would be a lower memory 
-  consumption as well as a potential faster processing
-- the caching parser would support applications such a PDFDebugger or PDFReader 
-
-## Handling of PDF versions
-The current implementation is a mix of PDF 1.4 and some adhoc additions without a clear 
-distinction what is and is not supported. We could ad some support for explicitly handling
-versions in PDFBox e.g. my marking certain methods and properties to the PDF version support
-level. This could in addition be a good basis for PDF/A and other compliance checks. 
-

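The layered "Parsing on demand" idea (a non-caching base parser with a caching parser built on top) might look roughly like the following. This is a toy sketch under stated assumptions: `ObjectSource`, `OnDemandParser` and `CachingParser` are hypothetical names, a `String` stands in for the raw PDF bytes, and "parsing" merely reads from an xref offset up to the next `endobj`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction: resolves an indirect object by its object number.
interface ObjectSource {
    String resolve(int objectNumber);
}

// Base layer: parses an object on every request and keeps nothing in memory
// besides the xref table (object number -> offset).
class OnDemandParser implements ObjectSource {
    private final String data;                 // stands in for the raw PDF bytes
    private final Map<Integer, Integer> xref;  // object number -> offset into data
    int parseCount = 0;                        // instrumentation for the example

    OnDemandParser(String data, Map<Integer, Integer> xref) {
        this.data = data;
        this.xref = xref;
    }

    @Override
    public String resolve(int objectNumber) {
        Integer offset = xref.get(objectNumber);
        if (offset == null) {
            return null; // object not listed in the xref table
        }
        parseCount++;
        // Toy "parsing": read from the offset up to the next "endobj" marker.
        int end = data.indexOf("endobj", offset);
        if (end < 0) {
            end = data.length();
        }
        return data.substring(offset, end).trim();
    }
}

// Caching layer built on top: useful where objects are visited repeatedly,
// e.g. rendering or debugging.
class CachingParser implements ObjectSource {
    private final ObjectSource delegate;
    private final Map<Integer, String> cache = new HashMap<>();

    CachingParser(ObjectSource delegate) {
        this.delegate = delegate;
    }

    @Override
    public String resolve(int objectNumber) {
        // Parse once, then serve later requests from the cache.
        return cache.computeIfAbsent(objectNumber, delegate::resolve);
    }
}
```

Forward-only tasks such as text extraction or splitting would use `OnDemandParser` directly, while a viewer or debugger would wrap it in `CachingParser`; keeping both behind one interface is what lets the layers be swapped.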
http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/index.md
----------------------------------------------------------------------
diff --git a/content/index.md b/content/index.md
new file mode 100644
index 0000000..d3eec51
--- /dev/null
+++ b/content/index.md
@@ -0,0 +1,65 @@
+---
+layout: default
+title:  A Java PDF Library
+---
+# Apache PDFBox - A Java PDF Library
+
+<p class="lead">The Apache PDFBox™ library is an open source Java tool for working with
+    PDF documents. This project allows the creation of new PDF documents, the manipulation of existing
+    documents and the extraction of content from documents.
+
+    Apache PDFBox also includes several command line utilities.
+    Apache PDFBox is published under the Apache License v2.0.</p>
+    
+## News
+With the initial discussions having started 3 years ago, PDFBox 2.0.0 has been in the works for quite some time now, **and we are in the final stages!** To give you the opportunity to provide feedback, a [PDFBox 2.0.0-RC1 Release Candidate](http://pdfbox.apache.org/download.cgi) is now available. The [Migration Guide](http://pdfbox.apache.org/2.0/migration.html) gives users coming from PDFBox 1.8 or earlier an overview of things to look at when switching over. More details to come.
+
+## Getting Help ##
+
+To get help on using PDFBox, please [Subscribe to the Users Mailing List](mailto:users-subscribe@pdfbox.apache.org) and post your
+questions there. We're happy to help.
+
+The project is a volunteer effort and we're always looking for interested people to help
+us improve PDFBox. There are many ways you can help us, depending on your
+skills. Subscribe to the [Mailing Lists](/mailinglists.html) and find out how you can help.
+
+<h2 id="features">Features</h2>
+
+<div class="row">
+    <div class="col-md-3">
+        <header><h4><span class="oi oi-box"></span>Extract Text</h4></header>
+        <p>Extract Unicode text from PDF files.</p>
+    </div>
+    <div class="col-md-3">
+        <header><h4><span class="oi oi-box"></span>Split &amp; Merge</h4></header>
+        <p>Split a single PDF into many files or merge multiple PDF files.</p>
+    </div>
+    <div class="col-md-3">
+        <header><h4><span class="oi oi-box"></span>Fill Forms</h4></header>
+        <p>Extract data from PDF forms or fill a PDF form.</p>
+    </div>
+    <div class="col-md-3">
+        <header><h4><span class="oi oi-box"></span>Preflight</h4></header>
+        <p>Validate PDF files against the PDF/A-1b standard.</p>
+    </div>
+</div>
+
+<div class="row">
+    <div class="col-md-3">
+        <header><h4><span class="oi oi-box"></span>Print</h4></header>
+        <p>Print a PDF file using the standard Java printing API.</p>
+    </div>
+    <div class="col-md-3">
+        <header><h4><span class="oi oi-box"></span>Save as Image</h4></header>
+        <p>Save PDFs as image files, such as PNG or JPEG.</p>
+    </div>
+    <div class="col-md-3">
+        <header><h4><span class="oi oi-box"></span>Create PDFs</h4></header>
+        <p>Create a PDF from scratch, with embedded fonts and images.</p>
+    </div>
+    <div class="col-md-3">
+        <header><h4><span class="oi oi-box"></span>Signing</h4></header>
+        <p>Digitally sign PDF files.</p>
+    </div>
+</div>
+

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/index.mdtext
----------------------------------------------------------------------
diff --git a/content/index.mdtext b/content/index.mdtext
deleted file mode 100644
index d3eec51..0000000
--- a/content/index.mdtext
+++ /dev/null
@@ -1,65 +0,0 @@
----
-layout: default
-title:  A Java PDF Library
----
-# Apache PDFBox - A Java PDF Library
-
-<p class="lead">The Apache PDFBox™ library is an open source Java tool for working with
-    PDF documents. This project allows creation of new PDF documents, manipulation of existing
-    documents and the ability to extract content from documents.
-
-    Apache PDFBox also includes several command line utilities.
-    Apache PDFBox is published under the Apache License v2.0.</p>
-    
-## News
-With the initial discussions starting 3 years ago PDFBox 2.0.0 is in the works for quite some time now - **and we are in the final stages!** To give you the opportunity to provide feedback a [PDFBox 2.0.0-RC1 Release Candidate](http://pdfbox.apache.org/download.cgi) is now available. The [Migration Guide](http://pdfbox.apache.org/2.0/migration.html) shall give users coming from PDFBox 1.8 or earlier an overview about things to look at when switching over. More details to come.
-
-## Getting Help ##
-
-To get help on using PDFBox, please [Subscribe to the Users Mailing List](mailto:users-subscribe@pdfbox.apache.org) and post your
-questions there. We're happy to help.
-
-The project is a volunteer effort and we're always looking for interested people to help
-us improve PDFBox. There are a multitude of ways that you can help us depending on your
-skills. Subscribe to the [Mailing Lists](/mailinglists.html) and find out how you can help.
-
-<h2 id="features">Features</h2>
-
-<div class="row">
-    <div class="col-md-3">
-        <header><h4><span class="oi oi-box"></span>Extract Text</h4></header>
-        <p>Extract Unicode text from PDF files.</p>
-    </div>
-    <div class="col-md-3">
-        <header><h4><span class="oi oi-box"></span>Split &amp; Merge</h4></header>
-        <p>Split a single PDF into many files or merge multiple PDF files.</p>
-    </div>
-    <div class="col-md-3">
-        <header><h4><span class="oi oi-box"></span>Fill Forms</h4></header>
-        <p>Extract data from PDF forms or fill a PDF form.</p>
-    </div>
-    <div class="col-md-3">
-        <header><h4><span class="oi oi-box"></span>Preflight</h4></header>
-        <p>Validate PDF files against the PDF/A-1b standard.</p>
-    </div>
-</div>
-
-<div class="row">
-    <div class="col-md-3">
-        <header><h4><span class="oi oi-box"></span>Print</h4></header>
-        <p>Print a PDF file using the standard Java printing API.</p>
-    </div>
-    <div class="col-md-3">
-        <header><h4><span class="oi oi-box"></span>Save as Image</h4></header>
-        <p>Save PDFs as image files, such as PNG or JPEG.</p>
-    </div>
-    <div class="col-md-3">
-        <header><h4><span class="oi oi-box"></span>Create PDFs</h4></header>
-        <p>Create a PDF from scratch, with embedded fonts and images.</p>
-    </div>
-    <div class="col-md-3">
-        <header><h4><span class="oi oi-box"></span>Signing</h4></header>
-        <p>Digitally sign PDF files.</p>
-    </div>
-</div>
-

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/mailinglists.md
----------------------------------------------------------------------
diff --git a/content/mailinglists.md b/content/mailinglists.md
new file mode 100644
index 0000000..4cbb80b
--- /dev/null
+++ b/content/mailinglists.md
@@ -0,0 +1,28 @@
+---
+layout: default
+title:  Mailing Lists
+---
+
+# Mailing Lists
+
+Mailing lists are the primary communication channels for all projects at
+The Apache Software Foundation, and Apache PDFBox is no exception.
+
+**Please read the [public forum archive policy](http://www.apache.org/foundation/public-archives.html) carefully before subscribing to one of our lists.**
+
+If you have any questions about or problems with Apache PDFBox, you can get them addressed 
+on the **Users Mailing List**. 
+
+If you would like to participate in the development of Apache PDFBox,
+the **Developers Mailing List** is the place to be.
+
+If you would like to keep track of what's being changed inside the project, you can subscribe
+to the **Commits Mailing List**.
+
+<p class="alert alert-info">Please use the Users Mailing List if you are unsure which list to use.</p>
+
+| Name | Address | Subscribe | Unsubscribe | Help | Archive | MarkMail |
+| --- | --- | --- | ---| ---| --- | --- |
+| Users | users@pdfbox.apache.org | [Subscribe](mailto:users-subscribe@pdfbox.apache.org) | [Unsubscribe](mailto:users-unsubscribe@pdfbox.apache.org) | [Help](mailto:users-help@pdfbox.apache.org) | [Archive](http://mail-archives.apache.org/mod_mbox/pdfbox-users/) | [MarkMail](http://pdfbox-users.markmail.org/) |
+| Developers | dev@pdfbox.apache.org | [Subscribe](mailto:dev-subscribe@pdfbox.apache.org) | [Unsubscribe](mailto:dev-unsubscribe@pdfbox.apache.org) | [Help](mailto:dev-help@pdfbox.apache.org) | [Archive](http://mail-archives.apache.org/mod_mbox/pdfbox-dev/) | [MarkMail](http://pdfbox-dev.markmail.org/) |	 
+| Commits List | commits@pdfbox.apache.org | [Subscribe](mailto:commits-subscribe@pdfbox.apache.org) | [Unsubscribe](mailto:commits-unsubscribe@pdfbox.apache.org) | [Help](mailto:commits-help@pdfbox.apache.org) | [Archive](http://mail-archives.apache.org/mod_mbox/pdfbox-commits/) | [MarkMail](http://pdfbox-commits.markmail.org/) |	 

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/mailinglists.mdtext
----------------------------------------------------------------------
diff --git a/content/mailinglists.mdtext b/content/mailinglists.mdtext
deleted file mode 100644
index 4cbb80b..0000000
--- a/content/mailinglists.mdtext
+++ /dev/null
@@ -1,28 +0,0 @@
----
-layout: default
-title:  Mailing Lists
----
-
-# Mailing Lists
-
-Mailing Lists are the primary communication channels for all projects at 
-The Apache Software Foundation. Therefore, this applies to Apache PDFBox, too. 
-
-**Please read the [public forum archive policy](http://www.apache.org/foundation/public-archives.html) carefully before subscribing to one of our list.**
-
-If you have any questions about or problems with Apache PDFBox, you can get them addressed 
-on the **Users Mailing List**. 
-
-If you like to participate in the development of Apache PDFBox, 
-the **Developers Mailing List** is the place to be. 
-
-If you like to keep track of what's being changed inside the project, you can subscribe 
-to the **Commit Mailing List**.
-
-<p class="alert alert-info">Please use the Users Mailing List if you are unsure which list to use</p>
-
-| Name | Address | Subscribe | Unsubscribe | Help | Archive | MarkMail |
-| --- | --- | --- | ---| ---| --- | --- |
-| Users | users@pdfbox.apache.org | [Subscribe](mailto:users-subscribe@pdfbox.apache.org) | [Unsubscribe](mailto:users-unsubscribe@pdfbox.apache.org) | [Help](mailto:users-help@pdfbox.apache.org) | [Archive](http://mail-archives.apache.org/mod_mbox/pdfbox-users/) | [MarkMail](http://pdfbox-users.markmail.org/) |
-| Developers | dev@pdfbox.apache.org | [Subscribe](mailto:dev-subscribe@pdfbox.apache.org) | [Unsubscribe](mailto:dev-unsubscribe@pdfbox.apache.org) | [Help](mailto:dev-help@pdfbox.apache.org) | [Archive](http://mail-archives.apache.org/mod_mbox/pdfbox-dev/) | [MarkMail](http://pdfbox-dev.markmail.org/) |	 
-| Commits List | commits@pdfbox.apache.org | [Subscribe](mailto:commits-subscribe@pdfbox.apache.org) | [Unsubscribe](mailto:commits-unsubscribe@pdfbox.apache.org) | [Help](mailto:commits-help@pdfbox.apache.org) | [Archive](http://mail-archives.apache.org/mod_mbox/pdfbox-commits/) | [MarkMail](http://pdfbox-commits.markmail.org/) |	 

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/references.md
----------------------------------------------------------------------
diff --git a/content/references.md b/content/references.md
new file mode 100644
index 0000000..1805246
--- /dev/null
+++ b/content/references.md
@@ -0,0 +1,48 @@
+---
+layout: default
+title:  External Links
+---
+
+# External Links
+
+This page lists projects that utilize PDFBox and articles that have been written about PDFBox. 
+Please file an [improvement issue](https://issues.apache.org/jira/browse/PDFBOX) to get new projects or articles added to this page, or to update the information on existing links.
+
+## Projects
+
+| Project Name | License | Project Description |
+| --- | --- | --- |
+| [Alfresco](http://www.alfresco.org/) | LGPL - commercial services/support/training is available | Alfresco is an open source, open-standards content repository built by the most experienced content management team that includes the co-founder of Documentum.|
+| [Apache Nutch](http://nutch.apache.org/) | Apache License V2.0 | Apache Nutch is open source web-search software. It builds on Apache Lucene, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.|
+| [Apache Tika](http://tika.apache.org/) | Apache License V2.0 | Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.|
+| [Centric CRM](http://www.centriccrm.com/) | Free To Use But Restricted/Commercial | The Most Advanced Open Source CRM Software.|
+| [Canoo Webtest](http://webtest.canoo.com/webtest/manual/WebTestHome.html) | BSD Like | Free OpenSource tool for XP-style acceptance testing of Java-based Web applications.|
+| [contineo](http://webtest.canoo.com/webtest/manual/WebTestHome.html) | GPL | Contineo is a web based document management system.|
+| [ECM REWOO Scope](http://www.rewoo.de/) | Commercial | REWOO Scope is an Enterprise Content Management (ECM) software to organize, structure and consolidate enterprise data. Apache PDFBox is an integral part to read and index PDF documents.|
+| [Jahia](http://www.jahia.org/) | collaborative source license | The Jahia product is currently the most powerful, ready-to-use and affordable integrated midrange Java Content Management and Corporate Portal Server.|
+| [jLibrary](http://jlibrary.sourceforge.net/) | BSD | jLibrary is a Document Management System, oriented for personal and enterprise use.|
+| [Jomic](http://jomic.sourceforge.net/) | GPL | Jomic is a viewer for comic book archives.|
+| [JpdfUnit](http://jpdfunit.sourceforge.net/) | Apache License V2.0 | JpdfUnit is a framework for testing generated PDF documents with the JUnit test framework.|
+| [Liferay Portal](http://www.liferay.com/) | MIT | Liferay Portal is an open source portal that helps organizations collaborate more efficiently by providing a consolidated view of disparate applications.|
+| [LIUS](http://www.bibl.ulaval.ca/lius/index.en.html) | GPL | LIUS is an indexing Java framework based on the Jakarta Lucene project. The LIUS framework adds indexing support for many file formats to Lucene, such as MS Word, MS Excel, MS PowerPoint, RTF, PDF, XML, HTML, TXT, the OpenOffice suite and JavaBeans.|
+| [LuceGene](http://gmod.org/wiki/LuceGene) | Artistic License | LuceGene is an open-source document/object search and retrieval system specially tuned for bioinformatics text databases and documents.|
+| [Lutece](http://www.lutece.paris.fr/) | BSD-like | Lutece is a portal engine which allows you to easily create your websites or intranets based upon HTML/XML content.|
+| [MMBase Lucene Module](http://mmapps.sourceforge.net/lucenemodule/) | MPL | Lucenemodule is a plugin (module) for the MMBase content management system that enables Lucene full text search through its content and, thanks to PDFBox, also through PDF content.|
+| [OpenCms](http://www.opencms.org/) | Custom | OpenCms is a professional level Open Source Website Content Management System.|
+| [OpenSearchServer](http://www.open-search-server.com/) | GPLv3 | An open source search engine and crawler based on best open source technologies. It is a modern search engine and a suite of high-powered full text search algorithms.|
+| [Orbeon PresentationServer](http://forge.objectweb.org/projects/ops) | LGPL | Orbeon PresentationServer (OPS) is an open source J2EE-based platform for XML-centric web applications. OPS is built around XHTML, XForms, XSLT, XML pipelines, and Web Services, which makes it ideal for applications that capture, process and present XML data. Commercial consulting/training/support is available through orbeon.|
+| [PDFcat](http://pdfcat.sourceforge.net/) | LGPL | PDFcat is a multi-platform catalog manager that provides search capability over documents across virtual catalogs.|
+| [SearchBlox](http://www.searchblox.com/) | Commercial | SearchBlox is a high-performance corporate search software designed for the Java 2 Enterprise Edition (J2EE) platform.|
+| [SimplexRepaginator](http://www.simplexrepaginator.com/) | Apache License V2.0 | Simplex Repaginator converts simplex-scanned PDFs into properly duplex-paginated PDFs and vice versa. |
+| [Terrier](http://ir.dcs.gla.ac.uk/terrier/) | MPL | Terrier is software for the rapid development of Web, intranet and desktop search engines.|
+| [Triboni GinkGO](http://www.triboni.com/) | Commercial | Triboni GinkGO is a highly scalable J2EE services platform that is based on a simple XML business object definition and scripting language. Together with XSLT, content-centric web applications can be configured in a very short time.|
+| [Zilverline](http://www.zilverline.org/) | Collaborative Source License | Zilverline is a search engine that offers web access to your personal or intranet content.|
+
+## Articles/Books
+
+| Article Name | Article Abstract|
+| --- | --- |
+| Build an eDoc Reader for your iPod <br/> [Part 1 - User Interface](http://www.oreillynet.com/pub/a/mac/2004/12/14/ipod_reader.html) <br/> [Part 2 - Document Reading Engine](http://www.oreillynet.com/pub/a/mac/2004/12/17/ipod_reader.html) <br/> [Part 3 - *Integration with PDFBox*](http://www.oreillynet.com/pub/a/mac/2005/01/07/ipod_reader.html) | A three-part article that discusses the implementation of the PodReader application. PodReader is a Cocoa application written in Objective-C, and the article discusses how to use the Cocoa-Java bridge to integrate with the Java version of PDFBox.|
+| [Lucene In Action](http://www.manning.com/hatcher2/) | A book that discusses integrating with the Lucene search engine. One chapter discusses how to index various file formats and highlights PDFBox for indexing PDF documents.|
+| [Java Developers Journal - March 2005](http://java.sys-con.com/node/48543) | An article written by the lead developer of PDFBox discussing text extraction and AcroForm integration using PDFBox functionality.|
+| [Refactoring trends across N versions of N Java open source systems: an empirical study](http://www.dcs.bbk.ac.uk/research/techreps/2005/bbkcs-05-02.pdf) | This article describes an empirical study of multiple versions of a range of open source Java systems in an attempt to understand whether refactorings occur and, if so, which types of refactoring were most (and least) common. PDFBox is used as a case study. |
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/references.mdtext
----------------------------------------------------------------------
diff --git a/content/references.mdtext b/content/references.mdtext
deleted file mode 100644
index 1805246..0000000
--- a/content/references.mdtext
+++ /dev/null
@@ -1,48 +0,0 @@
----
-layout: default
-title:  External Links
----
-
-# External Links
-
-This page lists projects that utilize PDFBox and articles that have been written about PDFBox. 
-Please file an [improvement issue](https://issues.apache.org/jira/browse/PDFBOX) to get new projects or articles added to this page, or to update the information on existing links.
-
-## Projects
-
-| Project Name | License | Project Description |
-| --- | --- | --- |
-| [Alfresco](http://www.alfresco.org/) | LGPL - commercial services/support/training is available | Alfresco is an open source, open-standards content repository built by the most experienced content management team that includes the co-founder of Documentum.|
-| [Apache Nutch](http://nutch.apache.org/) | Apache License V2.0 | Apache Nutch is open source web-search software. It builds on Apache Lucene, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.|
-| [Apache Tika](http://tika.apache.org/) | Apache License V2.0 | Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.|
-| [Centric CRM](http://www.centriccrm.com/) | Free To Use But Restricted/Commercial | The Most Advanced Open Source CRM Software.|
-| [Canoo Webtest](http://webtest.canoo.com/webtest/manual/WebTestHome.html) | BSD Like | Free OpenSource tool for XP-style acceptance testing of Java-based Web applications.|
-| [contineo](http://webtest.canoo.com/webtest/manual/WebTestHome.html) | GPL | Contineo is a web based document management system.|
-| [ECM REWOO Scope](http://www.rewoo.de/) | Commercial | REWOO Scope is an Enterprise Content Management (ECM) software to organize, structure and consolidate enterprise data. Apache PDFBox is an integral part to read and index PDF documents.|
-| [Jahia](http://www.jahia.org/) | collaborative source license | The Jahia product is currently the most powerful, ready-to-use and affordable integrated midrange Java Content Management and Corporate Portal Server.|
-| [jLibrary](http://jlibrary.sourceforge.net/) | BSD | jLibrary is a Document Management System, oriented for personal and enterprise use.|
-| [Jomic](http://jomic.sourceforge.net/) | GPL | Jomic is a viewer for comic book archives.|
-| [JpdfUnit](http://jpdfunit.sourceforge.net/) | Apache License V2.0 | pdfUnit is a framework for testing a generated pdf document with the JUnit Test Framework.|
-| [Liferay Portal](http://www.liferay.com/) | MIT | Liferay Portal is an open source portal that helps organizations collaborate more efficiently by providing a consolidated view of disparate applications.|
-| [LIUS](http://www.bibl.ulaval.ca/lius/index.en.html) | GPL | LIUS is an indexing Java framework based on the Jakarta Lucene project. The LIUS framework adds to Lucene many files format indexing fonctionalities as: Ms World, Ms Excel, Ms PowerPoint, RTF, PDF, XML, HTML, TXT, Open Office suite and JavaBeans.|
-| [LuceGene](http://gmod.org/wiki/LuceGene) | Artistic License | LuceGene is an open-source document/object search and retrieval system specially tuned for bioinformatics text databases and documents.|
-| [Lutece](http://www.lutece.paris.fr/) | BSD-like | Lutece is a portal engine which allows you to easily create your websites or intranets based upon HTML,XML content.|
-| [MMBase Lucene Module](http://mmapps.sourceforge.net/lucenemodule/) | MPL | Lucenemodule is a plugin (module) for the MMBase content management system that enables Lucene full text search through it's content, and thanks to PDFBox also PDF content.|
-| [OpenCms](http://www.opencms.org/) | Custom | OpenCms is a professional level Open Source Website Content Management System.|
-| [OpenSearchServer](http://www.open-search-server.com/) | GPLv3 | An open source search engine and crawler based on best open source technologies. It is a modern search engine and a suite of high-powered full text search algorithms.|
-| [Orbeon PresentationServer](http://forge.objectweb.org/projects/ops) | LGPL | Orbeon PresentationServer (OPS) is an open source J2EE-based platform for XML-centric web applications. OPS is built around XHTML, XForms, XSLT, XML pipelines, and Web Services, which makes it ideal for applications that capture, process and present XML data. Commercial consulting/training/support is available through orbeon.|
-| [PDFcat](http://pdfcat.sourceforge.net/) | LGPL | PDFcat is multi-platform catalog manager that provides searching capability over documents among virtual catalogs.|
-| [SearchBlox](http://www.searchblox.com/) | Commercial | SearchBlox is a high-performance corporate search software designed for the Java 2 Enterprise Edition (J2EE) platform.|
-| [SimplexRepaginator](http://www.simplexrepaginator.com/) | Apache License V2.0 | Simplex Repaginator converts simplex-scanned PDFs into properly duplex-paginated PDFs and vice versa. |
-| [Terrier](http://ir.dcs.gla.ac.uk/terrier/) | MPL | Terrier is software for the rapid development of Web, intranet and desktop search engines.|
-| [Triboni GinkGO](http://www.triboni.com/) | Commercial | Triboni GinkGO is a highly scalable J2EE services platform that is based on a simple XML business object defintion and scripting language. Toghether with XSLT content centric web applications can be configured in a very short time.|
-| [Zilverline](http://www.zilverline.org/) | Collaborative Source License | Zilverline is a search engine that offers web access to your personal or intranet content.|
-
-## Articles/Books
-
-| Article Name | Article Abstract|
-| --- | --- |
-| Build an eDoc Reader for your iPod <br/> [Part 1 - User Interface](http://www.oreillynet.com/pub/a/mac/2004/12/14/ipod_reader.html) <br/> [Part 2 - Document Reading Engine](http://www.oreillynet.com/pub/a/mac/2004/12/17/ipod_reader.html) <br/> [Part 3 - *Integration with PDFBox*](http://www.oreillynet.com/pub/a/mac/2005/01/07/ipod_reader.html) | A three part article that discusses the implementation of the PodReader application. PodReader is Cocoa application written in Objective-C and article discusses how to use the Cocoa-Java bridge to integrate with the Java version of PDFBox.|
-| [Lucene In Action](http://www.manning.com/hatcher2/) | A book that discusses integrating with the lucene search engine. One chapter discusses how to index various file formats and highlights PDFBox for indexing PDF documents.|
-| [Java Developers Journal - March 2005](http://java.sys-con.com/node/48543) | An article written by the lead developer of PDFBox discussing text extraction and AcroForm integration using PDFBox functionality.|
-| [Refactoring trends across N versions of N Java open source systems: an empirical study](http://www.dcs.bbk.ac.uk/research/techreps/2005/bbkcs-05-02.pdf) | This article describes an empirical study of multiple versions of a range of open source Java systems in an attempt to understand whether refactoring occur and, if so, which types of refactoring were most (and least) common. PDFBox is used as a case study. |
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/support.md
----------------------------------------------------------------------
diff --git a/content/support.md b/content/support.md
new file mode 100644
index 0000000..0969b5e
--- /dev/null
+++ b/content/support.md
@@ -0,0 +1,53 @@
+---
+layout: default
+title:  Support
+---
+
+# Support
+
+## Questions about How to use PDFBox
+
+If you have questions about how to use PDFBox, please ask on the [Users Mailing List](/mailinglists.html "Subscribe to Mailing List"). This will get you help from the entire community.
+
+The PDFBox examples and the test code in the sources will also provide additional information.
+
+Additional resources are available on sites such as [Stack Overflow](http://stackoverflow.com/search?q=pdfbox "Stack Overflow").
+
+
+## Filing a bug report or enhancement request
+
+<p class="alert alert-info">Please refrain from immediately opening a ticket in the issue tracker unless 
+you are really certain it's a problem in the PDFBox software. Try the mailing lists 
+first.</p>
+
+If you are sure you have found a bug, then please report the problem in our 
+[Issue Tracker](https://issues.apache.org/jira/browse/PDFBOX). 
+
+**Before you submit a bug there are several things you can try first**
+
+ - for issues with text extraction, check whether Adobe Reader can extract the text
+ - try the latest SNAPSHOT to see if it's fixed in the pre-release
+ - search the mailing list to see if it has been discussed before
+ - check the issue tracker to see if the issue has already been reported
+
+**To help us resolve a bug more quickly**
+
+ - attach the PDF that causes the problem by using "More", "Attach files" in the issue tracker
+ - if your file is too large, upload it to a file-sharing service, or use the PDFSplit application to isolate the troublesome page
+ - mention the PDFBox version you are using
+ - attach the shortest possible code that reproduces the problem. Insert Java code between {code}...{code}. Or try to reproduce the problem with the command line applications.
+ - mention what you were doing, what you expected to happen, and what happened instead
+ - provide a stack trace of an exception if there is one
+ - try using the non-sequential parser (loadNonSeq() instead of load(), and "-nonSeq" with the command line applications)
+ - search JIRA to see if your problem has been mentioned before
+ - Be patient: all the people here are unpaid volunteers who work for you in their free time
+
+**And please DON'T**
+
+ - upload files to a file-sharing service that requires registration to download them
+ - create an issue in JIRA and then go on vacation, so that you can't respond to our questions / suggestions
+ - ask "how to" questions in JIRA. Ask such questions on the mailing lists or on stackoverflow.com, and look at the examples and the test code in the sources.
+ - attach PDF files with confidential and/or personal data (name, DoB, bank data, health data, SSN) without getting permission from the client and/or the people mentioned on the PDF
+ - create issues about obsolete PDFBox versions
+
+<p class="alert alert-info">We can sometimes solve problems without having the PDF, but it is difficult.</p>

[3/3] pdfbox-docs git commit: PDFBOX-3040: use .md for markdown files

Posted by ms...@apache.org.

PDFBOX-3040: use .md for markdown files


Project: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/commit/c68c6530
Tree: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/tree/c68c6530
Diff: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/diff/c68c6530

Branch: refs/heads/master
Commit: c68c6530d35511c9baa3dd120c557627d6e5f706
Parents: 4428815
Author: Maruan Sahyoun <sa...@fileaffairs.de>
Authored: Fri Oct 30 16:28:13 2015 +0100
Committer: Maruan Sahyoun <sa...@fileaffairs.de>
Committed: Fri Oct 30 16:28:13 2015 +0100

----------------------------------------------------------------------
 content/1.8/architecture.md                     | 104 +++++++++
 content/1.8/architecture.mdtext                 | 104 ---------
 content/1.8/commandline.md                      | 232 +++++++++++++++++++
 content/1.8/commandline.mdtext                  | 232 -------------------
 content/1.8/cookbook/documentcreation.md        |  57 +++++
 content/1.8/cookbook/documentcreation.mdtext    |  57 -----
 content/1.8/cookbook/pdfacreation.md            |  76 ++++++
 content/1.8/cookbook/pdfacreation.mdtext        |  76 ------
 content/1.8/cookbook/pdfavalidation.md          |  85 +++++++
 content/1.8/cookbook/pdfavalidation.mdtext      |  85 -------
 content/1.8/cookbook/textextraction.md          | 101 ++++++++
 content/1.8/cookbook/textextraction.mdtext      | 101 --------
 content/1.8/cookbook/workingwithattachments.md  |  54 +++++
 .../1.8/cookbook/workingwithattachments.mdtext  |  54 -----
 content/1.8/cookbook/workingwithfonts.md        | 129 +++++++++++
 content/1.8/cookbook/workingwithfonts.mdtext    | 129 -----------
 content/1.8/cookbook/workingwithmetadata.md     |  66 ++++++
 content/1.8/cookbook/workingwithmetadata.mdtext |  66 ------
 content/1.8/dependencies.md                     |  96 ++++++++
 content/1.8/dependencies.mdtext                 |  96 --------
 content/1.8/faq.md                              | 143 ++++++++++++
 content/1.8/faq.mdtext                          | 143 ------------
 content/2.0/dependencies.md                     |  56 +++++
 content/2.0/dependencies.mdtext                 |  56 -----
 content/2.0/examples.md                         |   9 +
 content/2.0/examples.mdtext                     |   9 -
 content/2.0/getting-started.md                  |  33 +++
 content/2.0/getting-started.mdtext              |  33 ---
 content/building.md                             |  70 ++++++
 content/building.mdtext                         |  70 ------
 content/codingconventions.md                    | 128 ++++++++++
 content/codingconventions.mdtext                | 128 ----------
 content/errors/403.md                           |  15 ++
 content/errors/403.mdtext                       |  15 --
 content/errors/404.md                           |  15 ++
 content/errors/404.mdtext                       |  15 --
 content/ideas.md                                |  88 +++++++
 content/ideas.mdtext                            |  88 -------
 content/index.md                                |  65 ++++++
 content/index.mdtext                            |  65 ------
 content/mailinglists.md                         |  28 +++
 content/mailinglists.mdtext                     |  28 ---
 content/references.md                           |  48 ++++
 content/references.mdtext                       |  48 ----
 content/support.md                              |  53 +++++
 content/support.mdtext                          |  53 -----
 content/team.md                                 |  45 ++++
 content/team.mdtext                             |  45 ----
 48 files changed, 1796 insertions(+), 1796 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/architecture.md
----------------------------------------------------------------------
diff --git a/content/1.8/architecture.md b/content/1.8/architecture.md
new file mode 100644
index 0000000..3ecce7a
--- /dev/null
+++ b/content/1.8/architecture.md
@@ -0,0 +1,104 @@
+---
+layout: default
+title:  Architecture
+---
+
+# Architecture
+
+In order to get the most out of PDFBox it is necessary to understand how a PDF document
+is organized, as PDFBox was designed around the concepts laid out in the 
+ISO-32000 (PDF) Specification:
+
+- [ISO Site](http://www.iso.org/iso/catalogue_detail.htm?csnumber=51502)
+- [Adobe Version](http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf)
+
+## Quick Introduction to the PDF format
+
+A PDF file is made up of a sequence of bytes. These bytes, grouped into tokens, 
+make up the basic objects upon which higher level objects and structures are built [see ISO-32000 7.3].
+
+<p class="alert alert-info">PDFBox makes these basic objects available in the 
+*org.apache.pdfbox.cos* package (The COS Model).
+</p>
+
+The organization of these objects, how they are read and how they are written is defined in the file structure of the 
+PDF [see ISO-32000 7.5]. In addition, a file can be encrypted to protect the document's content [see ISO-32000 7.6].
+
+<p class="alert alert-info">PDFBox handles the reading in the *org.apache.pdfbox.pdfparser* package. 
+Writing of PDF files is handled in the *org.apache.pdfbox.pdfwriter* package.
+</p>
+
+Within the file structure, basic objects are combined into a document structure that builds higher-level objects such 
+as pages, bookmarks and annotations [see ISO-32000 7.7].
+
+<p class="alert alert-info">PDFBox makes these higher level objects available through the 
+*org.apache.pdfbox.pdmodel* package (The PD Model).
+</p> 
+
+In addition, the COS representation underlying the PD model is available if there is a need to 
+inspect the underlying structure or to handle special cases where the higher-level PD model
+doesn't provide the functionality needed.
+
+<p class="alert alert-info">It's always the COS model which is represented in the PDF file.</p>
+
+## The COS Model
+
+As outlined above, the basic PDF objects are represented in PDFBox in the org.apache.pdfbox.cos package.
+
+| PDF Type | Description | Example | PDFBox class | ISO 32000 |
+| --- | --- | --- | --- | --- |
+| Boolean | Standard True/False values | true | org.apache.pdfbox.cos.COSBoolean | 7.3.2 |
+| Number | Integer and floating point numbers | 1 2.3 | org.apache.pdfbox.cos.COSInteger<br/>org.apache.pdfbox.cos.COSFloat | 7.3.3 |
+| String | A sequence of characters | (This is a string) | org.apache.pdfbox.cos.COSString | 7.3.4 |
+| Name | A predefined value in a PDF document, typically used as a key in a dictionary | /Type | org.apache.pdfbox.cos.COSName | 7.3.5 |
+| Array | Arrays are one-dimensional lists of objects accessed by a numeric index. Within an array each basic object is permitted as an entry. | [549 3.14 false (Ralph) /SomeName] | org.apache.pdfbox.cos.COSArray | 7.3.6 |
+| Dictionary | A map of name-value pairs | <<<br/>/Type /XObject<br/>/Name (Name)<br/>/Size 1<br/>>> | org.apache.pdfbox.cos.COSDictionary | 7.3.7 |
+| Stream | A stream of data, typically compressed. This is used for page contents, images and embedded font streams. | 12 0 obj << /Type /XObject >> stream 030004040404040404 endstream | org.apache.pdfbox.cos.COSStream | 7.3.8 |
+| Object | An indirect object wraps any of the other objects so that it can be referenced multiple times. An object is referenced by two numbers, an object number and a generation number (e.g. 12 0 R). In a newly created file all generation numbers are zero; nonzero generation numbers may be introduced when the file is later updated. | 12 0 obj << /Type /XObject >> endobj | org.apache.pdfbox.cos.COSObject | 7.3.10 |
+
+A page in a PDF document is represented by a COSDictionary. The entries that are available for a page are listed in the PDF specification [see ISO-32000 7.7.3.3]; an example of a page looks like this:
+
+```text
+<<
+    /Type /Page
+    /MediaBox [0 0 612 915]
+    /Contents 56 0 R
+>>
+```
+
+The information within the dictionary can be accessed using the COS model:
+
+```java
+COSDictionary page = ...;
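+COSArray mediaBox = (COSArray)page.getDictionaryObject( "MediaBox" ); // [llx lly urx ury]
+System.out.println( "Width:" + (((COSNumber)mediaBox.getObject( 2 )).floatValue() - ((COSNumber)mediaBox.getObject( 0 )).floatValue()) );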
+```
+
+As this small example shows, the COS model provides a low-level API to access 
+information within the PDF. Using the COS model successfully requires a good knowledge of
+the PDF specification.
+
+## The PD Model
+
+The COS Model allows access to all aspects of a PDF document. This type of programming is
+tedious and error-prone, though, because the user must know all of the names of the
+parameters and no helper methods are available. The PD Model was created to help
+alleviate this problem. Each type of object (page, font, image) has a set of defined
+attributes that can appear in its dictionary. 
+A PD Model class is available for each of these, so that strongly typed methods are
+available to access the attributes. 
+
+The page-width example from above can be rewritten to use PD Model classes:
+
+```java
+PDPage page = ...;
+PDRectangle mediaBox = page.getMediaBox();
+System.out.println( "Width:" + mediaBox.getWidth() );
+```
+
+PD Model objects sit on top of the COS model. Typically, the classes in the PD Model only
+store a COS object, and all setter/getter methods read and modify data that is stored in that
+COS object. For example, when you call PDPage.getLastModified() the method will look up
+the key "LastModified" in the COSDictionary; if it is found, the value is then
+converted to a java.util.Calendar. When PDPage.setLastModified( Calendar ) is called,
+the Calendar is converted to a string in the COSDictionary.
\ No newline at end of file
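The page dictionary in architecture.md above stores `/MediaBox [0 0 612 915]`. A PDF rectangle is written as the coordinates of two diagonally opposite corners, normally `[llx lly urx ury]` (ISO-32000 7.9.5). The page width is therefore the difference between entries 2 and 0. Entry 3 is the upper y coordinate, not the width.

The following JDK-only sketch shows that arithmetic on plain floats. `MediaBoxSketch`, `width` and `height` are hypothetical names used for illustration only; they are not PDFBox API. In PDFBox code, the PD model's `PDRectangle.getWidth()` is the intended way to get this value.

```java
// Hypothetical helper, not part of PDFBox: derives page size from MediaBox numbers.
public class MediaBoxSketch {

    // A PDF rectangle such as /MediaBox is written as [x1 y1 x2 y2]: two diagonally
    // opposite corners, usually [llx lly urx ury]. Math.abs tolerates swapped corners.
    static float width(float[] box) {
        return Math.abs(box[2] - box[0]);
    }

    static float height(float[] box) {
        return Math.abs(box[3] - box[1]);
    }

    public static void main(String[] args) {
        float[] mediaBox = {0, 0, 612, 915}; // values from the page dictionary example
        System.out.println("Width:" + width(mediaBox));   // Width:612.0
        System.out.println("Height:" + height(mediaBox)); // Height:915.0
    }
}
```

`Math.abs` is used because the specification accepts any pair of opposite corners. For rectangles written as lower-left/upper-right, a plain `urx - llx` gives the same result.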

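architecture.md notes that PDPage.setLastModified( Calendar ) stores the Calendar as a string. PDF date strings use the form `D:YYYYMMDDHHmmSSOHH'mm'` (ISO-32000 7.9.4).

Below is a JDK-only sketch of such a conversion. It assumes that an explicit numeric UTC offset is always written. It is not PDFBox's code, and the names `PdfDateSketch` and `toPdfDate` are hypothetical. PDFBox's own converter may differ in details such as how it writes UTC.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not PDFBox code: formats a Calendar as a PDF date string.
public class PdfDateSketch {

    // Produces D:YYYYMMDDHHmmSSOHH'mm' where O is '+' or '-' and HH'mm' is the
    // offset from UTC; this sketch always writes a numeric offset, never 'Z'.
    static String toPdfDate(Calendar cal) {
        int offsetMinutes = (cal.get(Calendar.ZONE_OFFSET) + cal.get(Calendar.DST_OFFSET)) / 60000;
        char sign = offsetMinutes < 0 ? '-' : '+';
        int abs = Math.abs(offsetMinutes);
        return String.format(Locale.ROOT, "D:%04d%02d%02d%02d%02d%02d%c%02d'%02d'",
                cal.get(Calendar.YEAR), cal.get(Calendar.MONTH) + 1, // MONTH is zero-based
                cal.get(Calendar.DAY_OF_MONTH), cal.get(Calendar.HOUR_OF_DAY),
                cal.get(Calendar.MINUTE), cal.get(Calendar.SECOND),
                sign, abs / 60, abs % 60);
    }

    public static void main(String[] args) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("GMT+01:00"));
        cal.clear();
        cal.set(2015, Calendar.OCTOBER, 30, 16, 28, 13);
        System.out.println(toPdfDate(cal)); // D:20151030162813+01'00'
    }
}
```

`Locale.ROOT` keeps the digits ASCII regardless of the JVM's default locale.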
http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/architecture.mdtext
----------------------------------------------------------------------
diff --git a/content/1.8/architecture.mdtext b/content/1.8/architecture.mdtext
deleted file mode 100644
index 3ecce7a..0000000
--- a/content/1.8/architecture.mdtext
+++ /dev/null
@@ -1,104 +0,0 @@
----
-layout: default
-title:  Architecture
----
-
-# Architecture
-
-In order to get the most out of PDFBox it is neccessary to understand how a PDF document
-is organized as PDFBox was architected around the concepts layed out in the 
-ISO-32000 (PDF) Specification
-
-- [ISO Site](http://www.iso.org/iso/catalogue_detail.htm?csnumber=51502)
-- [Adobe Version](http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf)
-
-## Quick Introduction to the PDF format
-
-A PDF file is made up of a sequence of bytes. These bytes, grouped into tokens, 
-make up the basic objects upon which higher level objects and structures are built [see ISO-32000 7.3].
-
-<p class="alert alert-info">PDFBox makes these basic objects available in the 
-*org.apache.pdfbox.cos* package (The COS Model).
-</p>
-
-The organization of these objects, how to they are read and how to write them is defined in the file structure of the 
-PDF [see ISO-32000 7.5]. In addition a file can be encrpyted to protect the document's content [see ISO-32000 7.5].
-
-<p class="alert alert-info">PDFBox handles the reading in the *org.apache.pdfbox.pdfparser* package. 
-Writing of PDF files is handled in the *org.apache.pdfbox.pdfwriter* package.
-</p>
-
-Within the file structure basic objects are used to create a document structure building higher level objects such 
-as pages, bookmarks, annotations [see ISO-32000 7.7].
-
-<p class="alert alert-info">PDFBox makes these higher level objects available through the 
-*org.apache.pdfbox.pdfmodel* package (The PD Model).
-</p> 
-
-In addition there is a COS representation available for the PD model if there is a need to 
-inspect the underlying structure or to handle special cases where the higher level PD model
-doesn't provide the functionality needed.
-
-<p class="alert alert-info">It's always the COS model which is represented in the PDF file.</p>
-
-## The COS Model
-
-As outlined above the basic PDF objects are represented in PDFBox in the org.apache.pdfbox.cos package.
-
-| PDF Type | Description | Example | PDFBox class | ISO 32000 |
-| --- | --- | --- | --- | --- |
-| Boolean | Standard True/False values | true | org.apache.pdfbox.cos.COSBoolean | 7.3.2 |
-| Number | Integer and floating point numbers | 1 2.3 | org.apache.pdfbox.cos.COSInteger<br/>org.apache.pdfbox.cos.COSFloat | 7.3.3 |
-| String | A sequence of characters | (This is a string) | org.apache.pdfbox.cos.COSString | 7.3.4 |
-| Name | A predefined value in a PDF document, typically used as a key in a dictionary | /Type | org.apache.pdfbox.cos.COSName | 7.3.5 |
-| Array | Arrays are one-dimensional lists of objects accessed by a numeric index. Within an array each basic object is permitted as an entry. | [549 3.14 false (Ralph) /SomeName] | org.apache.pdfbox.cos.COSArray | 7.3.6 |
-| Dictionary | A map of name value pairs | <<<br/>/Type /XObject<br/>/Name (Name)</br>/Size 1</br>>> | org.apache.pdfbox.cos.COSDictionary | 7.3.7 |
-| Stream | A stream of data, typically compressed. This is used for page contents, images and embedded font streams. | 12 0 obj << /Type /XObject >> stream 030004040404040404 endstream | org.apache.pdfbox.cos.COSStream | 7.3.8 |
-| Object | A wrapper to any of the other objects, this can be used to reference an object multiple times. An object is referenced by using two numbers, an object number and a generation number. Initially the generation number will be zero unless the object got replaced later in the stream. | 12 0 obj << /Type /XObject >> endobj | org.apache.pdfbox.cos.COSObject | |
-
-A page in a pdf document is represented with a COSDictionary. The entries that are available for a page can be seen in the PDF Reference and an example of a page looks like this:
-
-```text
-<<
-    /Type /Page
-    /MediaBox [0 0 612 915]
-    /Contents 56 0 R
->>
-```
-
-The information within the dictionary can be accessed using the COS model
-
-```java
-COSDictionary page = ...;
-COSArray mediaBox = (COSArray)page.getDictionaryObject( "MediaBox" );
-System.out.println( "Width:" + mediaBox.get( 3 ) );
-```
-
-As can be seen from that little example the COS model provides a low level API to access 
-information within the PDF. In order to use the COS model successfully a good knowledge of
-the PDF specification is needed.
-
-## The PD Model
-
-The COS Model allows access to all aspects of a PDF document. This type of programming is
-tedious and error prone though because the user must know all of the names of the
-parameters and no helper methods are available. The PD Model was created to help
-alleviate this problem. Each type of object(page, font, image) has a set of defined
-attributes that can be available in the dictionary. 
-A PD Model class is available for each of these so that strongly typed methods are
-available to access the attributes. 
-
-The same code from above to get the page width can be rewritten to use PD Model classes.
-
-```java
-PDPage page = ...;
-PDRectangle mediaBox = page.getMediaBox();
-System.out.println( "Width:" + mediaBox.getWidth() );
-```
-
-PD Model objects sit on top of COS model. Typically, the classes in the PD Model will only
-store a COS object and all setter/getter methods will modify data that is stored in the
-COS object. For example, when you call PDPage.getLastModified() the method will do a
-lookup in the COSDictionary with the key "LastModified", if it is found the value is then
-converter to a java.util.Calendar. When PDPage.setLastModified( Calendar ) is called then
-the Calendar is converted to a string in the COSDictionary.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/commandline.md
----------------------------------------------------------------------
diff --git a/content/1.8/commandline.md b/content/1.8/commandline.md
new file mode 100644
index 0000000..c0efac1
--- /dev/null
+++ b/content/1.8/commandline.md
@@ -0,0 +1,232 @@
+---
+layout: default
+title:  Command Line Tools
+---
+
+# Command Line Tools
+
+PDFBox comes with a series of command line utilities. They are available as standard Java applications.
+
+See the Dependencies page for instructions on how to set your classpath in order to run 
+PDFBox tools as Java applications.
+
+**Table of Contents**  
+[Decrypt](#decrypt)
+[Encrypt](#encrypt)
+[ExtractText](#extracttext) 
+[OverlayPDF](#overlaypdf)
+[PrintPDF](#printpdf)
+[PDFDebugger](#pdfdebugger)
+[PDFReader](#pdfreader)
+[PDFMerger](#pdfmerger)
+[PDFSplit](#pdfsplit)
+[PDFToImage](#pdftoimage)
+[TextToPDF](#texttopdf)
+[WriteDecodedDoc](#writedecodeddoc)
+
+## Decrypt ##
+
+This application will decrypt a PDF document.
+
+NOTE: You must have the owner password to decrypt the document!
+
+usage: ``java -jar pdfbox-app-x.y.z.jar Decrypt [OPTIONS] <inputfile> [outputfile]``
+
+| Command Line Parameter 	| Description |
+| ------------------------- | ----------- |
+| -password | Password to the PDF or certificate in keystore. |
+| -keyStore | Path to keystore that holds certificate to decrypt the document. This is only required if the document is encrypted with a certificate, otherwise only the password is required. |
+| -alias | The alias to the certificate in the keystore. |
+| inputfile | The PDF file to decrypt. |
+| outputfile | The file to save the decrypted document to. If left blank then it will be the same as the input file. |
+
+## Encrypt ##
+
+This application will encrypt a PDF document.
+
+usage: ``java -jar pdfbox-app-x.y.z.jar Encrypt [OPTIONS] <inputfile> [outputfile]``
+
+| Command Line Parameter | Default | Description |
+| --- | --- | --- |
+| -O | | The owner password to the PDF, ignored if -certFile is specified. |
+| -U | | The user password to the PDF, ignored if -certFile is specified. |
+| -certFile | | Path to X.509 cert file. |
+| -canAssemble | true | Set the assemble permission. |
+| -canExtractContent | true | Set the extraction permission. |
+| -canExtractForAccessibility | true | Set the extraction for accessibility permission. |
+| -canFillInForm | true | Set the fill in form permission. |
+| -canModify | true | Set the modify permission. |
+| -canModifyAnnotations | true | Set the modify annotations permission. |
+| -canPrint | true | Set the print permission. |
+| -canPrintDegraded | true | Set the print degraded permission. |
+| -keyLength | 40 or 128 | The number of bits for the encryption key. For 128 bits [Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files](http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html) must be installed.|
+| inputfile |  |The PDF file to encrypt. |
+| outputfile |  |The file to save the encrypted document to. If left blank then it will be the same as the input file. |
+
+## ExtractText ##
+
+This application will extract all text from the given PDF document.
+
+usage: ``java -jar pdfbox-app-x.y.z.jar ExtractText [OPTIONS] <inputfile> [Text file] ``
+
+| Command Line Parameter | Default | Description |
+| --- | --- | --- |
+| -password |  | The password to the PDF document. |
+| -encoding | default encoding | The encoding type of the text file, e.g. ISO-8859-1, UTF-8, UTF-16BE. |
+| -console | false | Send text to console instead of file. |
+| -html | false | Output in HTML format instead of raw text. |
+| -sort | false | Sort the text before writing. |
+| -ignoreBeads | false | Disables the separation by beads. |
+| -force | false | Enables PDFBox to ignore corrupt objects. |
+| -debug | false | Enables debug output about the time consumption of every stage. |
+| -startPage | 1 | The first page to extract, one based. |
+| -endPage | Integer.MAX_VALUE | The last page to extract, one based. |
+| -nonSeq | false | Use the new non-sequential parser. |
+
+## OverlayPDF ##
+
+This application will overlay one document with the content of another document.
+
+usage: ``java -jar pdfbox-app-x.y.z.jar OverlayPDF <input.pdf> [OPTIONS] <output.pdf>``
+
+| Command Line Parameter | Default | Description |
+| --- | --- | --- |
+| inputfile | | The PDF file to be overlaid. |
+| defaultOverlay.pdf  | | Default overlay file. |
+| -odd oddPageOverlay.pdf| | Overlay file used for odd pages. |
+| -even evenPageOverlay.pdf| | Overlay file used for even pages. |
+| -first firstPageOverlay.pdf| | Overlay file used for the first page. |
+| -last lastPageOverlay.pdf| | Overlay file used for the last page. |
+| -page pageNumber specificPageOverlay.pdf| | Overlay file used for the given page number, may occur more than once. |
+| -position | background | Where to put the overlay, foreground or background. |
+| -nonSeq | false | Use the new non-sequential parser. |
+| outputfile | | The resulting PDF file. |
+
+Examples:
+
+- ``OverlayPDF input.pdf overlay.pdf -nonSeq output.pdf``
+- ``OverlayPDF input.pdf defaultOverlay.pdf -page 10 overlayForPage10.pdf -position foreground -nonSeq output.pdf``
+- ``OverlayPDF input.pdf -odd oddOverlay.pdf -even evenOverlay.pdf -nonSeq output.pdf``
+
+## PrintPDF ##
+
+This application will send a PDF document to the printer.
+
+<p class="alert alert-info">You must have the correct permissions to print the document!</p>
+
+usage: ``java -jar pdfbox-app-x.y.z.jar PrintPDF [OPTIONS] <inputfile>``
+
+| Command Line Parameter | Description |
+| --- | --- |
+| -password | The password to decrypt the PDF. |
+| -silentPrint | Print the PDF without prompting for a printer. |
+| inputfile | The PDF file to print. |
+
+## PDFDebugger ##
+
+This application opens an existing PDF document and lets you analyze and inspect its internal structure.
+
+usage: ``java -jar pdfbox-app-x.y.z.jar PDFDebugger [OPTIONS] [inputfile]``
+
+| Command Line Parameter | Default | Description |
+| --- | --- | --- |
+| -password | | The password to the PDF document. |
+| -nonSeq | false | Use the new non-sequential parser. |
+| inputfile | | The name of an optional PDF file to open. |
+
+## PDFReader ##
+
+An application to read PDF documents. This will provide Acrobat Reader-like functionality.
+
+usage: ``java -jar pdfbox-app-x.y.z.jar PDFReader [OPTIONS] [PDF file]``
+
+| Command Line Parameter | Default | Description |
+| --- | --- | --- |
+| -password | | The password to the PDF document.|
+| -nonSeq | false | Use the new non-sequential parser.|
+| PDF file | | The name of an optional PDF file to open. |
+
+## PDFMerger ##
+
+This application will take a list of PDF documents and merge them, saving the result in a new document.
+
+usage: ``java -jar pdfbox-app-x.y.z.jar PDFMerger <Source PDF files (2 ..n)> <Target PDF file>``
+
+## PDFSplit ##
+
+This application will take an existing PDF document and split it into a number of other documents.
+
+usage: ``java -jar pdfbox-app-x.y.z.jar PDFSplit [OPTIONS] <PDF file>``
+
+| Command Line Parameter | Default | Description |
+| --- | --- | --- |
+| -password | | The password to the PDF document. |
+| -split | | Number of pages in each split part of the PDF. |
+| -startPage | | The page to start at. |
+| -endPage | | The page to stop at. |
+| -nonSeq | false | Use the new non-sequential parser.|
+
+Examples:
+
+ - ``PDFSplit -split 2 sample_with_13_pages.pdf`` will split the PDF into pieces of 2 pages each, except the last, which will contain only 1 page.
+ - ``PDFSplit -startPage 5 sample_with_13_pages.pdf`` will provide a PDF containing all pages of the source PDF starting at page 5.
+ - ``PDFSplit -startPage 5 -endPage 10 sample_with_13_pages.pdf`` will provide a PDF containing pages 5 to 10 of the source PDF.
+ - ``PDFSplit -split 2 -startPage 5 -endPage 10 sample_with_13_pages.pdf`` will provide 3 PDFs of 2 pages each, containing pages 5 to 10 of the source PDF.
+
+## PDFToImage ##
+
+This application will create an image for every page in the PDF document.
+
+usage: ``java -jar pdfbox-app-x.y.z.jar PDFToImage [OPTIONS] <PDF file>``
+
+| Command Line Parameter | Default | Description |
+| --- | --- | --- |
+| -password | | The password to the PDF document.|
+| -imageType | jpg | The image type to write to. Currently only jpg or png. |
+| -outputPrefix | Name of PDF document | The prefix to the image file. |
+| -startPage | 1 | The first page to convert, one based. |
+| -endPage | Integer.MAX_VALUE | The last page to convert, one based. |
+| -nonSeq | false | Use the new non-sequential parser. |
+
+## TextToPDF ##
+
+This application will create a PDF document from a text file.
+
+usage: ``java -jar pdfbox-app-x.y.z.jar TextToPDF [OPTIONS] <outputfile> <textfile>``
+
+| Command Line Parameter | Default | Description |
+| --- | --- | --- |
+| -standardFont | Helvetica | The font to use for the text. Either this or -ttf should be specified but not both. |
+| -ttf | | The TTF font to use for the text. Either this or -standardFont should be specified but not both. |
+| -fontSize | 10 | The size of the font to use. |
+
+The following font names can be used for the parameter ``standardFont``:
+
+ - Courier
+ - Courier-Bold
+ - Courier-Oblique
+ - Courier-BoldOblique
+ - Helvetica
+ - Helvetica-Bold
+ - Helvetica-Oblique
+ - Helvetica-BoldOblique
+ - Symbol
+ - Times-Bold
+ - Times-Roman
+ - Times-Italic
+ - Times-BoldItalic
+ - ZapfDingbats
+ 
+## WriteDecodedDoc ##
+
+An application to decompress PDF documents.
+
+usage: ``java -jar pdfbox-app-x.y.z.jar WriteDecodedDoc <input-file> <output-file>``
+
+| Command Line Parameter | Default | Description |
+| --- | --- | --- |
+| -password |  | The password to the PDF document. |
+| -nonSeq | false | Use the new non-sequential parser. |
+| ``<input-file>`` |  | The PDF file to decompress. |
+| ``<output-file>`` |  | The destination PDF file. |
+
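The Encrypt permission options in commandline.md above correspond to the user access permission bits stored in the /P entry of a standard-security-handler encryption dictionary (ISO-32000 7.6.3.2). The JDK-only sketch below builds a /P value from the 1-based bit positions in that table:

- print (3)
- modify (4)
- extract (5)
- modify annotations (6)
- fill in forms (9)
- extract for accessibility (10)
- assemble (11)
- high-quality print (12)

It assumes a revision 3 or later handler, where bits 7-8 and 13-32 must be 1 and bits 1-2 must be 0. The class and constant names are hypothetical. The sketch does not claim how PDFBox maps each Encrypt option to a bit internally.

```java
// Hypothetical helper, not PDFBox code: builds a /P value from permission bits,
// using the 1-based bit positions of the ISO-32000 user access permissions table.
public class PermissionBitsSketch {

    static final int PRINT = 3;
    static final int MODIFY = 4;
    static final int EXTRACT = 5;
    static final int MODIFY_ANNOTATIONS = 6;
    static final int FILL_IN_FORM = 9;
    static final int EXTRACT_FOR_ACCESSIBILITY = 10;
    static final int ASSEMBLE = 11;
    static final int PRINT_HIGH_QUALITY = 12;

    // Converts a 1-based bit position into an int mask (bit 32 is the sign bit).
    static int mask(int bit) {
        return 1 << (bit - 1);
    }

    // Starts from the reserved bits that revision 3+ handlers require to be 1
    // (bits 7-8 and 13-32; bits 1-2 stay 0), then sets each granted permission.
    static int permissions(int... grantedBits) {
        int p = mask(7) | mask(8);
        for (int bit = 13; bit <= 32; bit++) {
            p |= mask(bit);
        }
        for (int bit : grantedBits) {
            p |= mask(bit);
        }
        return p;
    }

    public static void main(String[] args) {
        // Every permission granted
        System.out.println(permissions(PRINT, MODIFY, EXTRACT, MODIFY_ANNOTATIONS, FILL_IN_FORM,
                EXTRACT_FOR_ACCESSIBILITY, ASSEMBLE, PRINT_HIGH_QUALITY)); // -4
        // Every permission except printing
        System.out.println(permissions(MODIFY, EXTRACT, MODIFY_ANNOTATIONS, FILL_IN_FORM,
                EXTRACT_FOR_ACCESSIBILITY, ASSEMBLE, PRINT_HIGH_QUALITY)); // -8
    }
}
```

Because bit 32 is set, /P is a negative 32-bit integer. With every permission granted the value is -4.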

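The PDFSplit examples in commandline.md above describe how -split, -startPage and -endPage combine. The following JDK-only sketch models that documented behaviour as one-based, inclusive page ranges, so the part counts in the examples can be checked. It is not PDFBox's implementation, and `SplitRanges` and `ranges` are hypothetical names. Where an example omits -split and yields a single document, that corresponds here to a split size at least as large as the selected range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the documented PDFSplit examples; not PDFBox code.
public class SplitRanges {

    // Splits the inclusive, one-based page range [startPage, endPage] into
    // consecutive parts of at most `split` pages; the last part may be shorter.
    static List<int[]> ranges(int startPage, int endPage, int split) {
        if (split < 1 || startPage < 1 || endPage < startPage) {
            throw new IllegalArgumentException("invalid page range or split size");
        }
        List<int[]> parts = new ArrayList<>();
        for (int from = startPage; from <= endPage; from += split) {
            int to = Math.min(from + split - 1, endPage);
            parts.add(new int[] {from, to});
        }
        return parts;
    }

    public static void main(String[] args) {
        // -split 2 on a 13-page file: 7 parts, the last holding only page 13
        for (int[] part : ranges(1, 13, 2)) {
            System.out.println(part[0] + "-" + part[1]);
        }
        // -split 2 -startPage 5 -endPage 10: 3 parts (5-6, 7-8, 9-10)
        System.out.println(ranges(5, 10, 2).size());
    }
}
```

For the last example, `ranges(5, 10, 2)` yields 5-6, 7-8 and 9-10. This matches the three 2-page PDFs described there.
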
http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/commandline.mdtext
----------------------------------------------------------------------
diff --git a/content/1.8/commandline.mdtext b/content/1.8/commandline.mdtext
deleted file mode 100644
index c0efac1..0000000
--- a/content/1.8/commandline.mdtext
+++ /dev/null
@@ -1,232 +0,0 @@
----
-layout: default
-title:  Command Line Tools
----
-
-# Command Line Tools
-
-PDFBox comes with a series of command line utilities. They are available as standard Java applications.
-
-See the Dependencies page for instructions on how to set your classpath in order to run 
-PDFBox tools as Java applications.
-
-**Table of Contents**  
-[Decrypt](#decrypt)
-[Encrypt](#encrypt)
-[ExtractText](#extracttext) 
-[OverlayPDF](#overlaypdf)
-[PrintPDF](#printpdf)
-[PDFDebugger](#pdfdebugger)
-[PDFReader](#pdfreader)
-[PDFMerger](#pdfmerger)
-[PDFSplit](#pdfsplit)
-[PDFToImage](#pdftoimage)
-[TextToPDF](#texttopdf)
-[WriteDecodedDoc](#writedecodeddoc)
-
-## Decrypt ##
-
-This application will decrypt a PDF document.
-
-NOTE: You must have the owner password to decrypt the document!
-
-usage: ``java -jar pdfbox-app-x.y.z.jar Decrypt [OPTIONS] <inputfile> [outputfile]``
-
-| Command Line Parameter 	| Description |
-| ------------------------- | ----------- |
-| -password | Password to the PDF or certificate in keystore. |
-| -keyStore | Path to keystore that holds certificate to decrypt the document. This is only required if the document is encrypted with a certificate, otherwise only the password is required. |
-| -alias | The alias to the certificate in the keystore. |
-| inputfile | The PDF file to decrypt. |
-| outputfile | The file to save the decrypted document to. If left blank then it will be the same as the input file. |
-
-## Encrypt ##
-
-This application will encrypt a PDF document.
-
-usage: ``java -jar pdfbox-app-x.y.z.jar Encrypt [OPTIONS] <password> <inputfile>``
-
-| Command Line Parameter | Default | Description |
-| --- | --- | --- |
-| -O | | The owner password to the PDF, ignored if -certFile is specified. |
-| -U | | The user password to the PDF, ignored if -certFile is specified. |
-| -certFile | | Path to X.509 cert file. |
-| -canAssemble | true | Set the assemble permission. |
-| -canExtractContent | true | Set the extraction permission. |
-| -canExtractForAccessibility | true | Set the extraction permission. |
-| -canFillInForm | true | Set the fill in form permission. |
-| -canModify | true | Set the modify permission. |
-| -canModifyAnnotations | true | Set the modify annots permission. |
-| -canPrint | true | Set the print permission. |
-| -canPrintDegraded | true | Set the print degraded permission. |
-| -keyLength | 40 or 128 | The number of bits for the encryption key. For 128 bits [Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files](http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html) must be installed.|
-| inputfile |  |The PDF file to encrypt. |
-| outputfile |  |The file to save the encrypted document to. If left blank then it will be the same as the input file. |
-
-## ExtractText ##
-
-This application will extract all text from the given PDF document.
-
-usage: ``java -jar pdfbox-app-x.y.z.jar ExtractText [OPTIONS] <inputfile> [Text file] ``
-
-| Command Line Parameter | Default | Description |
-| --- | --- | --- |
-| -password |  | The password to the PDF document. |
-| -encoding | default encoding | The encoding type of the text file, e.g. ISO-8859-1, UTF-8, UTF-16BE. |
-| -console | false | Send text to console instead of file. |
-| -html | false | Output in HTML format instead of raw text. |
-| -sort | false | Sort the text before writing. |
-| -ignoreBeads | false | Disables the separation by beads. |
-| -force | false | Enables PDFBox to ignore corrupt objects. |
-| -debug | false | Enables debug output about the time consumption of every stage. |
-| -startPage | 1 | The first page to extract, 1-based. |
-| -endPage | Integer.MAX_VALUE | The last page to extract, 1-based. |
-| -nonSeq | false | Use the new non-sequential parser. |
-
-## OverlayPDF ##
-
-This application will overlay one document with the content of another document.
-
-usage: ``java -jar pdfbox-app-x.y.z.jar OverlayPDF <input.pdf> [OPTIONS] <output.pdf>``
-
-| Command Line Parameter | Default | Description |
-| --- | --- | --- |
-| inputfile | | The PDF file to be overlaid. |
-| defaultOverlay.pdf  | | Default overlay file. |
-| -odd oddPageOverlay.pdf| | Overlay file used for odd pages. |
-| -even evenPageOverlay.pdf| | Overlay file used for even pages. |
-| -first firstPageOverlay.pdf| | Overlay file used for the first page. |
-| -last lastPageOverlay.pdf| | Overlay file used for the last page. |
-| -page pageNumber specificPageOverlay.pdf| | Overlay file used for the given page number; may occur more than once. |
-| -position | background | Where to put the overlay, foreground or background. |
-| -nonSeq | false | Use the new non-sequential parser. |
-| outputfile | | The resulting PDF file. |
-
-Examples:
-
-- ``OverlayPDF input.pdf overlay.pdf -nonSeq output.pdf``
-- ``OverlayPDF input.pdf defaultOverlay.pdf -page 10 overlayForPage10.pdf -position foreground -nonSeq output.pdf``
-- ``OverlayPDF input.pdf -odd oddOverlay.pdf -even evenOverlay.pdf -nonSeq output.pdf``
-
-## PrintPDF ##
-
-This application will send a PDF document to the printer.
-
-<p class="alert alert-info">You must have the correct permissions to print the document!</p>
-
-usage: ``java -jar pdfbox-app-x.y.z.jar PrintPDF [OPTIONS] <inputfile>``
-
-| Command Line Parameter | Description |
-| --- | --- |
-| -password | The password to decrypt the PDF. |
-| -silentPrint | Print the PDF without prompting for a printer. |
-| inputfile | The PDF file to print. |
-
-## PDFDebugger ##
-
-This application opens an existing PDF document and lets you analyze and inspect its internal structure.
-
-usage: ``java -jar pdfbox-app-x.y.z.jar PDFDebugger [inputfile] ``
-
-| Command Line Parameter | Default | Description |
-| --- | --- | --- |
-| -password | | The password to the PDF document. |
-| -nonSeq | false | Use the new non-sequential parser. |
-| inputfile | | The name of an optional PDF file to open. |
-
-## PDFReader ##
-
-An application to read PDF documents. It provides Acrobat Reader-like functionality.
-
-usage: ``java -jar pdfbox-app-x.y.z.jar PDFReader [PDF file]``
-
-| Command Line Parameter | Default | Description |
-| --- | --- | --- |
-| -password | | The password to the PDF document.|
-| -nonSeq | false | Use the new non-sequential parser. |
-| PDF file | | The name of an optional PDF file to open. |
-
-## PDFMerger ##
-
-This application will take a list of PDF documents and merge them, saving the result in a new document.
-
-usage: ``java -jar pdfbox-app-x.y.z.jar PDFMerger <Source PDF files (2 ..n)> <Target PDF file>``
-
-## PDFSplit ## {#pdfSplit}
-
-This application will take an existing PDF document and split it into a number of other documents.
-
-usage: ``java -jar pdfbox-app-x.y.z.jar PDFSplit [OPTIONS] <PDF file>``
-
-| Command Line Parameter | Default | Description |
-| --- | --- | --- |
-| -password | | The password to the PDF document. |
-| -split | | Number of pages in each part of the split PDF. |
-| -startPage | | The page to start at. |
-| -endPage | | The page to stop at. |
-| -nonSeq | false | Use the new non-sequential parser. |
-
-Examples:
-
- - ``PDFSplit -split 2 sample_with_13_pages.pdf`` will split the PDF into parts of 2 pages each, except the last part, which will contain only 1 page.
- - ``PDFSplit -startPage 5 sample_with_13_pages.pdf`` will produce one PDF containing all pages of the source PDF from page 5 onward.
- - ``PDFSplit -startPage 5 -endPage 10 sample_with_13_pages.pdf`` will produce one PDF containing pages 5 to 10 of the source PDF.
- - ``PDFSplit -split 2 -startPage 5 -endPage 10 sample_with_13_pages.pdf`` will produce 3 PDFs containing pages 5 to 10 of the source PDF, 2 pages each.
-
-## PDFToImage ##
-
-This application will create an image for every page in the PDF document.
-
-usage: ``java -jar pdfbox-app-x.y.z.jar PDFToImage [OPTIONS] <PDF file>``
-
-| Command Line Parameter | Default | Description |
-| --- | --- | --- |
-| -password | | The password to the PDF document.|
-| -imageType | jpg | The image type to write to. Currently only jpg or png. |
-| -outputPrefix | Name of PDF document | The prefix to the image file. |
-| -startPage | 1 | The first page to convert, 1-based. |
-| -endPage | Integer.MAX_VALUE | The last page to convert, 1-based. |
-| -nonSeq | false | Use the new non-sequential parser. |
-
-## TextToPDF ##
-
-This application will create a PDF document from a text file.
-
-usage: ``java -jar pdfbox-app-x.y.z.jar TextToPDF [OPTIONS] <outputfile> <textfile>``
-
-| Command Line Parameter | Default | Description |
-| --- | --- | --- |
-| -standardFont | Helvetica | The font to use for the text. Either this or -ttf should be specified but not both. |
-| -ttf | | The TTF font to use for the text. Either this or -standardFont should be specified but not both. |
-| -fontSize | 10 | The size of the font to use. |
-
-The following font names can be used for the parameter ``standardFont``:
-
- - Courier
- - Courier-Bold
- - Courier-Oblique
- - Courier-BoldOblique
- - Helvetica
- - Helvetica-Bold
- - Helvetica-Oblique
- - Helvetica-BoldOblique
- - Symbol
- - Times-Bold
- - Times-Roman
- - Times-Italic
- - Times-BoldItalic
- - ZapfDingbats
- 
-## WriteDecodedDoc ##
-
-An application to decompress PDF documents.
-
-usage: ``java -jar pdfbox-app-x.y.z.jar WriteDecodedDoc <input-file> <output-file>``
-
-| Command Line Parameter | Default | Description |
-| --- | --- | --- |
-| -password |  | The password to the PDF document. |
-| -nonSeq | false | Use the new non-sequential parser. |
-| ``<input-file>`` |  | The PDF file to decompress. |
-| ``<output-file>`` |  | The destination PDF file. |
-
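
The PDFSplit examples above follow a simple rule: the selected page range is cut into consecutive parts of `-split` pages, and the last part keeps whatever is left over. The Java sketch below reproduces that arithmetic so you can predict the output before running the tool. `SplitPlan` and `ranges` are hypothetical names, not part of PDFBox. Modelling an omitted `-split` as "one part covering the whole range" (pass the range length as `splitSize`) is an assumption drawn from the `-startPage`-only examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of PDFBox) that predicts the parts PDFSplit produces. */
final class SplitPlan {

    /**
     * Returns 1-based, inclusive [first, last] page pairs, one per output document.
     *
     * @param pageCount number of pages in the source PDF
     * @param splitSize pages per part (the value passed to -split)
     * @param startPage first page to include (-startPage), clamped to 1
     * @param endPage   last page to include (-endPage), clamped to pageCount
     */
    static List<int[]> ranges(int pageCount, int splitSize, int startPage, int endPage) {
        if (splitSize < 1) {
            throw new IllegalArgumentException("splitSize must be at least 1");
        }
        int first = Math.max(1, startPage);
        int last = Math.min(pageCount, endPage);
        List<int[]> parts = new ArrayList<>();
        int page = first;
        while (page <= last) {
            // Use long arithmetic so a very large splitSize cannot overflow.
            int partEnd = (int) Math.min((long) page + splitSize - 1, last);
            parts.add(new int[] { page, partEnd });
            page = partEnd + 1;
        }
        return parts;
    }

    public static void main(String[] args) {
        // Mirrors "PDFSplit -split 2 -startPage 5 -endPage 10 sample_with_13_pages.pdf"
        for (int[] part : ranges(13, 2, 5, 10)) {
            System.out.println(Arrays.toString(part));
        }
    }
}
```

With the 13-page sample, `ranges(13, 2, 1, 13)` yields seven parts, and the last one is page 13 alone, matching the first example.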

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/cookbook/documentcreation.md
----------------------------------------------------------------------
diff --git a/content/1.8/cookbook/documentcreation.md b/content/1.8/cookbook/documentcreation.md
new file mode 100644
index 0000000..b41c766
--- /dev/null
+++ b/content/1.8/cookbook/documentcreation.md
@@ -0,0 +1,57 @@
+---
+layout: default
+title:  Cookbook - Document Creation
+---
+
+# Document Creation
+
+## Create a blank PDF
+
+This small sample shows how to create a new PDF document using PDFBox.
+
+~~~java
+// Create a new empty document
+PDDocument document = new PDDocument();
+
+// Create a new blank page and add it to the document
+PDPage blankPage = new PDPage();
+document.addPage( blankPage );
+
+// Save the newly created document
+document.save("BlankPage.pdf");
+
+// Finally, make sure that the document is properly closed.
+document.close();
+~~~
+
+## Hello World using a PDF base font
+
+This small sample shows how to create a new document and print the text "Hello World" using one of the PDF base fonts.
+
+~~~java
+// Create a document and add a page to it
+PDDocument document = new PDDocument();
+PDPage page = new PDPage();
+document.addPage( page );
+
+// Create a new font object selecting one of the PDF base fonts
+PDFont font = PDType1Font.HELVETICA_BOLD;
+
+// Start a new content stream which will hold the content to be created
+PDPageContentStream contentStream = new PDPageContentStream(document, page);
+
+// Begin a text object, select the font, move the text position and draw the text "Hello World"
+contentStream.beginText();
+contentStream.setFont( font, 12 );
+contentStream.moveTextPositionByAmount( 100, 700 );
+contentStream.drawString( "Hello World" );
+contentStream.endText();
+
+// Make sure that the content stream is closed:
+contentStream.close();
+
+// Save the results and ensure that the document is properly closed:
+document.save( "Hello World.pdf");
+document.close();
+~~~
\ No newline at end of file
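
In the Hello World recipe, `moveTextPositionByAmount( 100, 700 )` places the text 100 points from the left edge and 700 points from the bottom edge. PDF user space has its origin in the lower-left corner and measures in points (1/72 inch). If you prefer to think in distances from the top of the page, the hypothetical helper below (not a PDFBox API) converts them. It assumes a US Letter page of 612 x 792 points, which is taken here to be the size the no-argument `PDPage` constructor creates in PDFBox 1.8. Check `page.getMediaBox()` if your page size may differ.

```java
/** Hypothetical helper (not part of PDFBox) for converting page measurements to PDF user space. */
final class PdfCoords {
    /** PDF user space units are points: 72 per inch. */
    static final float POINTS_PER_INCH = 72f;

    /** Converts inches to points. */
    static float inches(float value) {
        return value * POINTS_PER_INCH;
    }

    /** Converts a distance measured down from the top edge into a y coordinate measured up from the bottom. */
    static float yFromTop(float pageHeight, float offsetFromTop) {
        return pageHeight - offsetFromTop;
    }

    public static void main(String[] args) {
        float letterHeight = inches(11f); // assumed US Letter height in points
        // On a Letter page, 92 points below the top edge is the line y = 700 used in the recipe
        System.out.println(yFromTop(letterHeight, 92f));
    }
}
```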

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/cookbook/documentcreation.mdtext
----------------------------------------------------------------------
diff --git a/content/1.8/cookbook/documentcreation.mdtext b/content/1.8/cookbook/documentcreation.mdtext
deleted file mode 100644
index b41c766..0000000
--- a/content/1.8/cookbook/documentcreation.mdtext
+++ /dev/null
@@ -1,57 +0,0 @@
----
-layout: default
-title:  Cookbook - Document Creation
----
-
-# Document Creation
-
-## Create a blank PDF
-
-This small sample shows how to create a new PDF document using PDFBox.
-
-~~~java
-// Create a new empty document
-PDDocument document = new PDDocument();
-
-// Create a new blank page and add it to the document
-PDPage blankPage = new PDPage();
-document.addPage( blankPage );
-
-// Save the newly created document
-document.save("BlankPage.pdf");
-
-// Finally, make sure that the document is properly closed.
-document.close();
-~~~
-
-## Hello World using a PDF base font
-
-This small sample shows how to create a new document and print the text "Hello World" using one of the PDF base fonts.
-
-~~~java
-// Create a document and add a page to it
-PDDocument document = new PDDocument();
-PDPage page = new PDPage();
-document.addPage( page );
-
-// Create a new font object selecting one of the PDF base fonts
-PDFont font = PDType1Font.HELVETICA_BOLD;
-
-// Start a new content stream which will hold the content to be created
-PDPageContentStream contentStream = new PDPageContentStream(document, page);
-
-// Begin a text object, select the font, move the text position and draw the text "Hello World"
-contentStream.beginText();
-contentStream.setFont( font, 12 );
-contentStream.moveTextPositionByAmount( 100, 700 );
-contentStream.drawString( "Hello World" );
-contentStream.endText();
-
-// Make sure that the content stream is closed:
-contentStream.close();
-
-// Save the results and ensure that the document is properly closed:
-document.save( "Hello World.pdf");
-document.close();
-~~~
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/cookbook/pdfacreation.md
----------------------------------------------------------------------
diff --git a/content/1.8/cookbook/pdfacreation.md b/content/1.8/cookbook/pdfacreation.md
new file mode 100644
index 0000000..7bfe9e0
--- /dev/null
+++ b/content/1.8/cookbook/pdfacreation.md
@@ -0,0 +1,76 @@
+---
+layout: default
+title:     Create a valid PDF/A document
+Notice:    Licensed to the Apache Software Foundation (ASF) under one
+           or more contributor license agreements.  See the NOTICE file
+           distributed with this work for additional information
+           regarding copyright ownership.  The ASF licenses this file
+           to you under the Apache License, Version 2.0 (the
+           "License"); you may not use this file except in compliance
+           with the License.  You may obtain a copy of the License at
+           .
+             http://www.apache.org/licenses/LICENSE-2.0
+           .
+           Unless required by applicable law or agreed to in writing,
+           software distributed under the License is distributed on an
+           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+           KIND, either express or implied.  See the License for the
+           specific language governing permissions and limitations
+           under the License.
+---
+
+# PDF/A Creation
+
+The Apache PDFBox API can be used to create a PDF/A file. PDF/A is a subset of PDF with constraints that ensure
+long-term preservation. These constraints are described in ISO 19005.
+
+This small sample shows what should be added during the creation of a PDF file to make it a valid PDF/A
+document. The example creates a valid PDF/A-1b document.
+
+## Load all the fonts used in document
+
+The PDF/A specification requires that all fonts used in the document be embedded in the PDF file, so you
+have to load them. As an example:
+
+~~~java
+InputStream fontStream = CreatePDFA.class.getResourceAsStream("/org/apache/pdfbox/resources/ttf/ArialMT.ttf");
+PDFont font = PDTrueTypeFont.loadTTF(doc, fontStream);
+~~~
+
+## Including XMP metadata block
+
+XMP metadata must be defined in the PDF. At a minimum, the PDF/A identification schema (which states the part
+and conformance level of the PDF/A specification the document complies with) must be present. These lines create
+the XMP metadata for a PDF/A-1b document:
+
+~~~java
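+// attach an XMP metadata stream to the document catalog
+PDDocumentCatalog cat = doc.getDocumentCatalog();
+PDMetadata metadata = new PDMetadata(doc);
+cat.setMetadata(metadata);
+
+XMPMetadata xmp = new XMPMetadata();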
+XMPSchemaPDFAId pdfaid = new XMPSchemaPDFAId(xmp);
+xmp.addSchema(pdfaid);
+pdfaid.setConformance("B");
+pdfaid.setPart(1);
+pdfaid.setAbout("");
+metadata.importXMPMetadata(xmp);
+~~~
+
+## Including color profile
+
+The color profile used by the document must be included as an output intent. Different profiles can be used; this
+example uses one that ships with PDFBox:
+
+~~~java
+// create output intent
+InputStream colorProfile = CreatePDFA.class.getResourceAsStream("/org/apache/pdfbox/resources/pdfa/sRGB Color Space Profile.icm");
+PDOutputIntent oi = new PDOutputIntent(doc, colorProfile); 
+oi.setInfo("sRGB IEC61966-2.1"); 
+oi.setOutputCondition("sRGB IEC61966-2.1"); 
+oi.setOutputConditionIdentifier("sRGB IEC61966-2.1"); 
+oi.setRegistryName("http://www.color.org"); 
+cat.addOutputIntent(oi);
+~~~
+
+## Complete example
+
+The complete example can be found in pdfbox-example. The source file is
+
+	src/main/java/org/apache/pdfbox/examples/pdfa/CreatePDFA.java
+
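
The `XMPSchemaPDFAId` calls in the recipe end up as a small block of XMP in the metadata stream. The sketch below shows what `setPart(1)`, `setConformance("B")` and `setAbout("")` stand for by building the corresponding `rdf:Description` element by hand with plain string concatenation. It is illustrative only. `PdfaIdXmp` is a hypothetical class, real code should keep using the XMP library, and the exact serialization PDFBox/JempBox writes may differ (for example, it may use attributes instead of child elements).

```java
/** Hypothetical illustration (not a PDFBox API) of the PDF/A identification schema in XMP. */
final class PdfaIdXmp {
    /** Namespace URI of the PDF/A identification schema. */
    static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";

    /**
     * Builds an rdf:Description element declaring the PDF/A part and conformance level.
     * The fragment must be placed inside an rdf:RDF element that declares the rdf prefix.
     */
    static String description(int part, String conformance) {
        if (part != 1) {
            throw new IllegalArgumentException("this sketch only covers PDF/A-1");
        }
        if (!"A".equals(conformance) && !"B".equals(conformance)) {
            throw new IllegalArgumentException("PDF/A-1 conformance must be A or B");
        }
        return "<rdf:Description rdf:about=\"\" xmlns:pdfaid=\"" + PDFAID_NS + "\">"
                + "<pdfaid:part>" + part + "</pdfaid:part>"
                + "<pdfaid:conformance>" + conformance + "</pdfaid:conformance>"
                + "</rdf:Description>";
    }

    public static void main(String[] args) {
        // Same values as the recipe: part 1, conformance level B (PDF/A-1b)
        System.out.println(description(1, "B"));
    }
}
```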

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/cookbook/pdfacreation.mdtext
----------------------------------------------------------------------
diff --git a/content/1.8/cookbook/pdfacreation.mdtext b/content/1.8/cookbook/pdfacreation.mdtext
deleted file mode 100644
index 7bfe9e0..0000000
--- a/content/1.8/cookbook/pdfacreation.mdtext
+++ /dev/null
@@ -1,76 +0,0 @@
----
-layout: default
-title:     Create a valid PDF/A document
-Notice:    Licensed to the Apache Software Foundation (ASF) under one
-           or more contributor license agreements.  See the NOTICE file
-           distributed with this work for additional information
-           regarding copyright ownership.  The ASF licenses this file
-           to you under the Apache License, Version 2.0 (the
-           "License"); you may not use this file except in compliance
-           with the License.  You may obtain a copy of the License at
-           .
-             http://www.apache.org/licenses/LICENSE-2.0
-           .
-           Unless required by applicable law or agreed to in writing,
-           software distributed under the License is distributed on an
-           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-           KIND, either express or implied.  See the License for the
-           specific language governing permissions and limitations
-           under the License.
----
-
-# PDF/A Creation
-
-The Apache PDFBox API can be used to create a PDF/A file. PDF/A is a subset of PDF with constraints that ensure
-long-term preservation. These constraints are described in ISO 19005.
-
-This small sample shows what should be added during the creation of a PDF file to make it a valid PDF/A
-document. The example creates a valid PDF/A-1b document.
-
-## Load all the fonts used in document
-
-The PDF/A specification requires that all fonts used in the document be embedded in the PDF file, so you
-have to load them. As an example:
-
-~~~java
-InputStream fontStream = CreatePDFA.class.getResourceAsStream("/org/apache/pdfbox/resources/ttf/ArialMT.ttf");
-PDFont font = PDTrueTypeFont.loadTTF(doc, fontStream);
-~~~
-
-## Including XMP metadata block
-
-XMP metadata must be defined in the PDF. At a minimum, the PDF/A identification schema (which states the part
-and conformance level of the PDF/A specification the document complies with) must be present. These lines create
-the XMP metadata for a PDF/A-1b document:
-
-~~~java
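-// attach an XMP metadata stream to the document catalog
-PDDocumentCatalog cat = doc.getDocumentCatalog();
-PDMetadata metadata = new PDMetadata(doc);
-cat.setMetadata(metadata);
-
-XMPMetadata xmp = new XMPMetadata();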
-XMPSchemaPDFAId pdfaid = new XMPSchemaPDFAId(xmp);
-xmp.addSchema(pdfaid);
-pdfaid.setConformance("B");
-pdfaid.setPart(1);
-pdfaid.setAbout("");
-metadata.importXMPMetadata(xmp);
-~~~
-
-## Including color profile
-
-The color profile used by the document must be included as an output intent. Different profiles can be used; this
-example uses one that ships with PDFBox:
-
-~~~java
-// create output intent
-InputStream colorProfile = CreatePDFA.class.getResourceAsStream("/org/apache/pdfbox/resources/pdfa/sRGB Color Space Profile.icm");
-PDOutputIntent oi = new PDOutputIntent(doc, colorProfile); 
-oi.setInfo("sRGB IEC61966-2.1"); 
-oi.setOutputCondition("sRGB IEC61966-2.1"); 
-oi.setOutputConditionIdentifier("sRGB IEC61966-2.1"); 
-oi.setRegistryName("http://www.color.org"); 
-cat.addOutputIntent(oi);
-~~~
-
-## Complete example
-
-The complete example can be found in pdfbox-example. The source file is
-
-	src/main/java/org/apache/pdfbox/examples/pdfa/CreatePDFA.java
-

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/cookbook/pdfavalidation.md
----------------------------------------------------------------------
diff --git a/content/1.8/cookbook/pdfavalidation.md b/content/1.8/cookbook/pdfavalidation.md
new file mode 100644
index 0000000..bef22e9
--- /dev/null
+++ b/content/1.8/cookbook/pdfavalidation.md
@@ -0,0 +1,85 @@
+---
+layout: default
+title: Cookbook - PDF/A Validation
+---
+
+# PDF/A Validation
+
+The Apache Preflight library is a Java tool that implements a parser compliant with the ISO-19005 specification (aka PDF/A-1).
+
+## Check Compliance with PDF/A-1b
+
+This small sample shows how to check the compliance of a file with the PDF/A-1b specification.
+
+~~~java
+ValidationResult result = null;
+
+FileDataSource fd = new FileDataSource(args[0]);
+PreflightParser parser = new PreflightParser(fd);
+try
+{
+
+    /* Parse the PDF file with PreflightParser that inherits from the NonSequentialParser.
+     * Some additional controls are present to check a set of PDF/A requirements. 
+     * (Stream length consistency, EOL after some Keyword...)
+     */
+    parser.parse();
+
+    /* Once the syntax validation is done, 
+     * the parser can provide a PreflightDocument 
+     * (that inherits from PDDocument) 
+     * This document performs the rest of the PDF/A validation.
+     */
+    PreflightDocument document = parser.getPreflightDocument();
+    document.validate();
+
+    // Get validation result
+    result = document.getResult();
+    document.close();
+
+}
+catch (SyntaxValidationException e)
+{
+    /* the parse method can throw a SyntaxValidationException 
+     * if the PDF file can't be parsed.
+     * In this case, the exception contains an instance of ValidationResult  
+     */
+    result = e.getResult();
+}
+
+// display validation result
+if (result.isValid())
+{
+    System.out.println("The file " + args[0] + " is a valid PDF/A-1b file");
+}
+else
+{
+    System.out.println("The file " + args[0] + " is not valid, error(s) :");
+    for (ValidationError error : result.getErrorsList())
+    {
+        System.out.println(error.getErrorCode() + " : " + error.getDetails());
+    }
+}
+~~~
+
+## Categories of Validation Error
+
+If a validation fails, the ValidationResult object contains all causes of the failure.
+To help understand the failure, every error code has the form X[.Y[.Z]], where:
+
+ - 'X' is the category (e.g. "Font validation error")
+ - 'Y' represents a subsection of the category (e.g. "Font with Glyph error")
+ - 'Z' represents the cause of the error (e.g. "Font with a missing Glyph")
+
+The subsection ('Y') and the cause ('Z') may be missing, depending on how precisely the error could be identified.
+
+The categories are listed below (for detailed causes, see the constants in the ``PreflightConstants`` interface):
+
+| Category | Description |
+| -------- | ----------- | 
+| 1[.y[.z]] | Syntax Error |
+| 2[.y[.z]] | Graphic Error |
+| 3[.y[.z]] | Font Error |
+| 4[.y[.z]] | Transparency Error |
+| 5[.y[.z]] | Annotation Error |
+| 6[.y[.z]] | Action Error |
+| 7[.y[.z]] | Metadata Error |
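
Because every error code follows the X[.Y[.Z]] pattern, the category can be read from the part before the first dot. The sketch below maps that number to the category names in the table. `PreflightCodes` is a hypothetical helper, not part of Preflight, and it assumes the code is the dotted string printed by `error.getErrorCode()` in the validation example. For the meaning of the subsection and cause parts, consult the `PreflightConstants` interface.

```java
/** Hypothetical helper (not part of Preflight) that decodes X[.Y[.Z]] error codes. */
final class PreflightCodes {
    // Index 0 corresponds to category 1, following the category table.
    private static final String[] CATEGORIES = {
        "Syntax", "Graphic", "Font", "Transparency", "Annotation", "Action", "Metadata"
    };

    /** Returns the category description, e.g. "Font Error" for a code starting with 3. */
    static String category(String errorCode) {
        String head = errorCode.split("\\.", 2)[0];
        int x = Integer.parseInt(head.trim());
        if (x < 1 || x > CATEGORIES.length) {
            throw new IllegalArgumentException("Unknown category in error code: " + errorCode);
        }
        return CATEGORIES[x - 1] + " Error";
    }

    /** Returns how many levels the code carries: 1 (X), 2 (X.Y) or 3 (X.Y.Z). */
    static int depth(String errorCode) {
        return errorCode.split("\\.").length;
    }

    public static void main(String[] args) {
        // Pass error codes on the command line to see their categories
        for (String code : args) {
            System.out.println(code + " -> " + category(code) + " (" + depth(code) + " level(s))");
        }
    }
}
```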

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/cookbook/pdfavalidation.mdtext
----------------------------------------------------------------------
diff --git a/content/1.8/cookbook/pdfavalidation.mdtext b/content/1.8/cookbook/pdfavalidation.mdtext
deleted file mode 100644
index bef22e9..0000000
--- a/content/1.8/cookbook/pdfavalidation.mdtext
+++ /dev/null
@@ -1,85 +0,0 @@
----
-layout: default
-title: Cookbook - PDF/A Validation
----
-
-# PDF/A Validation
-
-The Apache Preflight library is a Java tool that implements a parser compliant with the ISO-19005 specification (aka PDF/A-1).
-
-## Check Compliance with PDF/A-1b
-
-This small sample shows how to check the compliance of a file with the PDF/A-1b specification.
-
-~~~java
-ValidationResult result = null;
-
-FileDataSource fd = new FileDataSource(args[0]);
-PreflightParser parser = new PreflightParser(fd);
-try
-{
-
-    /* Parse the PDF file with PreflightParser that inherits from the NonSequentialParser.
-     * Some additional controls are present to check a set of PDF/A requirements. 
-     * (Stream length consistency, EOL after some Keyword...)
-     */
-    parser.parse();
-
-    /* Once the syntax validation is done, 
-     * the parser can provide a PreflightDocument 
-     * (that inherits from PDDocument) 
-     * This document performs the rest of the PDF/A validation.
-     */
-    PreflightDocument document = parser.getPreflightDocument();
-    document.validate();
-
-    // Get validation result
-    result = document.getResult();
-    document.close();
-
-}
-catch (SyntaxValidationException e)
-{
-    /* the parse method can throw a SyntaxValidationException 
-     * if the PDF file can't be parsed.
-     * In this case, the exception contains an instance of ValidationResult  
-     */
-    result = e.getResult();
-}
-
-// display validation result
-if (result.isValid())
-{
-    System.out.println("The file " + args[0] + " is a valid PDF/A-1b file");
-}
-else
-{
-    System.out.println("The file " + args[0] + " is not valid, error(s) :");
-    for (ValidationError error : result.getErrorsList())
-    {
-        System.out.println(error.getErrorCode() + " : " + error.getDetails());
-    }
-}
-~~~
-
-## Categories of Validation Error
-
-If a validation fails, the ValidationResult object contains all causes of the failure.
-To help understand the failure, every error code has the form X[.Y[.Z]], where:
-
- - 'X' is the category (e.g. "Font validation error")
- - 'Y' represents a subsection of the category (e.g. "Font with Glyph error")
- - 'Z' represents the cause of the error (e.g. "Font with a missing Glyph")
-
-The subsection ('Y') and the cause ('Z') may be missing, depending on how precisely the error could be identified.
-
-The categories are listed below (for detailed causes, see the constants in the ``PreflightConstants`` interface):
-
-| Category | Description |
-| -------- | ----------- | 
-| 1[.y[.z]] | Syntax Error |
-| 2[.y[.z]] | Graphic Error |
-| 3[.y[.z]] | Font Error |
-| 4[.y[.z]] | Transparency Error |
-| 5[.y[.z]] | Annotation Error |
-| 6[.y[.z]] | Action Error |
-| 7[.y[.z]] | Metadata Error |

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/cookbook/textextraction.md
----------------------------------------------------------------------
diff --git a/content/1.8/cookbook/textextraction.md b/content/1.8/cookbook/textextraction.md
new file mode 100644
index 0000000..46237b3
--- /dev/null
+++ b/content/1.8/cookbook/textextraction.md
@@ -0,0 +1,101 @@
+---
+layout: default
+title: Cookbook - Text Extraction
+---
+
+# Text Extraction
+
+## Extracting Text
+
+See class:org.apache.pdfbox.util.PDFTextStripper  
+See class:org.apache.pdfbox.searchengine.lucene.LucenePDFDocument  
+See command line app:ExtractText  
+
+One of the main features of PDFBox is its ability to quickly and accurately extract text
+from a variety of PDF documents. This functionality is encapsulated in the
+org.apache.pdfbox.util.PDFTextStripper class and can easily be run on the command line with
+org.apache.pdfbox.ExtractText.
+
+## Lucene Integration
+
+Lucene is an open source text search library from the Apache Software Foundation. Before
+Lucene can index a PDF document, the document must first be converted to text. PDFBox provides
+a simple approach for adding PDF documents to a Lucene index.
+
+~~~java
+Document luceneDocument = LucenePDFDocument.getDocument( ... );
+~~~
+
+Now that you have a Lucene Document object, you can add it to the Lucene index just like
+you would if it had been created from a text or HTML file. The LucenePDFDocument automatically
+extracts a variety of metadata fields from the PDF to be added to the index; the javadoc
+shows details on those fields. This approach is very simple and should be sufficient for
+most users. If it is not, you can use some of the advanced text extraction techniques
+described in the next section.
+
+## Advanced Text Extraction
+
+Some applications will have complex text extraction requirements that neither the command
+line application nor the LucenePDFDocument will be able to fulfill those requirements. 
+It is possible for users to utilize or extend the PDFTextStripper class to meet some of 
+these requirements.
+
+### Limiting The Extracted Text
+
+There are several ways that we can limit the text that is extracted during the extraction 
+process. The simplest is to specify the range of pages that you want to be extracted. 
+For example, to only extract text from the second and third pages of the PDF document 
+you could do this:
+
+~~~java
+PDFTextStripper stripper = new PDFTextStripper();
+stripper.setStartPage( 2 );
+stripper.setEndPage( 3 );
+stripper.writeText( ... );
+~~~
+
+NOTE: The startPage and endPage properties of PDFTextStripper are 1-based and inclusive.
+
+If you want to start on page 2 and extract to the end of the document, you only need to
+set the startPage property. By default, all pages in the PDF document are extracted.
+
+It is also possible to limit the extracted text to the pages between two bookmarks in the document.
+If you are not familiar with how to use bookmarks in PDFBox then you should review the 
+Bookmarks page. Similar to the startPage/endPage properties, PDFTextStripper also has 
+startBookmark/endBookmark properties. There are some caveats to be aware of when using this
+feature of the PDFTextStripper. Not all bookmarks point to a page in the current PDF document. 
+
+The possible states of a bookmark are:
+
+ - null - The property was not set, this is the default.
+ - Points to a page in the PDF - The property was set and points to a valid page in the PDF
+ - Bookmark does not point to anything - The property was set but the bookmark does not point to any page
+ - Bookmark points to external action - The property was set, but it points to a page in a different PDF or performs an action when activated
+
+The table below describes how PDFBox behaves in the various scenarios:
+
+| Start Bookmark | End Bookmark | Result |
+| -------------- | ------------ | ------ |
+| null | null | This is the default, the properties have no effect on the text extraction. |
+| Points to a page in the PDF | null | Text extraction will begin on the page that this bookmark points to and go until the end of the document. |
+| null | Points to a page in the PDF | Text extraction will begin on the first page and stop at the end of the page that this bookmark points to. |
+| Bookmark does not point to anything | null | Because the PDFTextStripper cannot determine a start page based on the bookmark, it will start on the first page and go until the end of the document. |
+| null | Bookmark does not point to anything | Because the PDFTextStripper cannot determine an end page based on the bookmark, it will start on the first page and go until the end of the document. |
+| Bookmark does not point to anything | Bookmark does not point to anything | This is a special case! If the startBookmark and endBookmark are exactly the same then no text will be extracted. If they are different then it is not possible for the PDFTextStripper to determine the pages, so it will include the entire document. |
+| Bookmark points to external action | Bookmark points to external action | If either the startBookmark or the endBookmark refer to an external page or execute an action then an OutlineNotLocalException will be thrown to indicate to the user that the bookmark is not valid. |
+
+NOTE: PDFTextStripper will check both the startPage/endPage and the startBookmark/endBookmark to determine if text should be extracted from the current page.
+
+### External Glyph List
+
+Some PDF files need to map between glyph names and Unicode values during text extraction. 
+PDFBox comes with an Adobe Glyph List, but you may encounter files with glyph names that 
+are not in that map. To use your own glyph list file, set the ``glyphlist_ext`` JVM system property to its file name (e.g. ``-Dglyphlist_ext=<file>``).
+
+### Right to Left Text
+
+Extracting text in languages whose text goes from right to left (such as Arabic and Hebrew)
+in PDF files can result in text that is backwards. PDFBox can normalize and reverse the text
+if the ICU4J jar file has been placed on the classpath (it is an optional dependency). 
+You should also enable sorting, either with ``setSortByPosition(true)`` on org.apache.pdfbox.util.PDFTextStripper
+or with the ``-sort`` option of org.apache.pdfbox.ExtractText, to ensure accurate output.
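
The text extraction recipe notes that PDFTextStripper checks both the startPage/endPage range and the bookmark range for every page. The sketch below models that check for the ordinary rows of the bookmark table, treating an unset bookmark and one that points to nothing as "no limit". `ExtractionRange` is a hypothetical helper, not PDFBox code. It deliberately leaves out the two special cases: identical bookmarks that point to nothing, where no text is extracted, and bookmarks to external pages or actions, where an OutlineNotLocalException is thrown.

```java
/** Hypothetical model (not PDFBox code) of the page filter described in the text extraction cookbook. */
final class ExtractionRange {
    /** Marker for a bookmark that is not set or does not point to a page in this document. */
    static final int UNRESOLVED = -1;

    /**
     * Returns true if text on the given 1-based page would be extracted.
     * startPage and endPage are 1-based and inclusive; bookmark pages are UNRESOLVED or 1-based.
     */
    static boolean includes(int page, int startPage, int endPage, int startBookmarkPage, int endBookmarkPage) {
        boolean inPageRange = page >= startPage && page <= endPage;
        boolean afterStartBookmark = startBookmarkPage == UNRESOLVED || page >= startBookmarkPage;
        boolean beforeEndBookmark = endBookmarkPage == UNRESOLVED || page <= endBookmarkPage;
        // Both the page range and the bookmark range must admit the page.
        return inPageRange && afterStartBookmark && beforeEndBookmark;
    }

    public static void main(String[] args) {
        // Pages 2 and 3 only, as in the setStartPage(2)/setEndPage(3) example
        for (int page = 1; page <= 5; page++) {
            System.out.println("page " + page + ": " + includes(page, 2, 3, UNRESOLVED, UNRESOLVED));
        }
    }
}
```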

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/cookbook/textextraction.mdtext
----------------------------------------------------------------------
diff --git a/content/1.8/cookbook/textextraction.mdtext b/content/1.8/cookbook/textextraction.mdtext
deleted file mode 100644
index 46237b3..0000000
--- a/content/1.8/cookbook/textextraction.mdtext
+++ /dev/null
@@ -1,101 +0,0 @@
----
-layout: default
-title: Cookbook - Text Extraction
----
-
-# Text Extraction
-
-## Extracting Text
-
-See class:org.apache.pdfbox.util.PDFTextStripper  
-See class:org.apache.pdfbox.searchengine.lucene.LucenePDFDocument  
-See command line app:ExtractText  
-
-One of the main features of PDFBox is its ability to quickly and accurately extract text
-from a variety of PDF documents. This functionality is encapsulated in the
-org.apache.pdfbox.util.PDFTextStripper class and can easily be run on the command line with
-org.apache.pdfbox.ExtractText.
-
-## Lucene Integration
-
-Lucene is an open source text search library from the Apache Software Foundation. Before
-Lucene can index a PDF document, the document must first be converted to text. PDFBox provides
-a simple approach for adding PDF documents to a Lucene index.
-
-~~~java
-Document luceneDocument = LucenePDFDocument.getDocument( ... );
-~~~
-
-Now that you have a Lucene Document object, you can add it to the Lucene index just like
-you would if it had been created from a text or HTML file. The LucenePDFDocument automatically
-extracts a variety of metadata fields from the PDF to be added to the index; the javadoc
-shows details on those fields. This approach is very simple and should be sufficient for
-most users. If it is not, you can use some of the advanced text extraction techniques
-described in the next section.
-
-## Advanced Text Extraction
-
-Some applications will have complex text extraction requirements that neither the command
-line application nor the LucenePDFDocument will be able to fulfill those requirements. 
-It is possible for users to utilize or extend the PDFTextStripper class to meet some of 
-these requirements.
-
-### Limiting The Extracted Text
-
-There are several ways to limit the text that is extracted. 
-The simplest is to specify the range of pages that you want to extract. 
-For example, to extract text only from the second and third pages of the PDF document 
-you could do this:
-
-~~~java
-PDFTextStripper stripper = new PDFTextStripper();
-stripper.setStartPage( 2 );
-stripper.setEndPage( 3 );
-stripper.writeText( ... );
-~~~
-
-NOTE: The startPage and endPage properties of PDFTextStripper are 1-based and inclusive.
-
-If you want to start on page 2 and extract to the end of the document, you only need to
-set the startPage property. By default all pages in the PDF document are extracted.
-
-It is also possible to limit the extracted text to the pages between two bookmarks in the document. 
-If you are not familiar with how to use bookmarks in PDFBox then you should review the 
-Bookmarks page. Similar to the startPage/endPage properties, PDFTextStripper also has 
-startBookmark/endBookmark properties. There are some caveats to be aware of when using this
-feature of PDFTextStripper, because not all bookmarks point to a page in the current PDF document. 
-
-The possible states of a bookmark are:
-
- - null - The property was not set, this is the default.
- - Points to page in the PDF - The property was set and points to a valid page in the PDF
- - Bookmark does not point to anything - The property was set but the bookmark does not point to any page
- - Bookmark points to external action - The property was set, but it points to a page in a different PDF or performs an action when activated
-
-The table below describes how PDFBox behaves in the various scenarios:
-
-| Start Bookmark | End Bookmark | Result |
-| -------------- | ------------ | ------ |
-| null | null | This is the default, the properties have no effect on the text extraction. |
-| Points to a page in the PDF | null | Text extraction will begin on the page that this bookmark points to and go until the end of the document. |
-| null | Points to a page in the PDF | Text extraction will begin on the first page and stop at the end of the page that this bookmark points to. |
-| Bookmark does not point to anything | null | Because the PDFTextStripper cannot determine a start page based on the bookmark, it will start on the first page and go until the end of the document. |
-| null | Bookmark does not point to anything | Because the PDFTextStripper cannot determine an end page based on the bookmark, it will start on the first page and go until the end of the document. |
-| Bookmark does not point to anything | Bookmark does not point to anything | This is a special case! If the startBookmark and endBookmark are exactly the same then no text will be extracted. If they are different then it is not possible for the PDFTextStripper to determine the pages, so it will include the entire document. | 
-| Bookmark points to external action | Bookmark points to external action | If either the startBookmark or the endBookmark refers to an external page or executes an action, then an OutlineNotLocalException will be thrown to indicate to the user that the bookmark is not valid. |
-
-NOTE: PDFTextStripper will check both the startPage/endPage and the startBookmark/endBookmark to determine if text should be extracted from the current page.
-
-### External Glyph List
-
-Some PDF files need to map between glyph names and Unicode values during text extraction. 
-PDFBox comes with the Adobe Glyph List, but you may encounter files with glyph names that 
-are not in that list. To use your own glyph list file, supply its file name in the ``glyphlist_ext`` JVM system property.
-
-### Right to Left Text
-
-Extracting text in languages whose text goes from right to left (such as Arabic and Hebrew)
-from PDF files can result in text that is backwards. PDFBox can normalize and reverse the text
-if the ICU4J jar file has been placed on the classpath (it is an optional dependency). 
-Note that you should also enable sorting, either with setSortByPosition(true) on org.apache.pdfbox.util.PDFTextStripper 
-or with the -sort option of org.apache.pdfbox.ExtractText, to ensure accurate output.
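The page-range rules in the text extraction page above (the 1-based, inclusive startPage/endPage properties combined with the startBookmark/endBookmark table) can be sketched as a small model. This is not PDFBox code: `BookmarkTarget`, `BookmarkRef` and `PageRangeModel` are hypothetical names, and `IllegalArgumentException` merely stands in for the `OutlineNotLocalException` that PDFBox throws for external bookmarks.

```java
/** Hypothetical model of what a start or end bookmark points to (not a PDFBox type). */
enum BookmarkTarget { NOT_SET, LOCAL_PAGE, NO_TARGET, EXTERNAL }

/** Hypothetical stand-in for a bookmark; page is 1-based and only used for LOCAL_PAGE. */
final class BookmarkRef {
    final BookmarkTarget target;
    final int page;

    BookmarkRef(BookmarkTarget target, int page) {
        this.target = target;
        this.page = page;
    }
}

final class PageRangeModel {
    /**
     * Returns {first, last} as a 1-based inclusive page range, or null if no page
     * should be extracted. startPage/endPage are applied on top of the bookmarks.
     */
    static int[] resolve(int pageCount, int startPage, int endPage,
                         BookmarkRef start, BookmarkRef end) {
        if (start.target == BookmarkTarget.EXTERNAL || end.target == BookmarkTarget.EXTERNAL) {
            // PDFBox throws OutlineNotLocalException here; this exception is only a stand-in.
            throw new IllegalArgumentException("bookmark does not point into this document");
        }
        // Special case from the table: the very same unresolvable bookmark at both ends.
        if (start == end && start.target == BookmarkTarget.NO_TARGET) {
            return null;
        }
        // Unset or unresolvable bookmarks fall back to the first/last page.
        int first = start.target == BookmarkTarget.LOCAL_PAGE ? start.page : 1;
        int last = end.target == BookmarkTarget.LOCAL_PAGE ? end.page : pageCount;
        // A page is extracted only if both the page properties and the bookmarks allow it.
        first = Math.max(first, startPage);
        last = Math.min(last, Math.min(endPage, pageCount));
        return first <= last ? new int[] {first, last} : null;
    }
}
```

For example, `resolve(10, 2, 3, unset, unset)` (with both bookmarks `NOT_SET`) yields pages 2 through 3, matching the `setStartPage( 2 ); setEndPage( 3 );` snippet. Intersecting the two ranges reflects the note that PDFTextStripper checks both the page and the bookmark properties for each page.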

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/cookbook/workingwithattachments.md
----------------------------------------------------------------------
diff --git a/content/1.8/cookbook/workingwithattachments.md b/content/1.8/cookbook/workingwithattachments.md
new file mode 100644
index 0000000..9b77301
--- /dev/null
+++ b/content/1.8/cookbook/workingwithattachments.md
@@ -0,0 +1,54 @@
+---
+layout: default
+title: Cookbook - Working with Attachments
+---
+
+# Working with Attachments
+
+## The PDF File Specification
+
+See package:org.apache.pdfbox.pdmodel.common.filespecification  
+See example:EmbeddedFiles  
+
+A PDF can contain references to external files via the file system or a URL to a remote 
+location. It is also possible to embed a binary file into a PDF document.
+
+There are two classes that can be used when referencing a file. ``PDSimpleFileSpecification``
+is a simple string reference to a file (e.g. "./movies/BigMovie.avi"). The simple file 
+specification does not allow any parameters to be set. 
+
+The ``PDComplexFileSpecification`` is more feature-rich and allows for advanced settings on 
+the file reference.
+
+It is also possible to embed a file directly into a PDF. To do so, set the ``EmbeddedFile`` 
+attribute of the ``PDComplexFileSpecification``; the file attribute then only supplies the file name.
+
+## Adding a File Attachment
+
+PDF documents can contain file attachments that are accessed from the Document->File Attachments 
+menu. PDFBox allows attachments to be added to and extracted from PDF documents. 
+Attachments are stored in the embedded files name tree, which is part of the names dictionary of the document catalog.
+
+~~~java
+PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
+
+//first create the file specification, which holds the embedded file
+PDComplexFileSpecification fs = new PDComplexFileSpecification();
+fs.setFile( "Test.txt" );
+byte[] data = ...; // the contents of the file to embed
+PDEmbeddedFile ef = new PDEmbeddedFile( doc, new ByteArrayInputStream( data ) );
+//set some of the attributes of the embedded file
+ef.setSubtype( "text/plain" );
+ef.setSize( data.length );
+ef.setCreationDate( new GregorianCalendar() );
+fs.setEmbeddedFile( ef );
+
+//now add the entry to the embedded file tree and set in the document.
+Map efMap = new HashMap();
+efMap.put( "My first attachment", fs );
+efTree.setNames( efMap );
+//attachments are stored as part of the "names" dictionary in the document catalog
+PDDocumentNameDictionary names = new PDDocumentNameDictionary( doc.getDocumentCatalog() );
+names.setEmbeddedFiles( efTree );
+doc.getDocumentCatalog().setNames( names );
+~~~
\ No newline at end of file
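The attachment example needs two things about the file: a stream for `PDEmbeddedFile` and the exact byte count for `setSize`. Reading the file into a byte array first gives both. The sketch below uses only the JDK; `AttachmentPrep` is a hypothetical helper name, and guessing the MIME subtype with `URLConnection.guessContentTypeFromName` is an assumption of convenience, not something PDFBox does for you.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

/** Hypothetical helper that prepares data for an embedded file (JDK only). */
final class AttachmentPrep {
    /** Reads the whole stream into memory so the exact byte count is known. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Best-effort MIME type guessed from the file name, with a generic binary fallback. */
    static String guessSubtype(String fileName) {
        String type = URLConnection.guessContentTypeFromName(fileName);
        return type != null ? type : "application/octet-stream";
    }
}
```

With `byte[] data = AttachmentPrep.readAll( is );` the same array can feed both `new ByteArrayInputStream( data )` and `ef.setSize( data.length )`, and `AttachmentPrep.guessSubtype( "Test.txt" )` gives `text/plain` for the subtype.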

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/cookbook/workingwithattachments.mdtext
----------------------------------------------------------------------
diff --git a/content/1.8/cookbook/workingwithattachments.mdtext b/content/1.8/cookbook/workingwithattachments.mdtext
deleted file mode 100644
index 9b77301..0000000
--- a/content/1.8/cookbook/workingwithattachments.mdtext
+++ /dev/null
@@ -1,54 +0,0 @@
----
-layout: default
-title: Cookbook - Working with Attachments
----
-
-# Working with Attachments
-
-## The PDF File Specification
-
-See package:org.apache.pdfbox.pdmodel.common.filespecification  
-See example:EmbeddedFiles  
-
-A PDF can contain references to external files via the file system or a URL to a remote 
-location. It is also possible to embed a binary file into a PDF document.
-
-There are two classes that can be used when referencing a file. ``PDSimpleFileSpecification``
-is a simple string reference to a file (e.g. "./movies/BigMovie.avi"). The simple file 
-specification does not allow any parameters to be set. 
-
-The ``PDComplexFileSpecification`` is more feature-rich and allows for advanced settings on 
-the file reference.
-
-It is also possible to embed a file directly into a PDF. To do so, set the ``EmbeddedFile`` 
-attribute of the ``PDComplexFileSpecification``; the file attribute then only supplies the file name.
-
-## Adding a File Attachment
-
-PDF documents can contain file attachments that are accessed from the Document->File Attachments 
-menu. PDFBox allows attachments to be added to and extracted from PDF documents. 
-Attachments are stored in the embedded files name tree, which is part of the names dictionary of the document catalog.
-
-~~~java
-PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
-
-//first create the file specification, which holds the embedded file
-PDComplexFileSpecification fs = new PDComplexFileSpecification();
-fs.setFile( "Test.txt" );
-byte[] data = ...; // the contents of the file to embed
-PDEmbeddedFile ef = new PDEmbeddedFile( doc, new ByteArrayInputStream( data ) );
-//set some of the attributes of the embedded file
-ef.setSubtype( "text/plain" );
-ef.setSize( data.length );
-ef.setCreationDate( new GregorianCalendar() );
-fs.setEmbeddedFile( ef );
-
-//now add the entry to the embedded file tree and set in the document.
-Map efMap = new HashMap();
-efMap.put( "My first attachment", fs );
-efTree.setNames( efMap );
-//attachments are stored as part of the "names" dictionary in the document catalog
-PDDocumentNameDictionary names = new PDDocumentNameDictionary( doc.getDocumentCatalog() );
-names.setEmbeddedFiles( efTree );
-doc.getDocumentCatalog().setNames( names );
-~~~
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/cookbook/workingwithfonts.md
----------------------------------------------------------------------
diff --git a/content/1.8/cookbook/workingwithfonts.md b/content/1.8/cookbook/workingwithfonts.md
new file mode 100644
index 0000000..a3d6165
--- /dev/null
+++ b/content/1.8/cookbook/workingwithfonts.md
@@ -0,0 +1,129 @@
+---
+layout: default
+title:  Cookbook - Working with Fonts
+---
+
+# Working with Fonts
+
+## Standard 14 Fonts
+
+The PDF specification states that a standard set of 14 fonts will always be available when consuming PDF documents. In PDFBox these are defined as constants in the PDType1Font class.
+
+| Standard Font | Description |
+| ------------- | ----------- |
+| PDType1Font.TIMES_ROMAN | Times regular |
+| PDType1Font.TIMES_BOLD | Times bold |
+| PDType1Font.TIMES_ITALIC | Times italic |
+| PDType1Font.TIMES_BOLD_ITALIC | Times bold italic |
+| PDType1Font.HELVETICA | Helvetica regular |
+| PDType1Font.HELVETICA_BOLD | Helvetica bold |
+| PDType1Font.HELVETICA_OBLIQUE | Helvetica italic |
+| PDType1Font.HELVETICA_BOLD_OBLIQUE | Helvetica bold italic | 
+| PDType1Font.COURIER | Courier |
+| PDType1Font.COURIER_BOLD | Courier bold |
+| PDType1Font.COURIER_OBLIQUE | Courier italic |
+| PDType1Font.COURIER_BOLD_OBLIQUE | Courier bold italic |
+| PDType1Font.SYMBOL | Symbol Set |
+| PDType1Font.ZAPF_DINGBATS | Dingbat Typeface |
+
+## Hello World using a PDF base font
+
+This small sample shows how to create a new document and print the text "Hello World" using one of the PDF base fonts.
+
+~~~java
+// Create a document and add a page to it
+PDDocument document = new PDDocument();
+PDPage page = new PDPage();
+document.addPage( page );
+
+// Create a new font object selecting one of the PDF base fonts
+PDFont font = PDType1Font.HELVETICA_BOLD;
+
+// Start a new content stream which will hold the content to be created
+PDPageContentStream contentStream = new PDPageContentStream(document, page);
+
+// Define a text content stream using the selected font, moving the cursor and drawing the text "Hello World"
+contentStream.beginText();
+contentStream.setFont( font, 12 );
+contentStream.moveTextPositionByAmount( 100, 700 );
+contentStream.drawString( "Hello World" );
+contentStream.endText();
+
+// Make sure that the content stream is closed:
+contentStream.close();
+
+// Save the results and ensure that the document is properly closed:
+document.save( "Hello World.pdf");
+document.close();
+~~~
+
+## Hello World using a TrueType font
+
+This small sample shows how to create a new document and print the text "Hello World" using a TrueType font.
+
+~~~java
+// Create a document and add a page to it
+PDDocument document = new PDDocument();
+PDPage page = new PDPage();
+document.addPage( page );
+
+// Create a new font object by loading a TrueType font into the document
+PDFont font = PDTrueTypeFont.loadTTF(document, "Arial.ttf");
+
+// Start a new content stream which will hold the content to be created
+PDPageContentStream contentStream = new PDPageContentStream(document, page);
+
+// Define a text content stream using the selected font, moving the cursor and drawing the text "Hello World"
+contentStream.beginText();
+contentStream.setFont( font, 12 );
+contentStream.moveTextPositionByAmount( 100, 700 );
+contentStream.drawString( "Hello World" );
+contentStream.endText();
+
+// Make sure that the content stream is closed:
+contentStream.close();
+
+// Save the results and ensure that the document is properly closed:
+document.save( "Hello World.pdf");
+document.close();
+~~~
+
+While it is recommended to embed all fonts for greatest portability, not all PDF producer 
+applications do this. When displaying a PDF with a font that is not embedded, an external font has to be found to use in its place. 
+PDFBox looks for a mapping file to use when substituting fonts.
+
+PDFBox loads Resources/PDFBox_External_Fonts.properties from the classpath to map font
+names to TTF font files. The UNKNOWN_FONT property in that file tells PDFBox which font to 
+use when no mapping exists. 
+
+
+## Hello World using a PostScript Type 1 font
+
+This small sample shows how to create a new document and print the text "Hello World" using a PostScript Type 1 font.
+
+~~~java
+// Create a document and add a page to it
+PDDocument document = new PDDocument();
+PDPage page = new PDPage();
+document.addPage( page );
+
+// Create a new font object by loading a PostScript Type 1 font into the document
+PDFont font = new PDType1AfmPfbFont(document, "cfm.afm");
+
+// Start a new content stream which will hold the content to be created
+PDPageContentStream contentStream = new PDPageContentStream(document, page);
+
+// Define a text content stream using the selected font, moving the cursor and drawing the text "Hello World"
+contentStream.beginText();
+contentStream.setFont( font, 12 );
+contentStream.moveTextPositionByAmount( 100, 700 );
+contentStream.drawString( "Hello World" );
+contentStream.endText();
+
+// Make sure that the content stream is closed:
+contentStream.close();
+
+// Save the results and ensure that the document is properly closed:
+document.save( "Hello World.pdf");
+document.close();
+~~~
\ No newline at end of file
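The font substitution described in the fonts page (a properties file that maps font names to TTF files, with `UNKNOWN_FONT` as the fallback) can be modeled with `java.util.Properties`. This is a hypothetical sketch of the lookup rule only, not PDFBox's internal code; the class name `FontSubstitution`, its methods, and the assumption that keys are plain font names are all illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical model of a font-name -> TTF file lookup with an UNKNOWN_FONT fallback. */
final class FontSubstitution {
    private final Properties mapping;

    FontSubstitution(Properties mapping) {
        this.mapping = mapping;
    }

    /** Loads the mapping from a classpath resource; an absent resource yields an empty mapping. */
    static FontSubstitution fromClasspath(String resource) throws IOException {
        Properties p = new Properties();
        try (InputStream in = FontSubstitution.class.getClassLoader().getResourceAsStream(resource)) {
            if (in != null) {
                p.load(in);
            }
        }
        return new FontSubstitution(p);
    }

    /** Parses mapping text in java.util.Properties format (handy for tests). */
    static FontSubstitution fromString(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new FontSubstitution(p);
    }

    /** Returns the mapped TTF file, else the UNKNOWN_FONT entry, else null. */
    String lookup(String fontName) {
        return mapping.getProperty(fontName, mapping.getProperty("UNKNOWN_FONT"));
    }
}
```

Using `Properties.getProperty(key, defaultValue)` keeps the fallback in a single call: a name with its own entry wins, and anything else resolves to whatever `UNKNOWN_FONT` names.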

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/cookbook/workingwithfonts.mdtext
----------------------------------------------------------------------
diff --git a/content/1.8/cookbook/workingwithfonts.mdtext b/content/1.8/cookbook/workingwithfonts.mdtext
deleted file mode 100644
index a3d6165..0000000
--- a/content/1.8/cookbook/workingwithfonts.mdtext
+++ /dev/null
@@ -1,129 +0,0 @@
----
-layout: default
-title:  Cookbook - Working with Fonts
----
-
-# Working with Fonts
-
-## Standard 14 Fonts
-
-The PDF specification states that a standard set of 14 fonts will always be available when consuming PDF documents. In PDFBox these are defined as constants in the PDType1Font class.
-
-| Standard Font | Description |
-| ------------- | ----------- |
-| PDType1Font.TIMES_ROMAN | Times regular |
-| PDType1Font.TIMES_BOLD | Times bold |
-| PDType1Font.TIMES_ITALIC | Times italic |
-| PDType1Font.TIMES_BOLD_ITALIC | Times bold italic |
-| PDType1Font.HELVETICA | Helvetica regular |
-| PDType1Font.HELVETICA_BOLD | Helvetica bold |
-| PDType1Font.HELVETICA_OBLIQUE | Helvetica italic |
-| PDType1Font.HELVETICA_BOLD_OBLIQUE | Helvetica bold italic | 
-| PDType1Font.COURIER | Courier |
-| PDType1Font.COURIER_BOLD | Courier bold |
-| PDType1Font.COURIER_OBLIQUE | Courier italic |
-| PDType1Font.COURIER_BOLD_OBLIQUE | Courier bold italic |
-| PDType1Font.SYMBOL | Symbol Set |
-| PDType1Font.ZAPF_DINGBATS | Dingbat Typeface |
-
-## Hello World using a PDF base font
-
-This small sample shows how to create a new document and print the text "Hello World" using one of the PDF base fonts.
-
-~~~java
-// Create a document and add a page to it
-PDDocument document = new PDDocument();
-PDPage page = new PDPage();
-document.addPage( page );
-
-// Create a new font object selecting one of the PDF base fonts
-PDFont font = PDType1Font.HELVETICA_BOLD;
-
-// Start a new content stream which will hold the content to be created
-PDPageContentStream contentStream = new PDPageContentStream(document, page);
-
-// Define a text content stream using the selected font, moving the cursor and drawing the text "Hello World"
-contentStream.beginText();
-contentStream.setFont( font, 12 );
-contentStream.moveTextPositionByAmount( 100, 700 );
-contentStream.drawString( "Hello World" );
-contentStream.endText();
-
-// Make sure that the content stream is closed:
-contentStream.close();
-
-// Save the results and ensure that the document is properly closed:
-document.save( "Hello World.pdf");
-document.close();
-~~~
-
-## Hello World using a TrueType font
-
-This small sample shows how to create a new document and print the text "Hello World" using a TrueType font.
-
-~~~java
-// Create a document and add a page to it
-PDDocument document = new PDDocument();
-PDPage page = new PDPage();
-document.addPage( page );
-
-// Create a new font object by loading a TrueType font into the document
-PDFont font = PDTrueTypeFont.loadTTF(document, "Arial.ttf");
-
-// Start a new content stream which will hold the content to be created
-PDPageContentStream contentStream = new PDPageContentStream(document, page);
-
-// Define a text content stream using the selected font, moving the cursor and drawing the text "Hello World"
-contentStream.beginText();
-contentStream.setFont( font, 12 );
-contentStream.moveTextPositionByAmount( 100, 700 );
-contentStream.drawString( "Hello World" );
-contentStream.endText();
-
-// Make sure that the content stream is closed:
-contentStream.close();
-
-// Save the results and ensure that the document is properly closed:
-document.save( "Hello World.pdf");
-document.close();
-~~~
-
-While it is recommended to embed all fonts for greatest portability, not all PDF producer 
-applications do this. When displaying a PDF with a font that is not embedded, an external font has to be found to use in its place. 
-PDFBox looks for a mapping file to use when substituting fonts.
-
-PDFBox loads Resources/PDFBox_External_Fonts.properties from the classpath to map font
-names to TTF font files. The UNKNOWN_FONT property in that file tells PDFBox which font to 
-use when no mapping exists. 
-
-
-## Hello World using a PostScript Type 1 font
-
-This small sample shows how to create a new document and print the text "Hello World" using a PostScript Type 1 font.
-
-~~~java
-// Create a document and add a page to it
-PDDocument document = new PDDocument();
-PDPage page = new PDPage();
-document.addPage( page );
-
-// Create a new font object by loading a PostScript Type 1 font into the document
-PDFont font = new PDType1AfmPfbFont(document, "cfm.afm");
-
-// Start a new content stream which will hold the content to be created
-PDPageContentStream contentStream = new PDPageContentStream(document, page);
-
-// Define a text content stream using the selected font, moving the cursor and drawing the text "Hello World"
-contentStream.beginText();
-contentStream.setFont( font, 12 );
-contentStream.moveTextPositionByAmount( 100, 700 );
-contentStream.drawString( "Hello World" );
-contentStream.endText();
-
-// Make sure that the content stream is closed:
-contentStream.close();
-
-// Save the results and ensure that the document is properly closed:
-document.save( "Hello World.pdf");
-document.close();
-~~~
\ No newline at end of file
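The Hello World samples place text with `moveTextPositionByAmount( 100, 700 )`. Those numbers are in PDF user-space units of 1/72 inch, with the origin at the lower-left corner of the page, so on a US Letter page (612 x 792 units, which is assumed here as the page size of a plain `new PDPage()`) y = 700 lies 92 units, roughly 1.3 inches, below the top edge. The hypothetical `PdfUnits` helper below makes such conversions explicit.

```java
/** Small helpers for PDF user-space units (1 unit = 1/72 inch, origin at the lower-left corner). */
final class PdfUnits {
    static final float POINTS_PER_INCH = 72f;
    static final float MM_PER_INCH = 25.4f;

    static float inchesToPoints(float inches) {
        return inches * POINTS_PER_INCH;
    }

    static float mmToPoints(float mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    /** Converts a distance measured down from the top edge into a y coordinate. */
    static float yFromTop(float pageHeight, float distanceFromTop) {
        return pageHeight - distanceFromTop;
    }
}
```

For example, `PdfUnits.yFromTop( 792f, PdfUnits.inchesToPoints( 1f ) )` gives the y coordinate one inch below the top of a Letter page. For other page sizes, read the real height from the page's media box instead of hard-coding 792.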

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/cookbook/workingwithmetadata.md
----------------------------------------------------------------------
diff --git a/content/1.8/cookbook/workingwithmetadata.md b/content/1.8/cookbook/workingwithmetadata.md
new file mode 100644
index 0000000..83ca51f
--- /dev/null
+++ b/content/1.8/cookbook/workingwithmetadata.md
@@ -0,0 +1,66 @@
+---
+layout: default
+title: Cookbook - Working with Metadata
+---
+
+# Working with Metadata
+
+## Introduction
+
+PDF documents can contain information describing the document itself or certain objects 
+within the document, such as the author of the document or its creation date. 
+Basic information can be set and retrieved using the PDDocumentInformation object.
+
+In addition, more metadata can be retrieved from the XML metadata, as described below.
+
+## Getting Basic Metadata
+
+To set or retrieve basic information about the document, use the PDDocumentInformation object, 
+which provides a high-level API to that information:
+
+~~~java
+PDDocumentInformation info = document.getDocumentInformation();
+System.out.println( "Page Count=" + document.getNumberOfPages() );
+System.out.println( "Title=" + info.getTitle() );
+System.out.println( "Author=" + info.getAuthor() );
+System.out.println( "Subject=" + info.getSubject() );
+System.out.println( "Keywords=" + info.getKeywords() );
+System.out.println( "Creator=" + info.getCreator() );
+System.out.println( "Producer=" + info.getProducer() );
+System.out.println( "Creation Date=" + info.getCreationDate() );
+System.out.println( "Modification Date=" + info.getModificationDate());
+System.out.println( "Trapped=" + info.getTrapped() );      
+~~~
+
+## Accessing PDF Metadata
+
+See class:org.apache.pdfbox.pdmodel.common.PDMetadata  
+See example:AddMetadataFromDocInfo  
+See Adobe Documentation:XMP Specification  
+
+PDF documents can have XML metadata associated with certain objects within a PDF document.
+For example, the following PD Model objects have the ability to contain metadata:
+
+ - PDDocumentCatalog
+ - PDPage
+ - PDXObject
+ - PDICCBased
+ - PDStream
+
+The metadata that is stored in PDF objects conforms to the XMP specification, so it is 
+recommended that you review that specification. Currently there is no high-level API for 
+managing the XML metadata; PDFBox uses standard Java InputStream/OutputStream to retrieve 
+or set the XML metadata.
+
+~~~java
+PDDocument doc = PDDocument.load( ... );
+PDDocumentCatalog catalog = doc.getDocumentCatalog();
+PDMetadata metadata = catalog.getMetadata();
+
+//to read the XML metadata
+InputStream xmlInputStream = metadata.createInputStream();
+
+//or to write new XML metadata
+InputStream newXMPData = ...;
+PDMetadata newMetadata = new PDMetadata(doc, newXMPData, false );
+catalog.setMetadata( newMetadata );
+~~~
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/c68c6530/content/1.8/cookbook/workingwithmetadata.mdtext
----------------------------------------------------------------------
diff --git a/content/1.8/cookbook/workingwithmetadata.mdtext b/content/1.8/cookbook/workingwithmetadata.mdtext
deleted file mode 100644
index 83ca51f..0000000
--- a/content/1.8/cookbook/workingwithmetadata.mdtext
+++ /dev/null
@@ -1,66 +0,0 @@
----
-layout: default
-title: Cookbook - Working with Metadata
----
-
-# Working with Metadata
-
-## Introduction
-
-PDF documents can contain information describing the document itself or certain objects 
-within the document, such as the author of the document or its creation date. 
-Basic information can be set and retrieved using the PDDocumentInformation object.
-
-In addition, more metadata can be retrieved from the XML metadata, as described below.
-
-## Getting Basic Metadata
-
-To set or retrieve basic information about the document, use the PDDocumentInformation object, 
-which provides a high-level API to that information:
-
-~~~java
-PDDocumentInformation info = document.getDocumentInformation();
-System.out.println( "Page Count=" + document.getNumberOfPages() );
-System.out.println( "Title=" + info.getTitle() );
-System.out.println( "Author=" + info.getAuthor() );
-System.out.println( "Subject=" + info.getSubject() );
-System.out.println( "Keywords=" + info.getKeywords() );
-System.out.println( "Creator=" + info.getCreator() );
-System.out.println( "Producer=" + info.getProducer() );
-System.out.println( "Creation Date=" + info.getCreationDate() );
-System.out.println( "Modification Date=" + info.getModificationDate());
-System.out.println( "Trapped=" + info.getTrapped() );      
-~~~
-
-## Accessing PDF Metadata
-
-See class:org.apache.pdfbox.pdmodel.common.PDMetadata  
-See example:AddMetadataFromDocInfo  
-See Adobe Documentation:XMP Specification  
-
-PDF documents can have XML metadata associated with certain objects within a PDF document.
-For example, the following PD Model objects have the ability to contain metadata:
-
- - PDDocumentCatalog
- - PDPage
- - PDXObject
- - PDICCBased
- - PDStream
-
-The metadata that is stored in PDF objects conforms to the XMP specification, so it is 
-recommended that you review that specification. Currently there is no high-level API for 
-managing the XML metadata; PDFBox uses standard Java InputStream/OutputStream to retrieve 
-or set the XML metadata.
-
-~~~java
-PDDocument doc = PDDocument.load( ... );
-PDDocumentCatalog catalog = doc.getDocumentCatalog();
-PDMetadata metadata = catalog.getMetadata();
-
-//to read the XML metadata
-InputStream xmlInputStream = metadata.createInputStream();
-
-//or to write new XML metadata
-InputStream newXMPData = ...;
-PDMetadata newMetadata = new PDMetadata(doc, newXMPData, false );
-catalog.setMetadata( newMetadata );
-~~~
\ No newline at end of file
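Since the metadata page notes that PDFBox hands XMP back only as a stream, reading a property means parsing the XML yourself. The sketch below uses the JDK's DOM parser to pull the Dublin Core title out of an XMP packet. `XmpTitle` is a hypothetical name, and it assumes the common XMP layout where `dc:title` holds an `rdf:Alt` of `rdf:li` entries; the namespace URIs are the standard Dublin Core and RDF ones.

```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader for the dc:title property of an XMP packet (JDK only). */
final class XmpTitle {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Returns the text of the first rdf:li inside dc:title, or null if there is none. */
    static String readTitle(InputStream xmp) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // XMP properties are identified by namespace URI
        // Refuse DOCTYPE declarations; XMP packets do not need them.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(xmp);
        NodeList titles = doc.getElementsByTagNameNS(DC_NS, "title");
        if (titles.getLength() == 0) {
            return null;
        }
        NodeList items = ((Element) titles.item(0)).getElementsByTagNameNS(RDF_NS, "li");
        return items.getLength() > 0 ? items.item(0).getTextContent().trim() : null;
    }
}
```

With the catalog example, this would be `XmpTitle.readTitle( metadata.createInputStream() )`, after first checking that `catalog.getMetadata()` did not return null for a document without XMP.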