Posted to users@pdfbox.apache.org by 学而时习之 <29...@qq.com> on 2013/09/24 08:04:57 UTC

What is the difference between the input PDF file and the output PDF file?

package copyfile;

import java.io.OutputStream;
import java.util.List;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;

public class cpfile {
	static String inputfile = "d:/pdf分析/7.pdf";
	// "7.pdf" -> "7cp.pdf"
	static String outputfile = inputfile.replace(".pdf", "cp.pdf");

	public static void main(String[] args) {
		dealone(inputfile);
	}

	// Re-parses every page content stream and writes the tokens back
	// into a new, compressed content stream (PDFBox 1.x API).
	private static void dealone(String f) {
		PDDocument document = null;
		try {
			document = PDDocument.load(f);
			PDDocumentCatalog catalog = document.getDocumentCatalog();
			for (Object pageObj : catalog.getAllPages()) {
				PDPage page = (PDPage) pageObj;
				if (page.getContents() == null) {
					continue; // page without a content stream
				}
				// Tokens = operands and operators of this page's content stream only.
				PDFStreamParser parser = new PDFStreamParser(page.getContents());
				parser.parse();
				List tokens = parser.getTokens();

				// Write the same tokens into a new stream and use it as the page content.
				PDStream newContents = new PDStream(document);
				OutputStream out = newContents.createOutputStream();
				try {
					new ContentStreamWriter(out).writeTokens(tokens);
				} finally {
					out.close();
				}
				newContents.addCompression();
				page.setContents(newContents);
			}
			document.save(outputfile);
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			// document is still null if load() failed
			if (document != null) {
				try {
					document.close();
				} catch (Exception e) {
					e.printStackTrace();
				}
			}
		}
	}
}
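A note on the question in the subject line: the code above tokenizes each page's content stream and writes the same tokens back, so the sequence of drawing operators is kept while the bytes of the stream (spacing, line breaks, then Flate compression from `addCompression()`) can change. The stdlib-only sketch below imitates that round trip on a tiny content stream. The class `ContentRoundTrip`, its toy tokenizer (no nested or escaped strings, no arrays, dictionaries or inline images) and its output layout (space after operands, newline after operators) are assumptions for illustration, not the behaviour of PDFBox's `PDFStreamParser`/`ContentStreamWriter`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy round trip, NOT PDFBox's PDFStreamParser/ContentStreamWriter.
// Handles only names, numbers, operators and simple (...) strings
// without nesting or escapes.
public class ContentRoundTrip {

    // Split a content stream into operand and operator tokens.
    public static List<String> tokenize(String content) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < content.length()) {
            char c = content.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '(') {
                int end = content.indexOf(')', i);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated string at " + i);
                }
                tokens.add(content.substring(i, end + 1));
                i = end + 1;
            } else {
                // name, number or operator: read up to whitespace or '('
                int start = i;
                while (i < content.length()
                        && !Character.isWhitespace(content.charAt(i))
                        && content.charAt(i) != '(') {
                    i++;
                }
                tokens.add(content.substring(start, i));
            }
        }
        return tokens;
    }

    // Anything that is not a name, a string or a number is treated as an operator.
    static boolean isOperator(String token) {
        return !token.startsWith("/") && !token.startsWith("(")
                && !token.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)");
    }

    // Write tokens back: space after operands, newline after operators.
    public static String write(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            sb.append(t).append(isOperator(t) ? '\n' : ' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "BT\n  /F1   12 Tf\n72 712   Td (Hello) Tj\nET";
        String rewritten = write(tokenize(original));
        System.out.print(rewritten);
        System.out.println("same bytes:  " + original.equals(rewritten));
        System.out.println("same tokens: " + tokenize(original).equals(tokenize(rewritten)));
    }
}
```

Running `main` prints the rewritten stream followed by `same bytes:  false` and `same tokens: true`: the page draws the same thing, but the file content is not identical.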

Re: Re: What is different from the input pdf file and output pdf file?

Posted by daijun <16...@qq.com>.
Dear all,
How can I copy the font information?
d.j.
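Some background for this question: the tokens from `parser.getTokens()` select a font only by its resource name, e.g. `/F1 12 Tf` (font name, size, then the `Tf` operator). The font dictionary itself is stored in the `/Font` entry of the page's `/Resources` (in PDFBox reachable via `PDPage.getResources()`), not in the content stream. The sketch below (hypothetical class `FontNamesInContent`, plain Java, naive whitespace tokenizing that assumes no spaces inside strings) lists which font names a content stream refers to, i.e. which resources would have to travel along with the tokens.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch: collect the font resource names a content stream refers to.
// Naive whitespace split; assumes no spaces inside (...) strings.
public class FontNamesInContent {

    public static Set<String> fontNames(String content) {
        String[] tokens = content.trim().split("\\s+");
        Set<String> names = new LinkedHashSet<>(); // keeps first-use order
        // "Tf" takes two operands (name, size), so the name is two tokens earlier.
        for (int i = 2; i < tokens.length; i++) {
            if (tokens[i].equals("Tf") && tokens[i - 2].startsWith("/")) {
                names.add(tokens[i - 2]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String content = "BT /F1 12 Tf 72 712 Td (Hi) Tj /F2 9 Tf (there) Tj /F1 10 Tf ET";
        System.out.println(fontNames(content));
    }
}
```

For the sample stream in `main` it prints `[/F1, /F2]`; only these names appear in the tokens, while the fonts they stand for must be looked up in the page resources.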


------------------ Original message ------------------
From: "Maruan Sahyoun";<sa...@fileaffairs.de>;
Sent: Wednesday, 25 September 2013, 2:35 PM
To: "users"<us...@pdfbox.apache.org>;

Subject: Re: Re: What is different from the input pdf file and output pdf file?



[Quoted reply from Maruan Sahyoun omitted; the full message appears below.]

Re: Re: What is different from the input pdf file and output pdf file?

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi,

No, the tokens only cover the content stream of the page. Information such as the header, the document metadata, the cross-reference table … is missing.

BR

Maruan
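To illustrate this reply: the tokens correspond only to the bytes between `stream` and `endstream` of a page's content stream. The sketch below (hypothetical helper `MinimalPdf`, stdlib Java only, ASCII-only content assumed so that character counts equal byte offsets) assembles a one-page PDF by hand so the other parts become visible: the `%PDF-1.4` header, the catalog, page tree and page objects, the font resource, an `/Info` dictionary for document metadata, the cross-reference table and the trailer.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a hand-built one-page PDF; the content stream is just object 4.
public class MinimalPdf {

    // Builds a one-page PDF around the given content stream (ASCII only).
    public static String build(String content) {
        List<String> objects = new ArrayList<>();
        objects.add("<< /Type /Catalog /Pages 2 0 R >>");                       // 1: catalog
        objects.add("<< /Type /Pages /Kids [3 0 R] /Count 1 >>");               // 2: page tree
        objects.add("<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792]"     // 3: page
                + " /Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>");
        objects.add("<< /Length " + content.length() + " >>\nstream\n"         // 4: content stream
                + content + "\nendstream");
        objects.add("<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>");  // 5: font
        objects.add("<< /Producer (hand-written sketch) >>");                   // 6: metadata

        StringBuilder pdf = new StringBuilder("%PDF-1.4\n");                    // header
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < objects.size(); i++) {
            offsets.add(pdf.length()); // byte offset of "N 0 obj"
            pdf.append(i + 1).append(" 0 obj\n").append(objects.get(i)).append("\nendobj\n");
        }
        int xrefOffset = pdf.length();                                          // cross-reference table
        pdf.append("xref\n0 ").append(objects.size() + 1).append('\n');
        pdf.append("0000000000 65535 f \n");
        for (int off : offsets) {
            pdf.append(String.format("%010d 00000 n \n", off)); // 20-byte entries
        }
        pdf.append("trailer\n<< /Size ").append(objects.size() + 1)             // trailer
           .append(" /Root 1 0 R /Info 6 0 R >>\nstartxref\n")
           .append(xrefOffset).append("\n%%EOF\n");
        return pdf.toString();
    }

    public static void main(String[] args) {
        String content = "BT /F1 24 Tf 72 712 Td (Hello) Tj ET";
        String pdf = build(content);
        System.out.print(pdf);
        System.out.printf("content stream: %d of %d bytes%n", content.length(), pdf.length());
    }
}
```

A file written by PDFBox will differ in detail (object numbers, compression, formatting), which is one reason an input and output file can differ byte for byte even when the page tokens are the same.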

On 25.09.2013 at 04:44, "学而时习之" <29...@qq.com> wrote:

> I am trying to confirm whether the tokens taken together (the tokens obtained with the code “List tokens = parser.getTokens();”) contain all the information in the PDF file.
> [Earlier quoted messages omitted; they appear in full below.]


Re: What is different from the input pdf file and output pdf file?

Posted by 学而时习之 <29...@qq.com>.
I am trying to confirm whether the tokens taken together (the tokens obtained with the code “List tokens = parser.getTokens();”) contain all the information in the PDF file.




------------------ Original message ------------------
From: "Andreas Lehmkuehler";<an...@lehmi.de>;
Sent: Tuesday, 24 September 2013, 11:59 PM
To: "users"<us...@pdfbox.apache.org>;

Subject: Re: What is different from the input pdf file and output pdf file?



Hi,

On 24.09.2013 08:04, 学而时习之 wrote:
> [Quoted code omitted; identical to the code in the original post.]
What are you trying to do?

BR
Andreas Lehmkühler


Re: What is different from the input pdf file and output pdf file?

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 24.09.2013 08:04, 学而时习之 wrote:
> [Quoted code omitted; identical to the code in the original post.]
What are you trying to do?

BR
Andreas Lehmkühler