Posted to users@pdfbox.apache.org by Sumit Mohan Jha <su...@yahoo.com.INVALID> on 2016/01/29 19:41:14 UTC

Fwd: Further questions related to PDFBOX-3216 Issue - Part - 1




Resending, as the original email failed due to the size restriction.
____________________________________________________________________

Hi,

This email has further questions about the solution Tilman provided for PDFBOX-3216: https://issues.apache.org/jira/browse/PDFBOX-3216

The solution is really good. It works for some of the PDFs I am trying to process, but because of my requirements I am facing some minor issues with other PDFs.

My goal is to read PDFs (multiple files with different original formatting and page sizes) and, on each page, leave a fixed, absolute blank space in the top right corner so that another application can later put a bar code there. That blank space needs to be uniform across all the source PDFs processed, so that the other application can put a bar code of the same size at the same location in every PDF. I understand that if one PDF already has more blank space at the top than another, its converted version will also have more. That is fine. But how can I ensure that after conversion every PDF leaves at least some fixed absolute height blank? If one leaves more than that height, it is OK.

To keep the blank space uniform in all PDFs processed, I came up with some sample code, based on the sample code Tilman provided. Please see the attached .java file.

Please also see the attached example source PDF files, which I need to process with PDFBox 1.8.10, leaving blank space at the top of each. The challenge now is to keep a uniform, fixed absolute height blank at the top of these 2 PDFs, which have different source formatting.

1. My code works fine with the attached First.pdf.

2. When I process Second.pdf, the converted PDF is good only if I scale the height to 95%. At 90%, the content is truncated at the bottom. Below 90% (e.g. 80%), the converted PDF has a blank page. If I scale the width to anything less than 100%, the converted PDF is messed up. With the current code, the converted PDF is messed up because of the width scaling. The height scaling does leave space at the top, but it does not seem to ensure that the blank space in both converted PDFs is at least some fixed absolute height. In the second one it is much smaller.

What I need is a solution where, after scaling the height, the converted versions of both source PDFs leave at least some fixed absolute height blank. As I explained earlier, it is OK if one leaves more than that height because the source file already had more blank space at the top. If a little width scaling can be done too, that would be great, because source PDFs whose text starts at the very left margin would also look good after conversion. But the top priority is vertical scaling to leave blank space at the top.

3. Also, when I run my current code I get the warning below. Please let me know how it can be avoided:

Jan 28, 2016 5:49:29 PM org.apache.pdfbox.pdmodel.edit.PDPageContentStream <init>
WARNING: You are overwriting an existing content, you should use the append mode

Any help will be highly appreciated, and I look forward to your response.

Thanks,
Sumit Jha

Re: Fwd: Further questions related to PDFBOX-3216 Issue - Part - 1

Posted by Tilman Hausherr <TH...@t-online.de>.
Here's the modified code. I have also uploaded it to
http://pastebin.com/uDHdQD9a

I don't think it is a good idea to have different X and Y scales. This
will look weird if you have photographs of objects whose shapes are
known, e.g. people.


             pd = PDDocument.load(input);

             pdAllPages = pd.getDocumentCatalog().getAllPages();

             // For this proof of concept, only the first page is converted.
             // Once that works correctly, loop through all pages of the PDF.
             PDPage page = (PDPage) pdAllPages.get(0);

             InputStream is = page.getContents().createInputStream();
             ByteArrayOutputStream baos = new ByteArrayOutputStream();
             IOUtils.copy(is, baos);
             IOUtils.closeQuietly(is);

             /*
              * PDDocument pdNew = new PDDocument();
              * PDPage pageNew = new PDPage();
              * pdNew.addPage(pageNew);
              */
             PDRectangle cropBox = page.findCropBox();

             // The goal is to keep a blank space of 50 points (PDF user-space units) at the top,
             // so that every PDF processed by this code leaves the same blank space there.
             // The width is scaled too; set scaleX to 1 if that is not wanted.
             float scaleX = (cropBox.getUpperRightX() - 20) / cropBox.getUpperRightX();
             float scaleY = (cropBox.getUpperRightY() - 50) / cropBox.getUpperRightY();

             System.out.println("cropBox: " + cropBox);
             PDRectangle newCropBox = new PDRectangle(cropBox.getCOSArray()); // clone
             newCropBox.setLowerLeftX(cropBox.getLowerLeftX() * scaleX);
             newCropBox.setLowerLeftY(cropBox.getLowerLeftY() * scaleY);
             // don't do this, as it takes away the space
             //newCropBox.setUpperRightX(cropBox.getUpperRightX() * scaleX);
             //newCropBox.setUpperRightY(cropBox.getUpperRightY() * scaleY);
             System.out.println("newCropBox: " + newCropBox);
             page.setCropBox(newCropBox);

             // appendContent = false, compress = true
             PDPageContentStream pageContentStream = new PDPageContentStream(pd,
                     page, false, true);
             pageContentStream.saveGraphicsState();
             System.out.println("scaleX: " + scaleX);
             System.out.println("scaleY: " + scaleY);

             // rectangle path
             pageContentStream.appendRawCommands(String.format(Locale.US, "%f %f %f %f re\n",
                     newCropBox.getLowerLeftX(),
                     newCropBox.getLowerLeftY(),
                     cropBox.getUpperRightX() * scaleX - newCropBox.getLowerLeftX(),
                     cropBox.getUpperRightY() * scaleY - newCropBox.getLowerLeftY()));

             // this is just a color fill test
             //pageContentStream.appendRawCommands("0 1 1 rg f\n");

             // modify clipping path; path no-op.
             pageContentStream.appendRawCommands("W n\n");

             pageContentStream.appendRawCommands(scaleX + " 0 0 " + scaleY + " 0 0 cm\n");
             pageContentStream.appendRawCommands("% start existing stuff\n");
             pageContentStream.appendRawCommands(baos.toByteArray());
             pageContentStream.appendRawCommands("\n");
             pageContentStream.restoreGraphicsState();
             //Commented out below code as currently
             //trying to work on leaving fix height blank
             //space on top for bar code
             /*
              * BufferedImage bim = ImageIO.read(new File("C:\\workspaceRAD85_PDFBox_POC\\images.png"));
              * PDXObjectImage img = new PDPixelMap(pd, bim);
              * float x = cropBox.getUpperRightX() - bim.getWidth()-50;
              * float y = cropBox.getUpperRightY() - bim.getHeight();
              * pageContentStream.drawImage(img, x, y);
              */


             pageContentStream.close();

             IOUtils.closeQuietly(baos);
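
The remark above about avoiding different X and Y scales could be
sketched roughly as follows. This is only an illustration, not part of
the pastebin code: the class name UniformScale, the method uniformScale
and the 612 x 792 page size are assumptions made for the example. It
picks the smaller of the two factors, so that at least the requested
gap is left on both edges while the content keeps its aspect ratio.

```java
final class UniformScale {

    /**
     * Returns one scale factor for both axes that leaves at least
     * rightGap units free on the right and topGap units free at the top
     * of a box of the given width and height (all in PDF user-space units).
     */
    static float uniformScale(float width, float height, float rightGap, float topGap) {
        if (width <= 0 || height <= 0) {
            throw new IllegalArgumentException("width and height must be positive");
        }
        float scaleX = (width - rightGap) / width;
        float scaleY = (height - topGap) / height;
        // The smaller factor satisfies both gap requirements at once.
        return Math.min(scaleX, scaleY);
    }

    public static void main(String[] args) {
        // Hypothetical page size; a real program would take it from the crop box.
        float width = 612f;
        float height = 792f;
        float scale = uniformScale(width, height, 20f, 50f);
        System.out.println("scale: " + scale);
        System.out.println("blank at top: " + (height - height * scale));
        System.out.println("blank at right: " + (width - width * scale));
    }
}
```

With a single factor, the gap on one of the two edges is usually larger
than requested, which matches the "at least" requirement in the
question.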


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: Fwd: Further questions related to PDFBOX-3216 Issue - Part - 1

Posted by Tilman Hausherr <TH...@t-online.de>.
On 29.01.2016 at 19:41, Sumit Mohan Jha wrote:
> Resending, as the original email failed due to the size restriction.
>
> ____________________________________________________________________
>
> Hi,
>
> This email has further questions about the solution Tilman provided
> for PDFBOX-3216:
>
> https://issues.apache.org/jira/browse/PDFBOX-3216
>
> The solution is really good. It works for some of the PDFs I am trying
> to process, but because of my requirements I am facing some minor
> issues with other PDFs.
>
> My goal is to read PDFs (multiple files with different original
> formatting and page sizes) and, on each page, leave a fixed, absolute
> blank space in the top right corner so that another application can
> later put a bar code there. That blank space needs to be uniform
> across all the source PDFs processed, so that the other application
> can put a bar code of the same size at the same location in every PDF.
> I understand that if one PDF already has more blank space at the top
> than another, its converted version will also have more. That is fine.
> But how can I ensure that after conversion every PDF leaves at least
> some fixed absolute height blank? If one leaves more than that height,
> it is OK.
>
> To keep the blank space uniform in all PDFs processed, I came up with
> some sample code, based on the sample code Tilman provided. Please see
> the attached .java file.
>
> Please also see the attached example source PDF files, which I need to
> process with PDFBox 1.8.10, leaving blank space at the top of each.
> The challenge now is to keep a uniform, fixed absolute height blank at
> the top of these 2 PDFs, which have different source formatting.
>
> 1. My code works fine with the attached First.pdf.
>
> 2. When I process Second.pdf, the converted PDF is good only if I
> scale the height to 95%. At 90%, the content is truncated at the
> bottom. Below 90% (e.g. 80%), the converted PDF has a blank page. If I
> scale the width to anything less than 100%, the converted PDF is
> messed up. With the current code, the converted PDF is messed up
> because of the width scaling. The height scaling does leave space at
> the top, but it does not seem to ensure that the blank space in both
> converted PDFs is at least some fixed absolute height. In the second
> one it is much smaller.
>
> What I need is a solution where, after scaling the height, the
> converted versions of both source PDFs leave at least some fixed
> absolute height blank. As I explained earlier, it is OK if one leaves
> more than that height because the source file already had more blank
> space at the top. If a little width scaling can be done too, that
> would be great, because source PDFs whose text starts at the very left
> margin would also look good after conversion. But the top priority is
> vertical scaling to leave blank space at the top.
>

I understand. I expected problems with rotated files, but yours
(http://www.megafileupload.com/ac8r/Second.pdf ) is even trickier:
The PDF page is really a huge page, of which only a rectangular "window"
is shown, with the help of the cropbox. Because of that, the real (0,0)
position is outside of your view, so what you see at the bottom left is
no longer there after scaling; it ends up slightly outside of the "window".

With ordinary files, the (0,0) position is exactly the bottom left, so
everything is scaled relative to that point.

So I need to do some thinking first. Maybe the cropbox must be adjusted 
as well.
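
One possible direction, sketched here only to illustrate the geometry
described above and not as the fix that was eventually chosen: add a
translation to the "cm" matrix so that the scale is anchored at the
cropbox's lower-left corner instead of at (0,0). The class name, the
method names and the corner values (100, 200) are hypothetical. The
example uses no PDFBox calls; it only computes the six matrix operands
and checks where the corner ends up.

```java
import java.util.Locale;

final class ScaleAboutCorner {

    /**
     * Operands {a, b, c, d, e, f} of a PDF "cm" matrix that scales by
     * (sx, sy) while keeping the point (px, py) where it is.
     */
    static float[] matrix(float sx, float sy, float px, float py) {
        return new float[] { sx, 0f, 0f, sy, px * (1f - sx), py * (1f - sy) };
    }

    /** Transforms (x, y): x' = a*x + c*y + e, y' = b*x + d*y + f. */
    static float[] apply(float[] m, float x, float y) {
        return new float[] { m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5] };
    }

    /** Formats the matrix as a content stream operator line. */
    static String toCmOperator(float[] m) {
        return String.format(Locale.US, "%f %f %f %f %f %f cm\n",
                m[0], m[1], m[2], m[3], m[4], m[5]);
    }

    public static void main(String[] args) {
        // Hypothetical cropbox lower-left corner that is not at (0,0).
        float llx = 100f;
        float lly = 200f;
        float scale = 0.9f;

        // Scaling about (0,0) pulls the visible corner toward the origin.
        float[] moved = apply(matrix(scale, scale, 0f, 0f), llx, lly);
        // Scaling about the corner itself leaves it in place.
        float[] kept = apply(matrix(scale, scale, llx, lly), llx, lly);

        System.out.println("about origin: " + moved[0] + ", " + moved[1]);
        System.out.println("about corner: " + kept[0] + ", " + kept[1]);
        System.out.print(toCmOperator(matrix(scale, scale, llx, lly)));
    }
}
```

With the anchored matrix the content shrinks toward the visible bottom
left, so the freed space appears at the top and right of the "window".
Whether the cropbox then still needs adjusting is a separate question.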

Coincidentally, the product in the PDF might help (but I have never used 
it, nor do I have access to any of it, LOL).

> 3. Also, when I run my current code I get the warning below. Please
> let me know how it can be avoided:
>
>
> Jan 28, 2016 5:49:29 PM 
> org.apache.pdfbox.pdmodel.edit.PDPageContentStream <init>
>
> WARNING: You are overwriting an existing content, you should use the 
> append mode
>

Don't worry about this one. You are indeed overwriting. You can switch
off the message with a log4j setting.

log4j.logger.org.apache.pdfbox.pdmodel.edit.PDPageContentStream=ERROR
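
The warning quoted above has the layout of java.util.logging's default
formatter (timestamp, class and method, then the level on the next
line), which may mean that log4j was not on the classpath and the
logging fell back to the JDK logger. If that is the case, the log4j
line will have no effect, and something along these lines could be
tried instead. This is a sketch under that assumption; the class and
method names are made up for the example, and only the logger name is
taken from the warning itself.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class QuietContentStreamWarning {

    // Keep a strong reference: java.util.logging holds loggers weakly, so a
    // level set on an otherwise unreferenced logger can be lost again.
    static final Logger CONTENT_STREAM_LOGGER =
            Logger.getLogger("org.apache.pdfbox.pdmodel.edit.PDPageContentStream");

    /** Raises the threshold so that WARNING messages are dropped. */
    static void silenceOverwriteWarning() {
        CONTENT_STREAM_LOGGER.setLevel(Level.SEVERE);
    }

    public static void main(String[] args) {
        silenceOverwriteWarning();
        System.out.println("WARNING still logged: "
                + CONTENT_STREAM_LOGGER.isLoggable(Level.WARNING));
    }
}
```

Call silenceOverwriteWarning() once before creating the
PDPageContentStream.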


Tilman

> Any help will be highly appreciated, and I look forward to your
> response.
>
> Thanks,
>
> Sumit Jha