Posted to users@pdfbox.apache.org by Vimal Kumar <vi...@gmail.com> on 2018/01/12 08:17:30 UTC

How to Fetch Color of text(Word) using PDFbox

Hi,

I am trying to find the color code of a word in Java using PDFBox 2.0.0.
My code is below:

import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingColor;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingColorN;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingColorSpace;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingDeviceCMYKColor;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingDeviceGrayColor;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingDeviceRGBColor;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingColor;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingColorN;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingColorSpace;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingDeviceCMYKColor;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingDeviceGrayColor;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingDeviceRGBColor;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.graphics.color.PDColor;
import org.apache.pdfbox.pdmodel.graphics.state.RenderingMode;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;


public class PDF_Box_1 extends PDFTextStripper
{
    /**
     * Instantiate a new PDFTextStripper object.
     *
     * @throws IOException If there is an error loading the properties.
     */
    public PDF_Box_1() throws IOException
    {
        addOperator(new SetStrokingColorSpace());
        addOperator(new SetNonStrokingColorSpace());
        addOperator(new SetStrokingDeviceCMYKColor());
        addOperator(new SetNonStrokingDeviceCMYKColor());
        addOperator(new SetNonStrokingDeviceRGBColor());
        addOperator(new SetStrokingDeviceRGBColor());
        addOperator(new SetNonStrokingDeviceGrayColor());
        addOperator(new SetStrokingDeviceGrayColor());
        addOperator(new SetStrokingColor());
        addOperator(new SetStrokingColorN());
        addOperator(new SetNonStrokingColor());
        addOperator(new SetNonStrokingColorN());
    }

    /**
     * This will print the documents data.
     *
     * @param args The command line arguments.
     *
     * @throws IOException If there is an error parsing the document.
     */
    public static void main(String[] args) throws IOException
    {

            try (PDDocument document = PDDocument.load(new File("D://Vimal//New folder//b1.pdf")))
            {
                PDFTextStripper stripper = new PDF_Box_1();
                stripper.setSortByPosition(true);
                // Page numbers in PDFTextStripper are 1-based.
                stripper.setStartPage(1);
                stripper.setEndPage(document.getNumberOfPages());

                stripper.getText(document);
            }

    }

    @Override
    protected void processTextPosition(TextPosition text)
    {
        super.processTextPosition(text);

        PDColor strokingColor = getGraphicsState().getStrokingColor();
        PDColor nonStrokingColor = getGraphicsState().getNonStrokingColor();
        String unicode = text.getUnicode();
        RenderingMode renderingMode = getGraphicsState().getTextState().getRenderingMode();
        System.out.println("Unicode:            " + unicode);
        System.out.println("Rendering mode:     " + renderingMode);
        System.out.println("Stroking color:     " + strokingColor);
        System.out.println("Non-Stroking color: " + nonStrokingColor);
        System.out.println();

        // See the PrintTextLocations for more attributes
    }


}


But it returns the color of a single character at a time. Can anyone
please help me fetch the color of a whole word from a PDF?

Thanks
Vimal


Re: How to Fetch Color of text(Word) using PDFbox

Posted by Gilad Denneboom <gi...@gmail.com>.
Each character exists on its own in a PDF file. It's entirely possible that
one character in a word will have one color and the next will have a
different one.
If you want you can assume that the entire word has the same color and then
use the color of the first character for the entire word, or you can write
a function that compares all the colors of all the characters in a word and
if they are all the same it returns it as the word's color.
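
The second option (compare the colors of all characters in a word) could be
sketched roughly as below. This is only an illustration: the class name
WordColor, the method uniformColor, and the idea of representing each
character's color as a non-null string key are assumptions made for the
example and are not part of PDFBox. In the code from the question, such a key
could be built inside processTextPosition, for instance from
Arrays.toString(nonStrokingColor.getComponents()).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class WordColor
{
    /**
     * Returns the color shared by every character of a word, or an empty
     * Optional if the characters differ or the word has no characters.
     * Each list entry is a hypothetical, non-null color key for one character.
     */
    public static Optional<String> uniformColor(List<String> charColors)
    {
        if (charColors.isEmpty())
        {
            return Optional.empty();
        }
        String first = charColors.get(0);
        for (String color : charColors)
        {
            if (!first.equals(color))
            {
                // At least one character has a different color.
                return Optional.empty();
            }
        }
        return Optional.of(first);
    }

    public static void main(String[] args)
    {
        // A word whose characters all carry the same color key.
        System.out.println(uniformColor(Arrays.asList("[1.0, 0.0, 0.0]", "[1.0, 0.0, 0.0]")));
        // A word whose second character has a different color key.
        System.out.println(uniformColor(Arrays.asList("[1.0, 0.0, 0.0]", "[0.0, 0.0, 0.0]")));
    }
}
```

For the first option (use the first character's color), charColors.get(0) on
a non-empty list is enough and no comparison is needed.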

Re: How to Fetch Color of text(Word) using PDFbox

Posted by Tilman Hausherr <TH...@t-online.de>.
On 12.01.2018 at 09:17, Vimal Kumar wrote:
> I am trying to find the color code of a word in Java using PDFBox 2.0.0.
> My code is below:

There is no easy answer, because a PDF has no such thing as a "word".
You'd need to look into the PDFTextStripper code, understand its
algorithm, and create something quite different. What you see as "words"
are either groups of glyphs separated by an empty area or (it doesn't
happen often, but it happens) glyphs with a space among them. Search the
source code for the word "word".
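
To make the two cases above concrete, here is a much simplified sketch of
grouping glyphs into words. It is not the PDFTextStripper algorithm. The
class WordGrouper, the Glyph holder, and the gapTolerance parameter are all
hypothetical names invented for this illustration, and the sketch assumes
the glyphs belong to one line of text and are already sorted left to right.
In real code the text, position, and width would come from each
TextPosition, and the color key from the graphics state at that moment.

```java
import java.util.ArrayList;
import java.util.List;

public class WordGrouper
{
    /** Hypothetical stand-in for the per-glyph data collected during extraction. */
    public static class Glyph
    {
        public final String text;
        public final float x;
        public final float width;
        public final String colorKey;

        public Glyph(String text, float x, float width, String colorKey)
        {
            this.text = text;
            this.x = x;
            this.width = width;
            this.colorKey = colorKey;
        }
    }

    /**
     * Splits a left-to-right run of glyphs into words. A word ends when a
     * whitespace glyph is seen, or when the empty area between two glyphs
     * is wider than gapTolerance.
     */
    public static List<List<Glyph>> groupIntoWords(List<Glyph> glyphs, float gapTolerance)
    {
        List<List<Glyph>> words = new ArrayList<>();
        List<Glyph> current = new ArrayList<>();
        Glyph previous = null;
        for (Glyph glyph : glyphs)
        {
            boolean isSpace = glyph.text.trim().isEmpty();
            boolean wideGap = previous != null
                    && glyph.x - (previous.x + previous.width) > gapTolerance;
            if ((isSpace || wideGap) && !current.isEmpty())
            {
                // Close the word collected so far.
                words.add(current);
                current = new ArrayList<>();
            }
            if (!isSpace)
            {
                current.add(glyph);
            }
            previous = glyph;
        }
        if (!current.isEmpty())
        {
            words.add(current);
        }
        return words;
    }
}
```

Each resulting list holds the glyphs of one word, so their color keys can
then be compared or the first one taken. Choosing gapTolerance is the hard
part in practice; a fixed value is only a placeholder here.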

Tilman


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org