Posted to java-user@lucene.apache.org by Karl Øie <ka...@gan.no> on 2003/01/02 11:45:01 UTC

Re: PDF Text extraction

To get the string value of an InputStream, you can read it into a
ByteArrayOutputStream and get the content from that:

ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buf = new byte[4096];
for (int n; (n = inputstream.read(buf)) != -1; ) baos.write(buf, 0, n);
System.out.println(new String(baos.toByteArray()));
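
A fuller, self-contained sketch of the same idea, in case it helps: drain the
InputStream into a ByteArrayOutputStream, then decode the bytes with an
explicit charset. The class name StreamToString and the helper readAll are
made up for illustration, and UTF-8 is only an assumption, since the thread
does not say which encoding the extracted PDF text uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamToString {

    // Hypothetical helper: reads the stream to its end and decodes
    // the collected bytes as UTF-8 (an assumed encoding).
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        // read() returns -1 once the stream is exhausted
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the stream holding the extracted PDF text
        InputStream in = new ByteArrayInputStream(
                "extracted text".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in));
    }
}
```

Note that a stream can only be consumed once, so after printing it this way
it cannot be handed on to anything else without being reopened.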

Regards, Karl Øie

On Friday, Dec 27, 2002, at 07:34 Europe/Oslo, Suhas Indra wrote:

> Hello List
>
> I am using PDFBox to index some of the PDF documents. The parser works
> fine and I can read the summary. But the contents are displayed as
> java.io.InputStream.
>
> When I try the following:
> System.out.println(doc.getField("contents")) (where doc is the Document
> object)
>
> The result will be:
>
> Text<co...@127dc0>
>
> I want to print the extracted data.
>
> Can anyone please let me know how to extract the contents?
>
> Regards
>
> Suhas
>
>
>
> --------------------------------------------------------------
> Robosoft Technologies - Partners in Product Development


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>