Posted to users@pdfbox.apache.org by Pramod Pradhan <pr...@gmail.com> on 2009/10/27 01:18:50 UTC
java.io.IOException: expected='startxref'
Hi All,
I am trying to write some simple code that just parses the text data from a PDF
file and prints it to the console. I am hitting the exception below:
java.io.IOException: expected='startxref' actual='' org.pdfbox.io.PushBackInputStream@100ab23
    at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:355)
    at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:176)
    at PDFTextParser.pdftoText(PDFTextParser.java:49)
    at PDFTextParser.main(PDFTextParser.java:93)
PDF to Text Conversion failed.
Can someone please help? I have attached the Java class file.
--
thanks,
Pramod Pradhan
(361)228-3989
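A note on the exception above: the message says the parser expected the keyword 'startxref' and found nothing. A well-formed PDF file ends with a "startxref" line, a byte offset, and a final "%%EOF" marker, so one possible cause (an assumption, since the attached file is not shown here) is that the input is truncated or is not a PDF at all. The sketch below uses only the Java standard library to check the tail of a file for that trailer. PdfTrailerCheck, hasTrailer and TAIL_BYTES are hypothetical names chosen for illustration and are not part of PDFBox.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not a PDFBox class.
final class PdfTrailerCheck {
    // How many bytes at the end of the file to inspect (an arbitrary choice for this sketch).
    private static final int TAIL_BYTES = 1024;

    // Returns true if the tail of the data contains "startxref" followed by "%%EOF".
    static boolean hasTrailer(byte[] fileBytes) {
        int from = Math.max(0, fileBytes.length - TAIL_BYTES);
        // ISO-8859-1 maps every byte to one char, so binary content cannot break decoding.
        String tail = new String(fileBytes, from, fileBytes.length - from,
                StandardCharsets.ISO_8859_1);
        int xref = tail.lastIndexOf("startxref");
        return xref >= 0 && tail.indexOf("%%EOF", xref) > xref;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java PdfTrailerCheck <file.pdf>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(hasTrailer(data)
                ? "trailer found: startxref ... %%EOF"
                : "no startxref/%%EOF trailer in the last " + TAIL_BYTES + " bytes");
    }
}
```

If the check reports no trailer, the file was probably cut short or saved incorrectly, and re-downloading or re-exporting it is worth trying before debugging the parsing code. If the trailer is present, the cause lies elsewhere and this check does not help.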
Re: Paradox with Eclipse and PDFStripper.processPages
Posted by Shen Wang <fe...@gmail.com>.
And by the way, I am still not quite clear how to make Eclipse compile
the code.
Felix
Re: Paradox with Eclipse and PDFStripper.processPages
Posted by Shen Wang <fe...@gmail.com>.
Hi Andreas,
Thanks for your reply. But what I am looking for is to further process
the format information of the document, not simply to extract the
text. So basically what I am trying to do is let the stripper object know
which document it is processing without calling the writeText method.
Do you have any idea how to do this?
Best,
Felix
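One common way to get both things at once, sketched here in plain Java: keep calling the public entry point (so the object learns which document it is working on) and override a protected hook in a subclass to collect whatever extra information is needed. The sketch also shows why a direct call can be rejected as "not visible": a protected method may be called from subclasses, but not from an unrelated class in another package. All types below (SketchStripper, CollectingStripper, SketchDemo) are hypothetical stand-ins with "pages" modelled as strings. They are not PDFBox classes, and whether the real stripper is structured this way is an assumption to check against the javadoc of your version.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a text stripper; not a PDFBox class.
class SketchStripper {
    private List<String> currentDocument; // set only by writeText

    // Public entry point: remembers the document, then runs the protected step.
    public void writeText(List<String> document, Writer out) throws IOException {
        this.currentDocument = document;
        processPages(document, out);
    }

    // Protected step: intended for subclasses, not for direct calls from outside.
    protected void processPages(List<String> pages, Writer out) throws IOException {
        for (String page : pages) {
            processPage(page);
            out.write(page);
            out.write('\n');
        }
    }

    // Hook that subclasses may override to inspect each page.
    protected void processPage(String page) {
    }

    protected List<String> currentDocument() {
        return currentDocument;
    }
}

// Subclass that gathers extra information while the normal run happens.
class CollectingStripper extends SketchStripper {
    final List<Integer> pageLengths = new ArrayList<>();

    @Override
    protected void processPage(String page) {
        pageLengths.add(page.length());
    }
}

class SketchDemo {
    public static void main(String[] args) throws IOException {
        CollectingStripper stripper = new CollectingStripper();
        // The written text is not needed here, so it goes to a throwaway writer.
        stripper.writeText(Arrays.asList("first page", "second"), new StringWriter());
        System.out.println(stripper.pageLengths); // prints [10, 6]
    }
}
```

With the real library, the analogous approach would be to subclass the stripper, override one of its per-text hooks, and pass writeText a writer whose output you ignore. Which hook to override depends on the PDFBox version in use.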
Re: Paradox with Eclipse and PDFStripper.processPages
Posted by Andreas Lehmkühler <an...@lehmi.de>.
Hi,
Shen Wang wrote:
> Hi guys,
>
> I got a weird thing that I don't know how to make it work. Here is the
> code:
>
> import java.io.File;
> import java.io.IOException;
> import java.util.List;
>
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.util.PDFText2HTML;
> import org.apache.pdfbox.util.PDFTextStripper;
>
>
> public class PDF_Title {
> public PDF_Title() {
> }
> public static void main( String[] args ) throws IOException {
> if ( args.length != 1 ) {
> System.out.println( "bad input" );
> }
> String pdfFileName = args[ 0 ];
> PDDocument document = PDDocument.load( pdfFileName );
> PDFTextStripper stripper = null;
> stripper = new PDFText2HTML("UTF-8");
> List pages = document.getDocumentCatalog().getAllPages();
> stripper.processPages(pages);
> }
> }
>
> The problem is in the last line, if I leave the parameter of
> processPages and blank, Eclipse will remind me that a pages list
> parameter is needed and asks me to fill in. However, when I fill the
> blank with the parameter, which is "pages" here, Eclipse will tell me
> that the method of processPages from the type PDFTextStripper is not
> visible and still refuses to compile. However, according to the javadoc,
> processPages is simply a method of PDFTextStripper and asks for a page
> list parameter. Could you guys help me point out where I made the
> mistake? Thanks.
Try to use stripper.writeText(document, outputStream) instead of
stripper.processPages(..)
BR
Andreas Lehmkühler
Paradox with Eclipse and PDFStripper.processPages
Posted by Shen Wang <fe...@gmail.com>.
Hi guys,
I have run into something odd that I cannot get to work. Here is the code:
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFText2HTML;
import org.apache.pdfbox.util.PDFTextStripper;

public class PDF_Title {
    public PDF_Title() {
    }

    public static void main( String[] args ) throws IOException {
        if ( args.length != 1 ) {
            System.out.println( "bad input" );
            return; // stop here; args[ 0 ] below would throw if no argument was given
        }
        String pdfFileName = args[ 0 ];
        PDDocument document = PDDocument.load( pdfFileName );
        PDFTextStripper stripper = null;
        stripper = new PDFText2HTML("UTF-8");
        List pages = document.getDocumentCatalog().getAllPages();
        stripper.processPages(pages); // the line Eclipse rejects as "not visible"
    }
}
The problem is in the last line. If I leave the argument of
processPages blank, Eclipse reminds me that a page list
parameter is needed and asks me to fill it in. However, when I pass
"pages" as the argument, Eclipse tells me
that the method processPages from the type PDFTextStripper is not
visible, and it still refuses to compile. According to the javadoc, though,
processPages is simply a method of PDFTextStripper that takes a page
list parameter. Could you help me find my
mistake? Thanks.
Best,
Felix