Posted to users@pdfbox.apache.org by zhangkun <zh...@huawei.com> on 2011/05/06 09:01:34 UTC

How to Extract a Table from the PDF file

Dear Sir

I am trying to extract a table from a PDF file. I can get the text from
the PDF file. However, when some cells in the table are blank, the
extracted text no longer shows which column each value belongs to.

For example, there are two tables in the PDF file. 

Table A:

      Col1   Col2   Col3   Col4

Row1  100    200    300    400

Row2  17            89     985

Row3         98     134

Table B:

      Col1   Col2   Col3   Col4

Row1  100    200    300    400

Row2         17     89     985

Row3                98     134

In the text extracted from the PDF file, both Table A and Table B come out as:

Col1 Col2 Col3 Col4

Row1 100 200 300 400

Row2 17 89 985

Row3 98 134

I cannot distinguish between Table A and Table B from the extracted text.
Could you please help me?
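
To illustrate what information is missing from the plain text: if the horizontal position of each word were available (PDFBox's TextPosition class reports an x coordinate for each piece of text), each value could be assigned to the column header nearest to it. The following is only a rough sketch of that idea, not PDFBox code. The class name ColumnAssigner, the method assign, and all coordinates are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper: places the words of one row into table columns by x position. */
final class ColumnAssigner {

    /**
     * columnX holds the x position of each column header (assumed known).
     * words and wordX hold the row's words and their x positions, in the same order.
     * Returns one cell per column; a column that receives no word stays "".
     */
    static String[] assign(double[] columnX, String[] words, double[] wordX) {
        String[] cells = new String[columnX.length];
        Arrays.fill(cells, "");
        for (int w = 0; w < words.length; w++) {
            // Find the column whose header is horizontally closest to this word.
            int nearest = 0;
            for (int c = 1; c < columnX.length; c++) {
                if (Math.abs(wordX[w] - columnX[c]) < Math.abs(wordX[w] - columnX[nearest])) {
                    nearest = c;
                }
            }
            // Two words landing in the same column are joined with a space.
            cells[nearest] = cells[nearest].isEmpty() ? words[w] : cells[nearest] + " " + words[w];
        }
        return cells;
    }
}
```

With made-up header positions {100, 150, 200, 250}, Row2 of Table A (words at roughly 100, 200 and 250) would come back with an empty second cell, while Row2 of Table B (words at roughly 150, 200 and 250) would come back with an empty first cell, so the two tables would no longer look the same.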

Best Regards

Zhangkun

2011/5/6