Posted to user@tika.apache.org by Ganesh <em...@yahoo.co.in> on 2010/12/03 06:26:38 UTC
PDF text extracted without spaces
Hello all,
I am a newbie with Tika and am using the latest version, 0.8. I extracted text from a PDF document but found that spaces and newlines are missing, so indexing the data gives wrong results. Could anyone in this group help me?
Regards
Ganesh
Send free SMS to your Friends on Mobile from your Yahoo! Messenger. Download Now! http://messenger.yahoo.com/download.php
Re: PDF text extracted without spaces
Posted by Ganesh <em...@yahoo.co.in>.
Exactly the same issue. The spaces and newlines are not extracted properly.
When could we expect the new release?
Regards
Ganesh
----- Original Message -----
From: "Jukka Zitting" <jz...@adobe.com>
To: <us...@tika.apache.org>
Sent: Sunday, December 05, 2010 5:24 PM
Subject: RE: PDF text extracted without spaces
> Hi,
>
> From: Ganesh [mailto:emailgane@yahoo.co.in]
>> I am a newbie with Tika and am using the latest version, 0.8. I extracted
>> text from a PDF document but found that spaces and newlines are missing, so
>> indexing the data gives wrong results. Could anyone in this group help me?
>
> That's an unfortunate regression that got included in the 0.8 release. See TIKA-548 [1] for the details.
>
> The problem is fixed in the latest 0.9-SNAPSHOT version, and we probably should cut a new release soon with this fix.
>
> [1] https://issues.apache.org/jira/browse/TIKA-548
>
> BR,
>
> Jukka Zitting
>
RE: PDF text extracted without spaces
Posted by Jukka Zitting <jz...@adobe.com>.
Hi,
From: Ganesh [mailto:emailgane@yahoo.co.in]
> I am a newbie with Tika and am using the latest version, 0.8. I extracted
> text from a PDF document but found that spaces and newlines are missing, so
> indexing the data gives wrong results. Could anyone in this group help me?
That's an unfortunate regression that got included in the 0.8 release. See TIKA-548 [1] for the details.
The problem is fixed in the latest 0.9-SNAPSHOT version, and we probably should cut a new release soon with this fix.
[1] https://issues.apache.org/jira/browse/TIKA-548
BR,
Jukka Zitting
Re: PDF text extracted without spaces
Posted by Grant Ingersoll <gs...@apache.org>.
Can you share more about how you are using it? Also, can you show a test case?
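For anyone putting such a test case together, one rough way to show the symptom is to measure how much whitespace survives in the extracted text. The sketch below is only an illustration: the class name WhitespaceCheck, the helper methods and the 0.05 threshold are all hypothetical and not part of Tika, and the sample strings stand in for whatever text your own extraction call returns.

```java
// Hypothetical helper, not part of Tika: flags extracted text whose
// whitespace has been dropped, as described in this thread.
class WhitespaceCheck {

    /** Fraction of characters that are whitespace (0.0 for null or empty input). */
    public static double whitespaceRatio(String text) {
        if (text == null || text.isEmpty()) {
            return 0.0;
        }
        int whitespace = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                whitespace++;
            }
        }
        return (double) whitespace / text.length();
    }

    /** True when the whitespace ratio falls below minRatio (an assumed threshold). */
    public static boolean looksCollapsed(String text, double minRatio) {
        return whitespaceRatio(text) < minRatio;
    }

    public static void main(String[] args) {
        // Stand-ins for extracted text; in a real test these would come
        // from the PDF extraction step.
        String normal = "Hello all,\nI am a newbie with Tika.";
        String collapsed = "Helloall,IamanewbiewithTika.";

        System.out.println(looksCollapsed(normal, 0.05));    // prints false
        System.out.println(looksCollapsed(collapsed, 0.05)); // prints true
    }
}
```

Running the same check against the output of 0.8 and of a 0.9-SNAPSHOT build for one sample PDF would make the difference easy to report. Note that empty text is also flagged, since its ratio is 0.0.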
-Grant
On Dec 3, 2010, at 12:26 AM, Ganesh wrote:
> Hello all,
>
> I am a newbie with Tika and am using the latest version, 0.8. I extracted text from a PDF document but found that spaces and newlines are missing, so indexing the data gives wrong results. Could anyone in this group help me?
>
> Regards
> Ganesh
--------------------------
Grant Ingersoll
http://www.lucidimagination.com