Posted to dev@tika.apache.org by "Sara Miller (JIRA)" <ji...@apache.org> on 2017/03/17 11:46:41 UTC

[jira] [Updated] (TIKA-2304) Strange output from PdfParser

     [ https://issues.apache.org/jira/browse/TIKA-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sara Miller updated TIKA-2304:
------------------------------
    Description: 
I get garbled output when parsing this PDF:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47.3977&rep=rep1&type=pdf

I sent it with a PUT request to 192.168.1.115:9908/tika and the header Accept: text/html.
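A minimal Python sketch of that request, for anyone trying to reproduce it. It assumes tika-server is reachable at the address given in the report and that the PDF has already been downloaded to a local file. The function names `build_tika_request` and `extract_html`, the local file name, and the explicit `Content-Type: application/pdf` header are illustrative additions, not part of the original report.

```python
import urllib.request


def build_tika_request(base_url: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build a PUT request that sends raw PDF bytes to tika-server's /tika
    endpoint and asks for XHTML output via the Accept header."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tika",
        data=pdf_bytes,
        method="PUT",
        headers={
            "Accept": "text/html",
            # Assumption: an explicit type hint; without it urllib would send
            # application/x-www-form-urlencoded for a request with a body.
            "Content-Type": "application/pdf",
        },
    )


def extract_html(base_url: str, pdf_path: str, timeout: float = 60.0) -> str:
    """Send the file at pdf_path to tika-server and return the response body."""
    with open(pdf_path, "rb") as f:
        req = build_tika_request(base_url, f.read())
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

For example, `print(extract_html("http://192.168.1.115:9908", "10.1.1.47.3977.pdf"))` (file name hypothetical) would print the XHTML the server returns, including the paragraphs quoted below.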

An extract of the output: 
"<p>���������	��
&#13;���������������������������� �!"���
</p>
            <p>#$�% ���!"�%&amp;'��(*)+�,!-���
</p>
            <p>
.�� ��/�� 10��������� �!"21�	�434�%54!"�6�
</p>
            <p>7�8:9�;�&lt;&gt;=@?�A�9�BDC
</p>
            <p>E A	FHG�9DI"JLK�M�NLOPJLB�N�J.Q�JLGR8:K-I"FSJLB�I
</p>
            <p>E M	T"U:V@TXW	Y�U Z�NLI"A	[RJLK&#13;\]U	U:V</p>
            <p/>"
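The extract is dominated by `�` glyphs, which in this archive most likely stand for U+FFFD, the Unicode replacement character. A rough, hypothetical check for this symptom (not something Tika itself provides) is to measure the share of non-whitespace characters that are U+FFFD or control characters. The function names and the 0.1 threshold below are arbitrary choices for illustration.

```python
import unicodedata


def garbled_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are U+FFFD or control (Cc) chars."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    bad = sum(1 for c in chars if c == "\ufffd" or unicodedata.category(c) == "Cc")
    return bad / len(chars)


def looks_garbled(text: str, threshold: float = 0.1) -> bool:
    """Heuristic flag: True when the garbled share exceeds the (arbitrary) threshold."""
    return garbled_ratio(text) > threshold
```

Running `looks_garbled` on the text content of each `<p>` element would flag paragraphs like the ones above, while ordinary extracted prose should score close to zero.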

  was:
I get garbled output when parsing this PDF:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47.3977&rep=rep1&type=pdf

I sent it with a PUT request to 192.168.1.115:9908/tika and the header Accept: text/html.


> Strange output from PdfParser
> -----------------------------
>
>                 Key: TIKA-2304
>                 URL: https://issues.apache.org/jira/browse/TIKA-2304
>             Project: Tika
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 1.13
>         Environment: org.apache.tika.parser.pdf.PDFParser
>            Reporter: Sara Miller
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)