Posted to dev@pdfbox.apache.org by "sun pengrui (JIRA)" <ji...@apache.org> on 2017/07/27 04:05:00 UTC

[jira] [Created] (PDFBOX-3879) Not able to get font styles, like italic and Strikethrough

sun pengrui created PDFBOX-3879:
-----------------------------------

             Summary: Not able to get font styles, like italic and Strikethrough
                 Key: PDFBOX-3879
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3879
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 2.0.7
            Reporter: sun pengrui
         Attachments: src.pdf

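I'm trying to extract text from a PDF file and save it to an XML file.
The PDF file contains italic and strikethrough text, but I cannot get these styles from the PDFont class.
Below is the result. The words "Italic" and "Strikethrough" are reported with the plain LucidaGrande font, and "Italic." with LucidaGrande-Bold, so no italic or strikethrough information comes through.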

{code:xml}
<document>
  <page width="595.000000" height="842.000000">
    <line>
      <word x="48.000000" y="89.000000" width="59.843376" height="20.234375" font="LucidaGrande" font-size="28" color="#000000">Title</word>
    </line>
    <line>
      <word x="48.000000" y="139.000000" width="32.190125" height="10.562654" font="LucidaGrande" font-size="14" color="#000000">Italic</word>
    </line>
    <line>
      <word x="48.000000" y="175.000000" width="31.480873" height="10.117188" font="LucidaGrande-Bold" font-size="14" color="#000000">Bold</word>
      <word x="84.171875" y="175.000000" width="26.590248" height="10.117188" font="LucidaGrande-Bold" font-size="14" color="#000000">and</word>
      <word x="115.453125" y="175.000000" width="39.458496" height="10.562654" font="LucidaGrande-Bold" font-size="14" color="#000000">Italic.</word>
    </line>
    <line>
      <word x="48.000000" y="211.000000" width="31.480873" height="10.117188" font="LucidaGrande-Bold" font-size="14" color="#000000">Bold</word>
    </line>
    <line>
      <word x="48.000000" y="247.000000" width="92.764618" height="10.117188" font="LucidaGrande" font-size="14" color="#000000">Strikethrough</word>
    </line>
    <line>
      <word x="48.000000" y="283.000000" width="36.523254" height="10.117188" font="LucidaGrande" font-size="14" color="#000000">some</word>
      <word x="89.000000" y="283.000000" width="26.803375" height="10.117188" font="LucidaGrande" font-size="14" color="#000000">text</word>
    </line>
    <line>
      <word x="48.000000" y="319.000000" width="27.180374" height="10.117188" font="LucidaGrande" font-size="14" color="#000000">new</word>
      <word x="79.687500" y="319.000000" width="24.523247" height="10.117188" font="LucidaGrande" font-size="14" color="#000000">line</word>
      <word x="108.687500" y="319.000000" width="25.350250" height="10.117188" font="LucidaGrande" font-size="14" color="#000000">test</word>
    </line>
  </page>
</document>
{code}
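
For reference, here is a rough sketch of the kind of check the font attribute above allows. It is not PDFBox code: FontNameStyle and its methods are hypothetical names, and the only input is the font name string (for example the value returned by PDFont.getName()). It assumes that a style, when the font has one, shows up as "Bold", "Italic" or "Oblique" in the name, and that a six-letter subset prefix such as "ABCDEF+" may precede it.

```java
import java.util.Locale;

// Hypothetical helper, not part of PDFBox: guesses a style from the font name only.
final class FontNameStyle {
    private FontNameStyle() {}

    // Strips a subset prefix of the form "ABCDEF+" when present.
    static String baseName(String fontName) {
        int plus = fontName.indexOf('+');
        return (plus == 6) ? fontName.substring(7) : fontName;
    }

    // True when the name itself mentions a bold face.
    static boolean looksBold(String fontName) {
        return baseName(fontName).toLowerCase(Locale.ROOT).contains("bold");
    }

    // True when the name itself mentions an italic or oblique face.
    static boolean looksItalic(String fontName) {
        String n = baseName(fontName).toLowerCase(Locale.ROOT);
        return n.contains("italic") || n.contains("oblique");
    }

    public static void main(String[] args) {
        // The two font names that appear in the XML output above.
        for (String name : new String[] {"LucidaGrande", "LucidaGrande-Bold"}) {
            System.out.println(name + " bold=" + looksBold(name)
                    + " italic=" + looksItalic(name));
        }
    }
}
```

With the names from the output above, this flags LucidaGrande-Bold as bold but finds no italic anywhere, which matches the problem described: the italic words use a font whose name carries no italic marker. Strikethrough is usually drawn as a separate line rather than stored as a font attribute, so a name-based check like this would not be expected to find it either.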





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
