Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2018/02/26 14:33:00 UTC
[jira] [Closed] (TIKA-2589) Wrong page count detection (docx from
dotm template)
[ https://issues.apache.org/jira/browse/TIKA-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison closed TIKA-2589.
-----------------------------
Resolution: Not A Problem
Thank you for opening this issue.
MSWord calculates page counts dynamically and, IMHO, rarely stores the actual page count in a document. Instead, it typically stores "1", which is incorrect. If you append .zip to the file name, unzip the file, and look in docProps/app.xml, you'll see:
{noformat}
<Pages>1</Pages><Words>127171</Words><Characters>724878</Characters>
{noformat}
Calculating page counts dynamically is beyond the scope of Tika, so we rely on whatever value MSWord stored in the document.
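For anyone who wants to check the stored value without unzipping by hand, here is a minimal sketch in plain Java (JDK only, no Tika). The class and method names are hypothetical, and it assumes the Pages element in docProps/app.xml is written without a namespace prefix, as in the snippet above. It only reads back what MSWord stored; it does not compute a real page count.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Tika: reads the page count that
// MSWord stored in docProps/app.xml of a .docx (which is a zip archive).
class StoredPageCount {

    /** Returns the stored Pages value, or -1 if app.xml or Pages is missing. */
    static int readStoredPageCount(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("docProps/app.xml".equals(entry.getName())) {
                    // Copy the entry into memory so the XML parser
                    // never touches the zip stream directly.
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    byte[] chunk = new byte[4096];
                    int n;
                    while ((n = zip.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                    return parsePages(new ByteArrayInputStream(buf.toByteArray()));
                }
            }
        }
        return -1;
    }

    /** Parses app.xml and returns the integer inside the first Pages element. */
    static int parsePages(InputStream appXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder().parse(appXml);
        // Assumption: the element is unprefixed, i.e. literally <Pages>.
        NodeList pages = doc.getElementsByTagName("Pages");
        if (pages.getLength() == 0) {
            return -1;
        }
        return Integer.parseInt(pages.item(0).getTextContent().trim());
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java StoredPageCount path_to_file.docx");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println("Stored page count: " + readStoredPageCount(in));
        }
    }
}
```

Run against a document like the one attached to this issue, this would print whatever number is stored in app.xml, which is the same value Tika reports as xmpTPg:NPages.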
> Wrong page count detection (docx from dotm template)
> ----------------------------------------------------
>
> Key: TIKA-2589
> URL: https://issues.apache.org/jira/browse/TIKA-2589
> Project: Tika
> Issue Type: Bug
> Components: metadata
> Affects Versions: 1.17
> Environment: $ java -version
> java version "1.8.0_161"
> Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
> Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
> OS Version: 6.1.7601 Service Pack 1 Build 7601
> Reporter: Leonid Korsakov
> Priority: Major
> Attachments: 262 страницы.docx
>
>
> I have a docx file created from a dotm template. When I call
> {code:java}
> java -jar tika-app.jar -m path_to_file
> {code}
> I see xmpTPg:NPages: 1, but the docx file contains 262 pages.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)