Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2017/09/01 14:08:00 UTC
[jira] [Created] (TIKA-2458) Unify number of pages metadata key
Tim Allison created TIKA-2458:
---------------------------------
Summary: Unify number of pages metadata key
Key: TIKA-2458
URL: https://issues.apache.org/jira/browse/TIKA-2458
Project: Tika
Issue Type: Improvement
Components: core
Reporter: Tim Allison
Priority: Minor
On TIKA-2451, we're adding a metadata value for the number of images in a tiff. This raises the broader (admittedly minor) question of how we want to handle "number of pages".
I'm opening this issue for discussion and feedback.
Unfortunately, Dublin Core doesn't have a {{number of pages}} element as far as I can tell.
Do we want a single "number of pages" key in {{TikaCoreProperties}} that would be used for:
# number of pages in a PDF
# number of pages that a .docx alleges it has
# the number of slides in a PPT
# the number of sheets in an XLS
# the number of tiffs in a multi-image tiff
Others?
Or, do we want to have different keys: {{MSOffice.PageCount}}, {{PagedText.N_PAGES}}, {{TIFF.NUM_TIFFS}}?
Or, thanks to the beauty of composite keys, do we want to have both a unified key and the above individual keys?
I would propose using PagedText's {{N_PAGES}} as the unifying key, but its definition seems to be strictly within XMP-land _and_, according to our javadocs, it should be the sum of the pages in the container document and all embedded documents.
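To make the "both a unified key and the individual keys" option concrete, here is a minimal sketch of the idea. It deliberately avoids Tika's classes and uses a plain map, so everything in it is an assumption for illustration: the class name, the helper {{setPageCount}}, and the unified key name {{numPages}} are hypothetical, and the format-specific strings are just the labels used above, not real serialized key names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: models "write once, visible under two keys" with a plain map.
// This is NOT Tika's Metadata/Property API.
class PageCountSketch {

    // Hypothetical name for the unified key; no such constant is assumed to exist in Tika.
    static final String UNIFIED_KEY = "numPages";

    // Stores the count under the format-specific key and mirrors it to the unified key.
    static void setPageCount(Map<String, String> metadata, String formatKey, int count) {
        String value = Integer.toString(count);
        metadata.put(formatKey, value);
        metadata.put(UNIFIED_KEY, value);
    }

    public static void main(String[] args) {
        Map<String, String> tiff = new LinkedHashMap<>();
        setPageCount(tiff, "TIFF.NUM_TIFFS", 3);

        Map<String, String> pptx = new LinkedHashMap<>();
        setPageCount(pptx, "MSOffice.PageCount", 12);

        // A consumer that only cares about "how many pages" reads one key for every format.
        System.out.println(tiff.get(UNIFIED_KEY));
        System.out.println(pptx.get(UNIFIED_KEY));
    }
}
```

In Tika itself, a composite property would presumably play the role of this helper, so parsers would still set a single property and the mirroring would happen in the metadata layer. The sketch does not address the open question of whether the unified value should also include pages from embedded documents.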
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)