Posted to dev@tika.apache.org by "Nick Burch (Commented) (JIRA)" <ji...@apache.org> on 2012/01/05 03:40:39 UTC
[jira] [Commented] (TIKA-695) Custom properties on xlsx, docx, pptx
[ https://issues.apache.org/jira/browse/TIKA-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180140#comment-13180140 ]
Nick Burch commented on TIKA-695:
---------------------------------
Would it be possible for you to create some sample files containing custom properties? We can then use them as part of a unit test when we add the functionality.
> Custom properties on xlsx, docx, pptx
> -------------------------------------
>
> Key: TIKA-695
> URL: https://issues.apache.org/jira/browse/TIKA-695
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.10, 1.0
> Environment: All OS
> Reporter: Etienne Jouvin
> Priority: Minor
>
> The parsers for Office X files (xlsx, docx, pptx) do not extract custom properties.
> In class MetadataExtractor, method extract, only the core and extended properties are retrieved.
> I added something like this:
> extractMetadata(extractor.getCustomProperties(), metadata);
> {quote}
> /**
>  * Add this method to read the custom properties of the document.
>  *
>  * @param properties All custom properties.
>  * @param metadata Metadata to complete with the properties that were read.
>  */
> private void extractMetadata(CustomProperties properties, Metadata metadata) {
>     org.openxmlformats.schemas.officeDocument.x2006.customProperties.CTProperties propsHolder = properties.getUnderlyingProperties();
>     String value = null;
>     DateUtils dateUtils = DateUtils.getInstance();
>     BigDecimal bigDecimal;
>     for (CTProperty property : propsHolder.getPropertyList()) {
>         /* Parse each property. */
>         if (property.isSetLpwstr()) {
>             value = property.getLpwstr();
>         } else if (property.isSetFiletime()) {
>             value = dateUtils.convertDate(property.getFiletime(), null);
>         } else if (property.isSetDate()) {
>             value = dateUtils.convertDate(property.getDate(), null);
>         } else if (property.isSetDecimal()) {
>             bigDecimal = property.getDecimal();
>             value = null == bigDecimal ? null : bigDecimal.toString();
>         } else if (property.isSetBool()) {
>             value = BooleanUtils.toStringTrueFalse(property.getBool());
>         } else if (property.isSetInt()) {
>             value = Integer.toString(property.getInt());
>         } else if (property.isSetLpstr()) {
>             value = property.getLpstr();
>         } else if (property.isSetI4()) {
>             /* Number in Excel, for example... Why i4? Ask Microsoft. */
>             value = Integer.toString(property.getI4());
>         } else {
>             /* For other types, do nothing. */
>             continue;
>         }
>         /* Add the custom prefix, as done for the old Office formats. */
>         addProperty(metadata, "custom:" + property.getName(), value);
>     }
> }
> {quote}
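For anyone putting together the sample files, here is a rough, JDK-only sketch of what the custom properties part (docProps/custom.xml inside the package) holds and how its typed values could be flattened into "custom:"-prefixed strings for a test to check. It is not how Tika or POI read the part (the patch above goes through POI's CustomProperties), and it returns the raw text of each value instead of converting dates or booleans by type. The class name CustomPropertiesSketch, the method readCustomProperties, the SAMPLE constant and the property names in it are all made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper, not part of Tika or POI. */
final class CustomPropertiesSketch {

    /** Namespace of the custom properties part in an OOXML package. */
    static final String CUSTOM_NS =
            "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties";

    /** Illustrative custom.xml content with a string, an i4 and a bool property. */
    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
            + "<Properties xmlns=\"" + CUSTOM_NS + "\""
            + " xmlns:vt=\"http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes\">"
            + "<property fmtid=\"{D5CDD505-2E9C-101B-9397-08002B2CF9AE}\" pid=\"2\" name=\"Reviewer\">"
            + "<vt:lpwstr>Alice</vt:lpwstr></property>"
            + "<property fmtid=\"{D5CDD505-2E9C-101B-9397-08002B2CF9AE}\" pid=\"3\" name=\"Revision\">"
            + "<vt:i4>7</vt:i4></property>"
            + "<property fmtid=\"{D5CDD505-2E9C-101B-9397-08002B2CF9AE}\" pid=\"4\" name=\"Approved\">"
            + "<vt:bool>true</vt:bool></property>"
            + "</Properties>";

    /** Parses custom.xml content and returns "custom:"-prefixed name/value pairs. */
    static Map<String, String> readCustomProperties(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            Map<String, String> result = new LinkedHashMap<>();
            NodeList props = doc.getElementsByTagNameNS(CUSTOM_NS, "property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                // The typed value is the first element child (vt:lpwstr, vt:i4, vt:bool, ...).
                for (Node child = prop.getFirstChild(); child != null; child = child.getNextSibling()) {
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        result.put("custom:" + prop.getAttribute("name"),
                                child.getTextContent().trim());
                        break;
                    }
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse custom properties", e);
        }
    }

    public static void main(String[] args) {
        // Print each prefixed key with the raw text of its value.
        readCustomProperties(SAMPLE).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

A unit test against real sample files could make the same kind of assertion on the Tika Metadata object, one per property type set in the file.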