Posted to dev@tika.apache.org by "Ray Gauss II (JIRA)" <ji...@apache.org> on 2012/11/10 00:05:13 UTC
[jira] [Resolved] (TIKA-1022) DWG Custom properties not extracted
[ https://issues.apache.org/jira/browse/TIKA-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ray Gauss II resolved TIKA-1022.
--------------------------------
Resolution: Fixed
Fix Version/s: 1.3
Resolved in r1407683.
> DWG Custom properties not extracted
> -----------------------------------
>
> Key: TIKA-1022
> URL: https://issues.apache.org/jira/browse/TIKA-1022
> Project: Tika
> Issue Type: Bug
> Components: metadata
> Affects Versions: 1.0, 1.1, 1.2, 1.3
> Reporter: Paolo Nacci
> Assignee: Ray Gauss II
> Labels: patch
> Fix For: 1.3
>
> Attachments: quick2010-tika-no-custom.dwg
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Based on code I posted some time ago on the Alfresco forum, Derek Hulley opened ALF-2262 and Nick Burch opened TIKA-413, and the code was committed to Tika (0.8).
> With the sample DWG that was provided, Tika (0.8 to 1.2) works correctly, but with the attached file it returns no custom metadata. My original C code returns the correct custom metadata for this file, which is in the "2010" DWG format.
> I tested tika-app.1.0.jar, tika-app.1.2.jar, and a Tika 1.3 snapshot.
> All versions may be affected by this bug.
> The failing code is in skipToCustomProperties() of DWGParser.java, lines 320-321:
> if(padding[0] == 0 && padding[1] == 0 &&
> padding[2] == 0 && padding[3] == 0) {
> The padding[0] byte is not always 0 (the attached file has 0x2), and there is probably no need to check those bytes at all.
> Index: DWGParser.java
> ===================================================================
> --- DWGParser.java (revision 1407024)
> +++ DWGParser.java (working copy)
> @@ -93,7 +93,7 @@
>       * How far to skip after the last standard property, before
>       * we find any custom properties that might be there.
>       */
> -    private static final int CUSTOM_PROPERTIES_SKIP = 20;
> +    private static final int CUSTOM_PROPERTIES_SKIP = 24;
>
>      public void parse(
>              InputStream stream, ContentHandler handler,
> @@ -317,13 +317,7 @@
>
>      private int skipToCustomProperties(InputStream stream)
>              throws IOException, TikaException {
> -        // There should be 4 zero bytes next
> -        byte[] padding = new byte[4];
> -        IOUtils.readFully(stream, padding);
> -        if(padding[0] == 0 && padding[1] == 0 &&
> -                padding[2] == 0 && padding[3] == 0) {
> -            // Looks hopeful, skip on
> -            padding = new byte[CUSTOM_PROPERTIES_SKIP];
> +        byte[] padding = new byte[CUSTOM_PROPERTIES_SKIP];
>              IOUtils.readFully(stream, padding);
>
>              // We should now have the count
> @@ -337,10 +331,6 @@
>                  // No properties / count is too high to trust
>                  return 0;
>              }
> -        } else {
> -            // No padding. That probably means no custom props
> -            return 0;
> -        }
>      }
>
>  }
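
The effect of the patch can be sketched as a stand-alone Java example that compares the pre-patch and post-patch logic on synthetic input. This is only an illustration, not the DWGParser code: the class DwgSkipSketch and its helpers (readUShortLE, sanityCheck, oldCount, newCount, sample) are hypothetical names. Reading the count as an unsigned 16-bit little-endian value and the upper bound of 0x7f are assumptions, since those lines are not shown in the diff. The sample bytes are made up and do not come from the attached file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-alone sketch; not the actual DWGParser code.
class DwgSkipSketch {
    // Value from the patch: the 4 formerly checked bytes plus the old skip of 20.
    static final int CUSTOM_PROPERTIES_SKIP = 24;

    // Assumption: the count is stored as an unsigned 16-bit little-endian value.
    static int readUShortLE(DataInputStream in) throws IOException {
        int lo = in.readUnsignedByte();
        int hi = in.readUnsignedByte();
        return (hi << 8) | lo;
    }

    // Assumption: 0x7f is an illustrative upper bound for a plausible count.
    static int sanityCheck(int count) {
        return (count > 0 && count < 0x7f) ? count : 0;
    }

    // Pre-patch behaviour: give up unless the first 4 bytes are all zero.
    static int oldCount(InputStream stream) throws IOException {
        DataInputStream in = new DataInputStream(stream);
        byte[] padding = new byte[4];
        in.readFully(padding);
        if (padding[0] != 0 || padding[1] != 0
                || padding[2] != 0 || padding[3] != 0) {
            return 0;
        }
        in.readFully(new byte[20]);
        return sanityCheck(readUShortLE(in));
    }

    // Post-patch behaviour: skip all 24 bytes without inspecting them.
    static int newCount(InputStream stream) throws IOException {
        DataInputStream in = new DataInputStream(stream);
        in.readFully(new byte[CUSTOM_PROPERTIES_SKIP]);
        return sanityCheck(readUShortLE(in));
    }

    // Builds synthetic input: 24 skipped bytes (first one configurable), then the count.
    static byte[] sample(byte firstByte, int count) {
        byte[] data = new byte[CUSTOM_PROPERTIES_SKIP + 2];
        data[0] = firstByte;
        data[CUSTOM_PROPERTIES_SKIP] = (byte) (count & 0xff);
        data[CUSTOM_PROPERTIES_SKIP + 1] = (byte) ((count >> 8) & 0xff);
        return data;
    }

    public static void main(String[] args) throws IOException {
        // First byte is 0x02, as reported for the attached file; count is 3.
        byte[] data = sample((byte) 0x02, 3);
        System.out.println("old: " + oldCount(new ByteArrayInputStream(data)));
        System.out.println("new: " + newCount(new ByteArrayInputStream(data)));
    }
}
```

With a first byte of 0x02, the pre-patch variant returns 0 because the zero check fails, while the post-patch variant reads the count at the same offset (4 + 20 = 24 bytes in) and returns 3.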
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira