Posted to dev@tika.apache.org by "Paolo Nacci (JIRA)" <ji...@apache.org> on 2012/11/09 19:08:12 UTC

[jira] [Updated] (TIKA-1022) DWG Custom properties not extracted

     [ https://issues.apache.org/jira/browse/TIKA-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paolo Nacci updated TIKA-1022:
------------------------------

    Attachment: quick2010-tika-no-custom.dwg

No custom properties are extracted from this file (DWG version 2010).
                
> DWG Custom properties not extracted
> -----------------------------------
>
>                 Key: TIKA-1022
>                 URL: https://issues.apache.org/jira/browse/TIKA-1022
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata
>    Affects Versions: 1.0, 1.1, 1.2, 1.3
>            Reporter: Paolo Nacci
>              Labels: patch
>         Attachments: quick2010-tika-no-custom.dwg
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Based on some code I provided some time ago on the Alfresco forum, Derek Hulley opened ALF-2262, Nick Burch opened TIKA-413, and the code was committed to Tika (0.8).
> With the sample DWG provided, Tika (0.8 to 1.2) works correctly, but with the attached file (DWG "2010" format) it returns no custom metadata, while my original C code returns the correct custom metadata.
> Tested with tika-app.1.0.jar, tika-app.1.2.jar and a Tika 1.3 snapshot.
> All versions could be affected by this bug.
> I found the failing code in skipToCustomProperties() of DWGParser.java, lines 320-321:
> if(padding[0] == 0 && padding[1] == 0 &&
>   padding[2] == 0 && padding[3] == 0) {
> The padding[0] byte is not always 0 (the attached file has 0x2), so there is probably no need to check those bytes.
> --- T:/Temp/DWGPa-revBASE.svn000.tmp.java	Tue Jul  3 05:14:39 2012
> +++ C:/Users/paolon/Documents/Tika/tika-parsers/src/main/java/org/apache/tika/parser/dwg/DWGParser.java	Fri Nov  9 19:04:26 2012
> @@ -317,13 +317,7 @@
>  
>      private int skipToCustomProperties(InputStream stream) 
>              throws IOException, TikaException {
> -       // There should be 4 zero bytes next
> -       byte[] padding = new byte[4];
> -       IOUtils.readFully(stream, padding);
> -       if(padding[0] == 0 && padding[1] == 0 &&
> -             padding[2] == 0 && padding[3] == 0) {
> -          // Looks hopeful, skip on
> -          padding = new byte[CUSTOM_PROPERTIES_SKIP];
> +          byte[] padding = new byte[CUSTOM_PROPERTIES_SKIP];
>            IOUtils.readFully(stream, padding);
>            
>            // We should now have the count
> @@ -337,10 +331,6 @@
>               // No properties / count is too high to trust
>               return 0;
>            }
> -       } else {
> -          // No padding. That probably means no custom props
> -          return 0;
> -       }
>      }
>  
>  }
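The original padding check quoted above, and the relaxed behaviour proposed in this issue, can be sketched in plain Java as follows. This is a hypothetical, self-contained sketch, not Tika's DWGParser. The value of CUSTOM_PROPERTIES_SKIP, the MAX_TRUSTED_COUNT limit and the way the count is decoded (assumed here to be an unsigned little-endian 16-bit value in the last two bytes of the skipped block) are all assumptions, because the diff elides that part of the method. java.io.DataInputStream.readFully stands in for Tika's IOUtils.readFully. One design difference from the diff as posted: the diff also removes the read of the 4 padding bytes, so the following CUSTOM_PROPERTIES_SKIP-byte read starts 4 bytes earlier in the stream than before. The sketch keeps consuming those 4 bytes and only drops the check on their value.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class DwgCustomPropsSketch {
    // HYPOTHETICAL: the real CUSTOM_PROPERTIES_SKIP value is not shown in this issue.
    static final int CUSTOM_PROPERTIES_SKIP = 20;
    // HYPOTHETICAL sanity limit on the custom property count.
    static final int MAX_TRUSTED_COUNT = 0x7f;

    /** The original check from lines 320-321: all four padding bytes must be zero. */
    static boolean originalPaddingCheck(byte[] padding) {
        return padding[0] == 0 && padding[1] == 0
                && padding[2] == 0 && padding[3] == 0;
    }

    /** Relaxed logic: consume the 4 padding bytes, but do not reject on their value. */
    static int skipToCustomProperties(InputStream stream) throws IOException {
        DataInputStream in = new DataInputStream(stream);

        // Still read the padding so the stream stays aligned, whether it
        // starts with 0x0 or, as in the attached file, with 0x2.
        byte[] padding = new byte[4];
        in.readFully(padding);

        byte[] block = new byte[CUSTOM_PROPERTIES_SKIP];
        in.readFully(block);

        // ASSUMPTION: unsigned little-endian 16-bit count at the end of the block.
        int count = (block[CUSTOM_PROPERTIES_SKIP - 2] & 0xff)
                | ((block[CUSTOM_PROPERTIES_SKIP - 1] & 0xff) << 8);

        if (count > 0 && count < MAX_TRUSTED_COUNT) {
            return count;
        }
        // No properties / count is too high to trust
        return 0;
    }

    public static void main(String[] args) throws IOException {
        // Synthetic input: padding starting with 0x2, then a skip block whose count is 3
        byte[] data = new byte[4 + CUSTOM_PROPERTIES_SKIP];
        data[0] = 0x2;
        data[4 + CUSTOM_PROPERTIES_SKIP - 2] = 3;

        byte[] padding = {data[0], data[1], data[2], data[3]};
        System.out.println(originalPaddingCheck(padding));
        System.out.println(skipToCustomProperties(new ByteArrayInputStream(data)));
    }
}
```

Running main prints false (the original check rejects padding that starts with 0x2, so the old code returned 0 custom properties) and then 3 (the synthetic count read by the relaxed version).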

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira