Posted to dev@tika.apache.org by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2015/03/02 04:28:04 UTC

[jira] [Resolved] (TIKA-862) JPSS HDF5 files not being detected appropriately

     [ https://issues.apache.org/jira/browse/TIKA-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tyler Palsulich resolved TIKA-862.
----------------------------------
    Resolution: Fixed

Marking as fixed. The output from the above file is:
{code}
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="Mission_Name" content="NPP"/>
<meta name="Content-Length" content="20888"/>
<meta name="Distributor" content="noaa"/>
<meta name="N_HDF_Creation_Date" content="20111122"/>
<meta name="N_HDF_Creation_Time" content="203300.301515Z"/>
<meta name="N_Collection_Short_Name" content="SPACECRAFT-DIARY-RDR"/>
<meta name="Instrument_Short_Name" content="SPACECRAFT"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.hdf.HDFParser"/>
<meta name="Platform_Short_Name" content="NPP"/>
<meta name="N_Dataset_Source" content="noaa"/>
<meta name="N_Dataset_Type_Tag" content="RDR"/>
<meta name="N_Processing_Domain" content="ops"/>
<meta name="Content-Type" content="application/x-hdf"/>
<meta name="resourceName" content="test.h5"/>
<title/>
</head>
<body/></html>
{code}
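
For anyone who wants to double-check a file independently of Tika, here is a minimal sketch that tests for the 8-byte HDF5 format signature at the start of a file. This is not Tika's detection code; the class and method names (Hdf5SignatureCheck, startsWithHdf5Signature, looksLikeHdf5) are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Tika: checks for the HDF5 format signature.
public class Hdf5SignatureCheck {

    // HDF5 format signature: 0x89 'H' 'D' 'F' '\r' '\n' 0x1a '\n'
    static final byte[] HDF5_SIGNATURE = {
        (byte) 0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'
    };

    // True if the given bytes begin with the HDF5 signature.
    static boolean startsWithHdf5Signature(byte[] head) {
        return head.length >= HDF5_SIGNATURE.length
            && Arrays.equals(Arrays.copyOf(head, HDF5_SIGNATURE.length), HDF5_SIGNATURE);
    }

    // Reads at most the first 8 bytes of the file and checks them.
    static boolean looksLikeHdf5(Path file) throws IOException {
        byte[] head = new byte[HDF5_SIGNATURE.length];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break; // file is shorter than the signature
                }
                n += r;
            }
        }
        return startsWithHdf5Signature(Arrays.copyOf(head, n));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Hdf5SignatureCheck <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + ": HDF5 signature at offset 0 = " + looksLikeHdf5(file));
    }
}
```

Note that this sketch only looks at offset 0. The HDF5 format also allows the signature to sit at offset 512, 1024, 2048, and so on when a user block precedes the superblock, so a negative result here does not prove that a file is not HDF5.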

> JPSS HDF5 files not being detected appropriately
> ------------------------------------------------
>
>                 Key: TIKA-862
>                 URL: https://issues.apache.org/jira/browse/TIKA-862
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Richard Yu
>            Assignee: Chris A. Mattmann
>         Attachments: ASF.LICENSE.NOT.GRANTED--RNSCA-ROLPS_npp_d20120202_t1841338_e1842112_b01382_c20120202203730692328_noaa_ops.h5, ASF.LICENSE.NOT.GRANTED--RNSCA-ROLPS_npp_d20120202_t1841338_e1842112_b01382_c20120202203730692328_noaa_ops.h5, RNSCA_npp_d20111121_t1935200_e1935400_b00346_c20111122203300301515_noaa_ops.h5
>
>
> As noted in TIKA-614, JPSS HDF5 files are not being properly detected by Tika. See this comment from [~minfing]:
> {quote}
> We were trying to extract metadata from our h5 file (i.e. with JPSS extension). We ran the following command line:
> {noformat}
> [ryu@localhost hdf5extractor]$ java -jar tika-app-1.0.jar -m \
> > /usr/local/staging/products/h5/SVM13_npp_d20120122_t1659139_e1700381_b01225_c20120123000312144174_noaa_ops.h5
> Content-Encoding: windows-1252
> Content-Length: 22187952
> Content-Type: text/plain
> resourceName: SVM13_npp_d20120122_t1659139_e1700381_b01225_c20120123000312144174_noaa_ops.h5
> [ryu@localhost hdf5extractor]$
> {noformat}
> We noticed that the content type is text/plain and that there are only 4 lines of output (we expected a lot of metadata).
> Let me know if more information is needed. Thanks!
> Richard
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)