Posted to dev@tika.apache.org by "Nick Sincaglia (JIRA)" <ji...@apache.org> on 2018/10/18 22:04:00 UTC

[jira] [Created] (TIKA-2761) XML Structured Text Is Missing Metadata Fields for mp3 files

Nick Sincaglia created TIKA-2761:
------------------------------------

             Summary: XML Structured Text Is Missing Metadata Fields for mp3 files
                 Key: TIKA-2761
                 URL: https://issues.apache.org/jira/browse/TIKA-2761
             Project: Tika
          Issue Type: Bug
          Components: metadata
    Affects Versions: 1.19.1
         Environment: All
            Reporter: Nick Sincaglia


I am using the Tika 1.19 GUI to extract metadata from an .mp3 file. The sample rate is available and I am able to access it, but only as a string or as part of a JSON document. I am working in XML and would like to use XML as a content handler. However, when the metadata is returned as 'structured text' (XML), the sample rate is not included. I have also tried using Tika 1.19 in a Maven project and experimented with different content handlers, and the same issue occurs: I cannot get the sample rate returned in an XML document, although I am able to access the value from the metadata object itself. In short, if the metadata is returned as a string, the sample rate is there; if it is returned as XML, it is not. I am wondering what I am doing wrong or misunderstanding. Is this perhaps an issue with the parser or the content handler that is used?
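
Since the values are readable from the metadata object even when they are absent from the XML output, one possible stopgap is to write them out as <meta> elements by hand. The sketch below uses only the JDK's javax.xml.stream API. The class name MetadataXmlSketch, the method toXhtmlHead, and the plain Map standing in for values copied out of Tika's metadata object are all hypothetical; this is not how Tika builds its structured text view, and it does not explain why the fields are missing there.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: serializes name/value metadata pairs as XHTML <meta> elements.
final class MetadataXmlSketch {

    // Assumption: the caller has already copied the metadata names and values
    // into a Map. Multi-valued names (such as X-Parsed-By) get one <meta> per value.
    static String toXhtmlHead(Map<String, List<String>> metadata) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            // No XML declaration is written; the result is a bare <html><head> fragment.
            xml.writeStartElement("html");
            xml.writeDefaultNamespace("http://www.w3.org/1999/xhtml");
            xml.writeStartElement("head");
            for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
                for (String value : entry.getValue()) {
                    xml.writeEmptyElement("meta");
                    xml.writeAttribute("name", entry.getKey());
                    xml.writeAttribute("content", value); // the writer escapes &, < and "
                }
            }
            xml.writeEndElement(); // closes <head>
            xml.writeEndElement(); // closes <html>
            xml.writeEndDocument();
            xml.close();
        } catch (XMLStreamException ex) {
            throw new IllegalStateException("Could not serialize metadata", ex);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample values taken from the 'Metadata' view in this report.
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("samplerate", Arrays.asList("44100"));
        metadata.put("xmpDM:audioSampleRate", Arrays.asList("44100"));
        metadata.put("X-Parsed-By", Arrays.asList(
                "org.apache.tika.parser.DefaultParser",
                "org.apache.tika.parser.mp3.Mp3Parser"));
        System.out.println(toXhtmlHead(metadata));
    }
}
```

In a real project the Map would be filled from the metadata object after parsing has finished, so that fields set late by the parser are included.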

 

*_+Tika 1.19 'Metadata' view (sample rate is available):+_*

 

Author: Glee Cast

Content-Length: 8251946

Content-Type: audio/mpeg

X-Parsed-By: org.apache.tika.parser.DefaultParser

X-Parsed-By: org.apache.tika.parser.mp3.Mp3Parser

X-TIKA:digest:MD5: e0bdf3a0e171fca838604f9baad46612

X-TIKA:digest:SHA256: ea1e4aa998f2c6e80139fa100c62fc1ee17652cf702cd484532b90183e7c5cc0

channels: 2

creator: Glee Cast

dc:creator: Glee Cast

dc:title: Rehab (Glee Cast Version)

meta:author: Glee Cast

resourceName: USQX90900223_A4_T7.mp3

*+_samplerate: 44100_+*

title: Rehab (Glee Cast Version)

version: MPEG 3 Layer III Version 1

xmpDM:album: Glee: The Music, The Complete Season One

xmpDM:artist: Glee Cast

xmpDM:audioChannelType: Stereo

xmpDM:audioCompressor: MP3

*_+xmpDM:audioSampleRate: 44100+_*

xmpDM:duration: 206301.296875

xmpDM:genre:

xmpDM:logComment: XXX -

(P) 2009 Twentieth Century Fox Television - USQX90900223

xmpDM:releaseDate:

xmpDM:trackNumber: 4

 

 

*Tika 1.19 'Structured Text' view (no sample rate):*

 

<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">

<head>

<meta name="xmpDM:genre" content=""/>

<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>

<meta name="X-Parsed-By" content="org.apache.tika.parser.mp3.Mp3Parser"/>

<meta name="creator" content="Glee Cast"/>

<meta name="xmpDM:album" content="Glee: The Music, The Complete Season One"/>

<meta name="xmpDM:releaseDate" content=""/>

<meta name="meta:author" content="Glee Cast"/>

<meta name="xmpDM:artist" content="Glee Cast"/>

<meta name="X-TIKA:digest:SHA256" content="ea1e4aa998f2c6e80139fa100c62fc1ee17652cf702cd484532b90183e7c5cc0"/>

<meta name="dc:creator" content="Glee Cast"/>

<meta name="xmpDM:audioCompressor" content="MP3"/>

<meta name="resourceName" content="USQX90900223_A4_T7.mp3"/>

<meta name="xmpDM:logComment" content="XXX - &#10;(P) 2009 Twentieth Century Fox Television - USQX90900223"/>

<meta name="dc:title" content="Rehab (Glee Cast Version)"/>

<meta name="Author" content="Glee Cast"/>

<meta name="Content-Length" content="8251946"/>

<meta name="X-TIKA:digest:MD5" content="e0bdf3a0e171fca838604f9baad46612"/>

<meta name="Content-Type" content="audio/mpeg"/>

<title>Rehab (Glee Cast Version)</title>

</head>

<body><h1>Rehab (Glee Cast Version)</h1>

<p>Glee Cast</p>

<p>Glee: The Music, The Complete Season One, track 4</p>

<p>206301.3</p>

<p>XXX -  (P) 2009 Twentieth Century Fox Television - USQX90900223</p>

</body></html>

 

*_+Tika 1.19 Recursive JSON view (the sample rate is there):+_*

 

[

  {

    "Author": "Glee Cast",

    "Content-Type": "audio/mpeg",

    "X-Parsed-By": [

      "org.apache.tika.parser.DefaultParser",

      "org.apache.tika.parser.mp3.Mp3Parser"

    ],

    "X-TIKA:content": "Rehab (Glee Cast Version)\nGlee Cast\nGlee: The Music, The Complete Season One, track 4\n206301.3\nXXX - \n(P) 2009 Twentieth Century Fox Television - USQX90900223\n",

    "X-TIKA:digest:MD5": "e0bdf3a0e171fca838604f9baad46612",

    "X-TIKA:digest:SHA256": "ea1e4aa998f2c6e80139fa100c62fc1ee17652cf702cd484532b90183e7c5cc0",

    "X-TIKA:parse_time_millis": "86",

    "channels": "2",

    "creator": "Glee Cast",

    "dc:creator": "Glee Cast",

    "dc:title": "Rehab (Glee Cast Version)",

    "meta:author": "Glee Cast",

    *+_"samplerate": "44100",_+*

    "title": "Rehab (Glee Cast Version)",

    "version": "MPEG 3 Layer III Version 1",

    "xmpDM:album": "Glee: The Music, The Complete Season One",

    "xmpDM:artist": "Glee Cast",

    "xmpDM:audioChannelType": "Stereo",

    "xmpDM:audioCompressor": "MP3",

    *_+"xmpDM:audioSampleRate": "44100",+_*

    "xmpDM:duration": "206301.296875",

    "xmpDM:genre": "",

    "xmpDM:logComment": "XXX - \n(P) 2009 Twentieth Century Fox Television - USQX90900223",

    "xmpDM:releaseDate": "",

    "xmpDM:trackNumber": "4"

  }

]


