Posted to dev@tika.apache.org by "Nick Burch (Commented) (JIRA)" <ji...@apache.org> on 2012/03/29 12:30:28 UTC

[jira] [Commented] (TIKA-887) Tika fails to parse some MP3 tags correctly and produces null characters in value

    [ https://issues.apache.org/jira/browse/TIKA-887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241134#comment-13241134 ] 

Nick Burch commented on TIKA-887:
---------------------------------

Is the problem still present in Tika 1.1? I ask only because there were some MP3-tag-related fixes between 1.0 and 1.1 that may already have solved this.
                
> Tika fails to parse some MP3 tags correctly and produces null characters in value
> ---------------------------------------------------------------------------------
>
>                 Key: TIKA-887
>                 URL: https://issues.apache.org/jira/browse/TIKA-887
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Jens Hübel
>            Priority: Minor
>
> I have a problem when extracting the comment tag from an MP3 file. It contains an invalid prefix, then a '\0' character, and then the real value of the tag. This happens with files downloaded from www.jamendo.com, for example this one:
> http://storage.newjamendo.com/download/track/450545/mp32/Swansong.mp3
> It may be that the tags are not created properly on this site, but at least tools like mp3tag display them correctly.
> The extracted value looks like this: eng http://www.jamendo.com Attribution-Noncommercial-Share Alike 3.0
> At position 3 (counting from zero) there is a null character. The tag value should start with http...
> Here is the byte sequence at the beginning of this file:
> 49 44 33 04 00 00 00 01 18 32 54 49 54 32 00 00 
> 00 09 00 00 03 53 77 61 6E 73 6F 6E 67 54 50 45 
> 31 00 00 00 0E 00 00 03 4A 6F 73 68 20 57 6F 6F 
> 64 77 61 72 64 54 41 4C 42 00 00 00 0C 00 00 03 
> 42 72 65 61 64 63 72 75 6D 62 73 54 44 52 4C 00 
> 00 00 05 00 00 03 32 30 30 39 43 4F 4D 4D 00 00 
> 00 22 00 00 03 65 6E 67 49 44 33 20 76 31 20 43 
> 6F 6D 6D 65 6E 74 00 41 74 74 72 69 62 75 74 69 
> 6F 6E 20 33 2E 30 54 43 4F 4E 00 00 00 06 00 00 
> 03 28 32 35 35 29 54 50 55 42 00 00 00 08 00 00 
> 03 4A 61 6D 65 6E 64 6F 43 4F 4D 4D 00 00 00 2C 
> 00 00 03 65 6E 67 00 68 74 74 70 3A 2F 2F 77 77 
> 77 2E 6A 61 6D 65 6E 64 6F 2E 63 6F 6D 20 41 74 
> 74 72 69 62 75 74 69 6F 6E 20 33 2E 30 20 54 43 
> 4F 50 00 00 01 1F 00 00 03 32 30 30 39 2D 31 30 
> 2D 32 31 54 31 31 3A 31 31 3A 32 30 2B 30 31 3A 
> 30 30 20 4A 6F 73 68 20 57 6F 6F 64 77 61 72 64 
> 2E 20 4C 69 63 65 6E 73 65 64 20 74 6F 20 74 68
> ID3......2TIT2.......SwansongTPE1.......Josh WoodwardTALB.......BreadcrumbsTDRL.......2009COMM..."...engID3 v1 Comment.Attribution 3.0TCON.......(255)TPUB.......JamendoCOMM...,...eng.http://www.jamendo.com Attribution 3.0 TCOP.......2009-10-21T11:11:20+01:00 Josh Woodward. Licensed to th
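The dump can be read with the ID3v2.4 layout: a 10-byte tag header ("ID3", version bytes 04 00, a flags byte, then a 4-byte syncsafe size), followed by frames that each have a 4-character ID, a 4-byte size (syncsafe in v2.4, plain big-endian in v2.3) and 2 flag bytes. The Python sketch below is for illustration only and is not Tika code; syncsafe_to_int and walk_id3v24_frames are hypothetical names. It assumes no extended header, no unsynchronisation and no frame flags (all flag bytes in the dump are 00), and it stops at the first frame that runs past the available bytes, since the dump is cut off inside TCOP.

```python
from typing import Iterator, Tuple


def syncsafe_to_int(raw: bytes) -> int:
    """Decode an ID3v2 syncsafe integer (7 significant bits per byte)."""
    value = 0
    for byte in raw:
        value = (value << 7) | (byte & 0x7F)
    return value


def walk_id3v24_frames(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (frame_id, body) for each complete frame of an ID3v2.4 tag.

    Hypothetical helper: assumes no extended header, no unsynchronisation
    and no frame flags, as in the dump above.
    """
    if data[:3] != b"ID3" or data[3] != 4:
        raise ValueError("not an ID3v2.4 tag")
    tag_size = syncsafe_to_int(data[6:10])  # size excludes the 10-byte header
    end = min(10 + tag_size, len(data))     # the dump is truncated
    pos = 10
    while pos + 10 <= end:
        frame_id = data[pos:pos + 4]
        if frame_id == b"\x00\x00\x00\x00":  # reached padding
            break
        size = syncsafe_to_int(data[pos + 4:pos + 8])  # v2.4: syncsafe size
        body_start = pos + 10                          # skip ID, size, flags
        if body_start + size > end:  # frame runs past the available bytes
            break
        yield frame_id.decode("latin-1"), data[body_start:body_start + size]
        pos = body_start + size
```

Fed the 288 bytes of the hex dump above, this yields the frames TIT2, TPE1, TALB, TDRL, COMM, TCON, TPUB and COMM; the body of the last COMM frame is 03 65 6E 67 00 followed by "http://www.jamendo.com Attribution 3.0 ".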

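A COMM frame body is not plain text: it holds a 1-byte text encoding (0 = ISO-8859-1, 1 = UTF-16 with BOM, 2 = UTF-16BE, 3 = UTF-8 in v2.4), a 3-byte ISO-639-2 language code, a content description terminated by a null (two null bytes for the UTF-16 encodings), and only then the comment text. In the second COMM frame of the dump the body is 03 (UTF-8), "eng", an empty description (just the 00 terminator), then the URL. The reported value, "eng" followed by a null and then the URL, is consistent with the language code and the description terminator being kept in the value instead of being split off. A minimal Python sketch of that split, assuming the 10-byte frame header has already been removed; parse_comm_body and ENCODINGS are hypothetical names, not Tika API.

```python
from typing import Tuple

# Hypothetical mapping (not Tika API): ID3v2 text-encoding byte -> Python codec.
ENCODINGS = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}


def parse_comm_body(body: bytes) -> Tuple[str, str, str]:
    """Split a COMM frame body into (language, description, text)."""
    encoding = body[0]
    codec = ENCODINGS[encoding]
    language = body[1:4].decode("latin-1")  # ISO-639-2 code, e.g. "eng"
    rest = body[4:]
    if encoding in (1, 2):
        # UTF-16 strings end with a two-byte null, searched at even offsets
        end = 0
        while end + 1 < len(rest) and rest[end:end + 2] != b"\x00\x00":
            end += 2
        terminator_len = 2
    else:
        # ISO-8859-1 and UTF-8 strings end with a single null byte
        end = rest.find(b"\x00")
        if end == -1:
            end = len(rest)
        terminator_len = 1
    description = rest[:end].decode(codec)
    # Some writers also null-terminate the text itself; drop that.
    text = rest[end + terminator_len:].decode(codec).rstrip("\x00")
    return language, description, text
```

On the second COMM body this returns ("eng", "", "http://www.jamendo.com Attribution 3.0 "), whereas decoding everything after the encoding byte as a single UTF-8 string gives "eng\x00http://...", the symptom reported above.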
--
This message is automatically generated by JIRA.