Posted to dev@tika.apache.org by "Emil Burzo (JIRA)" <ji...@apache.org> on 2012/06/13 17:04:16 UTC

[jira] [Created] (TIKA-939) Windows Media Video file detected as Windows Media Audio

Emil Burzo created TIKA-939:
-------------------------------

             Summary: Windows Media Video file detected as Windows Media Audio
                 Key: TIKA-939
                 URL: https://issues.apache.org/jira/browse/TIKA-939
             Project: Tika
          Issue Type: Bug
          Components: mime
    Affects Versions: 1.1
         Environment: Microsoft's Expression Encoder 4 SP1
            Reporter: Emil Burzo
            Priority: Minor


Attached file is detected as "audio/x-ms-wma" instead of "video/x-ms-wmv".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (TIKA-939) Windows Media Video file detected as Windows Media Audio

Posted by "Emil Burzo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emil Burzo updated TIKA-939:
----------------------------

    Description: 
Attached file is detected as "audio/x-ms-wma" instead of "video/x-ms-wmv".

Expected result:
$ java -jar tika-app-1.1.jar -d test.wmv 
video/x-ms-wmv

Actual result:
$ java -jar tika-app-1.1.jar -d test.wmv 
audio/x-ms-wma


  was:Attached file is detected as "audio/x-ms-wma" instead of "video/x-ms-wmv".

    

[jira] [Resolved] (TIKA-939) Windows Media Video file detected as Windows Media Audio

Posted by "Nick Burch (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch resolved TIKA-939.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 1.2
    

[jira] [Updated] (TIKA-939) Windows Media Video file detected as Windows Media Audio

Posted by "Emil Burzo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emil Burzo updated TIKA-939:
----------------------------

    Attachment: test.wmv
    

[jira] [Commented] (TIKA-939) Windows Media Video file detected as Windows Media Audio

Posted by "Nick Burch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294527#comment-13294527 ] 

Nick Burch commented on TIKA-939:
---------------------------------

WMA and WMV use the same container format (ASF), so detecting them based only on mime magic is tricky. Ideally, we want a container-aware detector for these kinds of formats, much as we already have for things like ZIP, OLE2 and Ogg.

In the absence of a proper ASF container-aware detector, we have to fudge things by looking for magic strings in a file that has the ASF magic bytes at the front. There's alas no obvious series of bytes we can look for that conclusively says "this is a video", so for now we just look for known video codec names in the first 8 KB. Your file used a video codec that wasn't in that list, so no match was found; the audio codec it used was found, so Tika assumed the file was audio.
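
To illustrate the interim heuristic described above, here is a minimal Java sketch, assuming an ASF file is recognized by its 16-byte Header Object GUID at offset 0 and that codec names appear as UTF-16LE strings within the first 8 KB. The class name, the codec-name lists and the generic `video/x-ms-asf` fallback are hypothetical illustrations, not the entries in Tika's mime-types definitions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch of a "magic bytes + magic strings" ASF heuristic.
 * The codec-name lists and the fallback type are assumptions, not Tika's data.
 */
public class AsfMagicHeuristic {
    // On-disk bytes of the ASF Header Object GUID 75B22630-668E-11CF-A6D9-00AA0062CE6C.
    static final byte[] ASF_MAGIC = {
        0x30, 0x26, (byte) 0xB2, 0x75, (byte) 0x8E, 0x66, (byte) 0xCF, 0x11,
        (byte) 0xA6, (byte) 0xD9, 0x00, (byte) 0xAA, 0x00, 0x62, (byte) 0xCE, 0x6C
    };
    static final int SCAN_LIMIT = 8 * 1024; // only the first 8 KB are searched

    // Assumed codec names; ASF headers store such names as UTF-16LE strings.
    static final String[] VIDEO_CODEC_NAMES = {"Windows Media Video", "VC-1"};
    static final String[] AUDIO_CODEC_NAMES = {"Windows Media Audio"};

    /** Returns a media type, or null if the data does not start with the ASF magic. */
    static String detect(byte[] data) {
        if (data.length < ASF_MAGIC.length
                || !Arrays.equals(Arrays.copyOf(data, ASF_MAGIC.length), ASF_MAGIC)) {
            return null;
        }
        byte[] window = Arrays.copyOf(data, Math.min(data.length, SCAN_LIMIT));
        if (containsAny(window, VIDEO_CODEC_NAMES)) return "video/x-ms-wmv";
        // No known video codec name: an audio codec name alone is taken to mean audio,
        // which is how a file with an unlisted video codec ends up reported as WMA.
        if (containsAny(window, AUDIO_CODEC_NAMES)) return "audio/x-ms-wma";
        return "video/x-ms-asf"; // assumed generic ASF fallback
    }

    static boolean containsAny(byte[] haystack, String[] names) {
        for (String name : names) {
            if (indexOf(haystack, name.getBytes(StandardCharsets.UTF_16LE)) >= 0) return true;
        }
        return false;
    }

    /** Naive byte-array search; returns the first match offset or -1. */
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }
}
```

Checking video names before audio names mirrors the failure in this issue: when the video codec is not in the list, only the audio name matches and the file is reported as WMA.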

I've added the string for this codec in r1349906, so detection now works properly for your file. Longer term, we do need someone to write a proper ASF-aware detector.
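
A container-aware detector would instead read the ASF header itself. The sketch below is a hypothetical illustration (not a Tika API): it walks the top-level objects inside the ASF Header Object, finds each Stream Properties Object, and checks its Stream Type GUID for video or audio. The GUID constants are transcribed from the ASF specification and are worth double-checking; streams declared only inside the Header Extension Object are not examined, and the `video/x-ms-asf` fallback is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical sketch of a container-aware ASF detector: it reads the Stream
 * Properties Objects in the ASF header instead of searching for codec names.
 */
public class AsfStreamTypeDetector {
    // GUIDs from the ASF specification, converted to their on-disk byte order.
    static final byte[] HEADER_OBJECT = guid("75B22630-668E-11CF-A6D9-00AA0062CE6C");
    static final byte[] STREAM_PROPERTIES_OBJECT = guid("B7DC0791-A9B7-11CF-8EE6-00C00C205365");
    static final byte[] VIDEO_MEDIA = guid("BC19EFC0-5B4D-11CF-A8FD-00805F5C442B");
    static final byte[] AUDIO_MEDIA = guid("F8699E40-5B4D-11CF-A8FD-00805F5C442B");

    /** Text GUID to 16 bytes; Data1, Data2 and Data3 are stored little-endian. */
    static byte[] guid(String text) {
        String hex = text.replace("-", "");
        byte[] b = new byte[16];
        for (int i = 0; i < 16; i++) {
            b[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new byte[] {
            b[3], b[2], b[1], b[0], b[5], b[4], b[7], b[6],
            b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]
        };
    }

    static boolean guidAt(byte[] data, int offset, byte[] expected) {
        if (offset < 0 || offset + 16 > data.length) return false;
        for (int i = 0; i < 16; i++) {
            if (data[offset + i] != expected[i]) return false;
        }
        return true;
    }

    /** Returns a media type, or null if the data is not an ASF file. */
    static String detect(byte[] data) {
        if (data.length < 30 || !guidAt(data, 0, HEADER_OBJECT)) return null;
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        long headerSize = buf.getLong(16);               // QWORD object size
        long objectCount = buf.getInt(24) & 0xFFFFFFFFL;  // DWORD number of header objects
        int end = (int) Math.min(headerSize, data.length);
        int pos = 30; // GUID(16) + size(8) + count(4) + two reserved bytes
        boolean sawAudio = false;
        for (long n = 0; n < objectCount && pos + 24 <= end; n++) {
            long size = buf.getLong(pos + 16);
            if (size < 24 || size > end - pos) break; // corrupt or truncated object
            // The Stream Type GUID follows the object's own GUID and size fields.
            if (size >= 40 && guidAt(data, pos, STREAM_PROPERTIES_OBJECT)) {
                if (guidAt(data, pos + 24, VIDEO_MEDIA)) return "video/x-ms-wmv";
                if (guidAt(data, pos + 24, AUDIO_MEDIA)) sawAudio = true;
            }
            pos += (int) size;
        }
        return sawAudio ? "audio/x-ms-wma" : "video/x-ms-asf"; // assumed generic fallback
    }
}
```

Because the decision comes from the declared stream types rather than codec names, a file using a new video codec would not need a new magic string.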
                