Posted to dev@tika.apache.org by "Alexander Chow (Created) (JIRA)" <ji...@apache.org> on 2012/01/27 14:11:39 UTC

[jira] [Created] (TIKA-851) M4V magic detection invalid

M4V magic detection invalid
---------------------------

                 Key: TIKA-851
                 URL: https://issues.apache.org/jira/browse/TIKA-851
             Project: Tika
          Issue Type: Bug
          Components: mime
    Affects Versions: 1.0
            Reporter: Alexander Chow


When the mime type of an M4V file is detected using its name only, it returns video/x-m4v.  When it is detected using the InputStream (hence utilising the MagicDetector), it incorrectly returns video/quicktime.

Using the sample M4V file from Apple's [knowledge base|http://support.apple.com/kb/HT1425]:

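{code:title=TikaTest.java}
import java.io.File;
import java.io.InputStream;

import org.apache.tika.detect.DefaultDetector;
import org.apache.tika.detect.Detector;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MimeTypes;

public class TikaTest {

	public static void main(String[] args) throws Exception {
		String userHome = System.getProperty("user.home");
		File file = new File(userHome + "/Desktop/sample_iPod.m4v");

		Detector detector = new DefaultDetector(
			MimeTypes.getDefaultMimeTypes());

		Metadata metadata = new Metadata();
		metadata.set(Metadata.RESOURCE_NAME_KEY, file.getName());

		// Detectors mark and reset the stream, so the same
		// TikaInputStream can be passed to detect() more than once.
		InputStream is = TikaInputStream.get(file);
		try {
			// Magic bytes plus the filename hint
			System.out.println("File + filename: " + detector.detect(is, metadata));
			// Magic bytes only
			System.out.println("File only:       " + detector.detect(is, new Metadata()));
			// Filename only (a null stream means no magic detection)
			System.out.println("Filename only:   " + detector.detect(null, metadata));
		} finally {
			is.close();
		}
	}

}
{code}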

Renders the output:
{code}
File + filename: video/quicktime
File only:       video/quicktime
Filename only:   video/x-m4v
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (TIKA-851) M4V and M4A detection invalid

Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195587#comment-13195587 ] 

Nick Burch commented on TIKA-851:
---------------------------------

I've added the m4b extension to audio/mp4 in r1237113.
                

        

[jira] [Commented] (TIKA-851) M4V and M4A detection invalid

Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194891#comment-13194891 ] 

Nick Burch commented on TIKA-851:
---------------------------------

I've added the audio/x-m4a alias in r1236734.
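The alias here and the .m4b extension from r1237113 only affect the "Filename only" path. A minimal sketch of that kind of name-based lookup, assuming a simple extension table plus an alias table; the class name `NameLookup` and both tables are hypothetical and far simpler than Tika's `tika-mimetypes.xml`:

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical, simplified name-based lookup; not Tika's implementation. */
final class NameLookup {
    // Illustrative extension -> canonical type table (assumed values).
    private static final Map<String, String> GLOBS = Map.of(
            "m4v", "video/x-m4v",
            "m4a", "audio/mp4",
            "m4b", "audio/mp4",
            "mp4", "video/mp4",
            "mov", "video/quicktime");

    // Illustrative alias -> canonical type table (assumed values).
    private static final Map<String, String> ALIASES = Map.of(
            "audio/x-m4a", "audio/mp4");

    /** Returns the type for a filename's extension, or application/octet-stream. */
    static String detectByName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "application/octet-stream"; // no usable extension
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return GLOBS.getOrDefault(ext, "application/octet-stream");
    }

    /** Maps an alias such as audio/x-m4a onto its canonical type. */
    static String normalize(String type) {
        return ALIASES.getOrDefault(type.toLowerCase(Locale.ROOT), type);
    }

    public static void main(String[] args) {
        System.out.println(detectByName("sample_iPod.m4v")); // video/x-m4v
        System.out.println(detectByName("book.M4B"));        // audio/mp4
        System.out.println(normalize("audio/x-m4a"));        // audio/mp4
    }
}
```

An unknown or missing extension falls back to `application/octet-stream`, which is the same answer the M4A "Filename only" run produced against 1.0.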
                

        

[jira] [Updated] (TIKA-851) M4V and M4A detection invalid

Posted by "Alexander Chow (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Chow updated TIKA-851:
--------------------------------

    Attachment: TIKA-851.patch

I've added a patch file that I think should fix the problem for both M4V and M4A.

According to [AtomicParsley|http://atomicparsley.sourceforge.net/mpeg-4files.html], "The ftyp atom is ALWAYS first."  This seems to be corroborated by the discussion of "The Movie Profile Atom" in Apple's [spec|http://developer.apple.com/library/mac/#documentation/QuickTime/QTFF/QTFFChap2/qtff2.html#//apple_ref/doc/uid/TP40000939-CH204-SW1].
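For reference, the ftyp atom layout both sources describe is: a 4-byte big-endian size, the ASCII type `ftyp`, a 4-byte major brand, a 4-byte minor version, and then 4-byte compatible brands for the rest of the atom. A hedged sketch of reading it from the start of a file, assuming a 32-bit size field; `FtypParser` is a hypothetical name, not part of Tika:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for a leading ftyp atom (32-bit sizes only). */
final class FtypParser {

    /** Fields of an ftyp atom. */
    static final class Ftyp {
        final String majorBrand;
        final long minorVersion;
        final List<String> compatibleBrands;

        Ftyp(String majorBrand, long minorVersion, List<String> compatibleBrands) {
            this.majorBrand = majorBrand;
            this.minorVersion = minorVersion;
            this.compatibleBrands = compatibleBrands;
        }
    }

    /**
     * Parses the ftyp atom at offset 0, or returns null if the data does not
     * begin with one. If data is only a prefix of the file, compatible brands
     * beyond its end are ignored.
     */
    static Ftyp parse(byte[] data) {
        if (data.length < 16) {
            return null; // too short for size, type, major brand and minor version
        }
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian by default
        long size = buf.getInt() & 0xFFFFFFFFL;
        if (!fourCC(buf).equals("ftyp") || size < 16) {
            return null; // not ftyp, or a 64-bit / to-end-of-file size this sketch skips
        }
        String major = fourCC(buf);
        long minor = buf.getInt() & 0xFFFFFFFFL;
        long end = Math.min(size, data.length);
        List<String> compatible = new ArrayList<>();
        while (buf.position() + 4 <= end) { // the rest of the atom is 4-byte brands
            compatible.add(fourCC(buf));
        }
        return new Ftyp(major, minor, compatible);
    }

    private static String fourCC(ByteBuffer buf) {
        byte[] code = new byte[4];
        buf.get(code);
        return new String(code, StandardCharsets.ISO_8859_1);
    }
}
```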
                

        

[jira] [Commented] (TIKA-851) M4V and M4A detection invalid

Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194854#comment-13194854 ] 

Nick Burch commented on TIKA-851:
---------------------------------

From http://developer.apple.com/library/mac/#documentation/QuickTime/QTFF/QTFFChap1/qtff1.html#//apple_ref/doc/uid/TP40000939-CH203-BBCGDDDF:
"Generally speaking, atoms can be present in any order. Do not conclude that a particular atom is not present until you have parsed all the atoms in the file.

An exception is the file type atom, which typically identifies the file as a QuickTime movie. If present, this atom precedes any movie atom, movie data, preview, or free space atoms. If you encounter one of these other atom types prior to finding a file type atom, you may assume the file type atom is not present. (This atom is introduced in the QuickTime File Format Specification for 2004, and is not present in QuickTime movie files created prior to 2004)."

So, if there is an ftyp atom, it should be first, and if the first atom isn't an ftyp, then there isn't one. The AtomicParsley link is handy; it should help with producing a metadata-extracting parser.
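The rule above needs only the type field of the first atom. A minimal sketch, assuming the caller has read at least the first 8 bytes; `FirstAtom` is hypothetical, and the set of legacy types is a partial list of top-level atom types that Apple's QTFF documentation describes:

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;

/** Hypothetical classifier applying the "ftyp comes first, or not at all" rule. */
final class FirstAtom {
    // Partial list of top-level QuickTime atom types (assumed sufficient here).
    private static final Set<String> LEGACY_TOP_LEVEL =
            Set.of("moov", "mdat", "free", "skip", "wide", "pnot");

    enum Kind { HAS_FTYP, LEGACY_QUICKTIME, UNKNOWN }

    /** Inspects only the type field (bytes 4..7) of the first atom. */
    static Kind classify(byte[] header) {
        if (header.length < 8) {
            return Kind.UNKNOWN; // not enough bytes for size + type
        }
        String type = new String(header, 4, 4, StandardCharsets.ISO_8859_1);
        if (type.equals("ftyp")) {
            return Kind.HAS_FTYP;
        }
        // Any other recognised atom first means the file has no ftyp atom at all.
        return LEGACY_TOP_LEVEL.contains(type) ? Kind.LEGACY_QUICKTIME : Kind.UNKNOWN;
    }
}
```

A file classified as `LEGACY_QUICKTIME` is the pre-2004 case from the quote: it can still be a valid QuickTime movie, it simply carries no brand that could refine the type.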
                

        

[jira] [Issue Comment Edited] (TIKA-851) M4V and M4A detection invalid

Posted by "Alexander Chow (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195027#comment-13195027 ] 

Alexander Chow edited comment on TIKA-851 at 1/27/12 7:11 PM:
--------------------------------------------------------------

Nick, although you added the ftyp match for M4B (the bookmarkable format), you didn't add its extension, .m4b.  Do you think you can add that?
                
      was (Author: achow):
    Nick, although you add the ftyp for M4B (the bookmarked format), you don't take into account its extension .m4b.  Do you think you can add that?
                  

        

[jira] [Commented] (TIKA-851) M4V and M4A detection invalid

Posted by "Alexander Chow (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194829#comment-13194829 ] 

Alexander Chow commented on TIKA-851:
-------------------------------------

Sorry Nick, I didn't notice you had updated SVN.  It looks like you need to change your MIME type from audio/x-mp4a to audio/x-m4a, though.
                

        

[jira] [Commented] (TIKA-851) M4V and M4A detection invalid

Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194813#comment-13194813 ] 

Nick Burch commented on TIKA-851:
---------------------------------

It looks like most files (I'm not sure whether all of them do) have the ftyp atom type at byte 4, right after the 4-byte atom size. This is "ftyp" followed by a 4-byte string (space-padded if needed) giving the main type, or brand. There's a list of the common ones at http://www.ftyps.com/
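In byte terms: bytes 0-3 are the atom size, bytes 4-7 spell `ftyp`, and bytes 8-11 hold the major brand, padded with spaces when the code is shorter than four characters (e.g. `qt  `). A sketch of brand-based lookup under those assumptions; `BrandDetector` is hypothetical, and the brand-to-type table is illustrative rather than the set of matches committed in r1236700:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;

/** Hypothetical brand-to-type lookup; the table is illustrative, not Tika's. */
final class BrandDetector {
    // Brands are exactly 4 bytes, so short codes keep their trailing spaces.
    private static final Map<String, String> BRANDS = Map.of(
            "M4V ", "video/x-m4v",
            "M4A ", "audio/mp4",
            "M4B ", "audio/mp4",
            "qt  ", "video/quicktime",
            "isom", "video/mp4",
            "mp41", "video/mp4",
            "mp42", "video/mp4");

    /** Returns the major brand if bytes 4..7 spell "ftyp", otherwise empty. */
    static Optional<String> majorBrand(byte[] header) {
        if (header.length < 12) {
            return Optional.empty();
        }
        String type = new String(header, 4, 4, StandardCharsets.ISO_8859_1);
        if (!type.equals("ftyp")) {
            return Optional.empty();
        }
        return Optional.of(new String(header, 8, 4, StandardCharsets.ISO_8859_1));
    }

    /** Maps the major brand to a media type, or empty if the brand is unknown. */
    static Optional<String> detect(byte[] header) {
        return majorBrand(header).map(BRANDS::get); // null from get() becomes empty
    }
}
```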

I've added more specific matches for the common types in r1236700. Using the tika-app jar, I can now correctly detect MP4 video, Apple M4V video, MP4 audio and old QuickTime MOVs (the latter via the lower-priority fallback).
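The "lower-priority fallback" can be pictured as every rule being tested and the highest-priority match winning, so a specific brand beats a generic QuickTime signature. A minimal sketch; `PriorityMagic`, the rule list, and the priority values 60 and 40 are all assumptions for illustration, not Tika's actual magic definitions:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical priority-ordered magic matcher; rules and priorities are assumed. */
final class PriorityMagic {
    /** A byte pattern expected at a fixed offset, and the type it implies. */
    static final class Rule {
        final int priority;
        final int offset;
        final byte[] pattern;
        final String type;

        Rule(int priority, int offset, String pattern, String type) {
            this.priority = priority;
            this.offset = offset;
            this.pattern = pattern.getBytes(StandardCharsets.ISO_8859_1);
            this.type = type;
        }

        boolean matches(byte[] data) {
            if (offset + pattern.length > data.length) {
                return false; // header too short for this pattern
            }
            return Arrays.equals(data, offset, offset + pattern.length,
                    pattern, 0, pattern.length);
        }
    }

    // Specific ftyp brands (60) outrank the generic QuickTime signatures (40).
    static final List<Rule> RULES = List.of(
            new Rule(60, 4, "ftypM4V ", "video/x-m4v"),
            new Rule(60, 4, "ftypM4A ", "audio/mp4"),
            new Rule(60, 4, "ftypmp42", "video/mp4"),
            new Rule(40, 4, "ftyp", "video/quicktime"),
            new Rule(40, 4, "moov", "video/quicktime"),
            new Rule(40, 4, "mdat", "video/quicktime"));

    /** Returns the type of the highest-priority matching rule, or octet-stream. */
    static String detect(byte[] data) {
        return RULES.stream()
                .filter(r -> r.matches(data))
                .max(Comparator.comparingInt((Rule r) -> r.priority))
                .map(r -> r.type)
                .orElse("application/octet-stream");
    }
}
```

With these rules an M4V header matches both `ftypM4V ` and the generic `ftyp` rule, and the higher priority selects `video/x-m4v`; a header with an unlisted brand, or an old movie starting with `moov`, still falls back to `video/quicktime`.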

I'm not sure whether the ftyp atom has to come first; if it doesn't, this detection won't work. Longer term, a proper format-aware detector would be best, ideally one that also understands the rest of the format so it can report on the different streams, etc.
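A format-aware detector would start by walking the atom structure instead of matching fixed offsets. A hedged sketch that lists top-level atom types, handling the 64-bit `largesize` form (size field 1) and the to-end-of-file form (size field 0); `AtomWalker` is hypothetical and does not descend into child atoms such as the tracks inside `moov`:

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical walker over top-level atoms; a starting point, not a full parser. */
final class AtomWalker {
    /** Lists top-level atom types in file order. */
    static List<String> topLevelTypes(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<String> types = new ArrayList<>();
        while (true) {
            long size;
            try {
                size = data.readInt() & 0xFFFFFFFFL;
            } catch (EOFException end) {
                return types; // stream ended before another atom header
            }
            byte[] fourcc = new byte[4];
            data.readFully(fourcc);
            types.add(new String(fourcc, StandardCharsets.ISO_8859_1));
            long header = 8;
            if (size == 1) {
                size = data.readLong(); // 64-bit "largesize" follows the type
                header = 16;
            } else if (size == 0) {
                return types; // this atom extends to the end of the file
            }
            if (size < header) {
                throw new IOException("Invalid atom size " + size);
            }
            skipFully(data, size - header); // jump over the atom body
        }
    }

    private static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                if (in.read() < 0) {
                    throw new EOFException("Truncated atom");
                }
                skipped = 1; // read() consumed one byte
            }
            n -= skipped;
        }
    }
}
```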
                

        

[jira] [Commented] (TIKA-851) M4V and M4A detection invalid

Posted by "Alexander Chow (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195616#comment-13195616 ] 

Alexander Chow commented on TIKA-851:
-------------------------------------

Thanks!
                

        

[jira] [Commented] (TIKA-851) M4V and M4A detection invalid

Posted by "Alexander Chow (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195001#comment-13195001 ] 

Alexander Chow commented on TIKA-851:
-------------------------------------

Thanks, Nick, for adding the alias.
                

        

[jira] [Updated] (TIKA-851) M4V and M4A detection invalid

Posted by "Alexander Chow (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Chow updated TIKA-851:
--------------------------------

    Description: 
When the mime type of an M4V file is detected using its name only, it returns video/x-m4v.  When it is detected using the InputStream (hence utilising the MagicDetector), it incorrectly returns video/quicktime.

Using the sample M4V file from Apple's [knowledge base|http://support.apple.com/kb/HT1425]:

{code:title=TikaTest.java}
public class TikaTest {

	public static void main(String[] args) throws Exception {
		String userHome = System.getProperty("user.home");

		File file = new File(userHome + "/Desktop/sample_iPod.m4v");

		InputStream is = TikaInputStream.get(file);

		Detector detector = new DefaultDetector(
			MimeTypes.getDefaultMimeTypes());

		Metadata metadata = new Metadata();

		metadata.set(Metadata.RESOURCE_NAME_KEY, file.getName());

		System.out.println("File + filename: " + detector.detect(is, metadata));

		System.out.println("File only:       " + detector.detect(is, new Metadata()));

		System.out.println("Filename only:   " + detector.detect(null, metadata));
	}

}
{code}

Renders the output:
{code}
File + filename: video/quicktime
File only:       video/quicktime
Filename only:   video/x-m4v
{code}

Moreover, if the same test is run against an M4A file, the results are even worse:
{code}
File + filename: video/quicktime
File only:       video/quicktime
Filename only:   application/octet-stream
{code}

  was:
When the mime type of an M4V file is detected using its name only, it returns video/x-m4v.  When it is detected using the InputStream (hence utilising the MagicDetector), it incorrectly returns video/quicktime.

Using the sample M4V file from Apple's [knowledge base|http://support.apple.com/kb/HT1425]:

{code:title=TikaTest.java}
public class TikaTest {

	public static void main(String[] args) throws Exception {
		String userHome = System.getProperty("user.home");

		File file = new File(userHome + "/Desktop/sample_iPod.m4v");

		InputStream is = TikaInputStream.get(file);

		Detector detector = new DefaultDetector(
			MimeTypes.getDefaultMimeTypes());

		Metadata metadata = new Metadata();

		metadata.set(Metadata.RESOURCE_NAME_KEY, file.getName());

		System.out.println("File + filename: " + detector.detect(is, metadata));

		System.out.println("File only:       " + detector.detect(is, new Metadata()));

		System.out.println("Filename only:   " + detector.detect(null, metadata));
	}

}
{code}

Renders the output:
{code}
File + filename: video/quicktime
File only:       video/quicktime
Filename only:   video/x-m4v
{code}

        Summary: M4V and M4A detection invalid  (was: M4V magic detection invalid)
    

        

[jira] [Resolved] (TIKA-851) M4V and M4A detection invalid

Posted by "Nick Burch (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch resolved TIKA-851.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 1.1
    
> M4V and M4A detection invalid
> -----------------------------
>
>                 Key: TIKA-851
>                 URL: https://issues.apache.org/jira/browse/TIKA-851
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.0
>            Reporter: Alexander Chow
>             Fix For: 1.1

[jira] [Commented] (TIKA-851) M4V and M4A detection invalid

Posted by "Alexander Chow (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195027#comment-13195027 ] 

Alexander Chow commented on TIKA-851:
-------------------------------------

Nick, although you added the ftyp for M4B (the bookmarkable format), you didn't add its .m4b extension. Could you add that?
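A name-based fallback of the kind requested here can be sketched as a simple extension table. This is a hypothetical helper, not Tika's own glob mechanism (Tika declares its filename patterns in its MIME type definitions). `video/x-m4v` and `video/quicktime` are the types Tika reported in this issue; `audio/mp4` for `.m4a`/`.m4b` and `video/mp4` for `.mp4` are assumptions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical name-based fallback, not Tika's API: maps file extensions,
// including .m4b, to media types. Some mappings below are assumptions.
final class ExtensionFallback {

    private static final Map<String, String> BY_EXTENSION = new HashMap<>();

    static {
        BY_EXTENSION.put("m4v", "video/x-m4v");     // type Tika returned by name in this report
        BY_EXTENSION.put("mov", "video/quicktime");
        BY_EXTENSION.put("mp4", "video/mp4");       // assumed type
        BY_EXTENSION.put("m4a", "audio/mp4");       // assumed type
        BY_EXTENSION.put("m4b", "audio/mp4");       // assumed type for the requested extension
    }

    private ExtensionFallback() {}

    /** Returns the mapped type, or application/octet-stream when the extension is unknown. */
    static String typeForName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "application/octet-stream"; // no usable extension
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String type = BY_EXTENSION.get(ext);
        return type != null ? type : "application/octet-stream";
    }

    public static void main(String[] args) {
        System.out.println(typeForName("sample.M4B")); // prints "audio/mp4"
    }
}
```

Lower-casing the extension keeps `.M4B` and `.m4b` equivalent, and unknown names fall back to `application/octet-stream`, which is what Tika returned for the M4A file name in this report.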
                
> M4V and M4A detection invalid
> -----------------------------
>
>                 Key: TIKA-851
>                 URL: https://issues.apache.org/jira/browse/TIKA-851
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.0
>            Reporter: Alexander Chow
>             Fix For: 1.1
>
>         Attachments: TIKA-851.patch

[jira] [Commented] (TIKA-851) M4V magic detection invalid

Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194790#comment-13194790 ] 

Nick Burch commented on TIKA-851:
---------------------------------

I'm not sure we're going to be able to differentiate between .mov, .mp4 and .m4v with mime magic alone, as I believe they all use the same container format.

We may need a detector that opens the files up and checks them in a container-aware manner.
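One way such a container-aware check could work is to walk the top-level boxes of the file until the `ftyp` box is found, then inspect both its major brand and its list of compatible brands. The sketch below is hypothetical (it is not Tika's implementation), and the brand-to-type mapping in `classify` is an assumption, apart from `video/x-m4v` and `video/quicktime`, which are the types discussed in this issue.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical container-aware check, not Tika's implementation: walks the
// top-level ISO base media boxes until it finds "ftyp", then returns the
// major brand followed by the compatible brands.
final class FtypBoxWalker {

    private FtypBoxWalker() {}

    static List<String> brands(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<String> brands = new ArrayList<>();
        byte[] fourcc = new byte[4];
        try {
            while (true) {
                long size = data.readInt() & 0xFFFFFFFFL; // 32-bit big-endian box size
                data.readFully(fourcc);
                String type = new String(fourcc, StandardCharsets.ISO_8859_1);
                long payload;
                if (size == 1) {
                    payload = data.readLong() - 16; // 64-bit size follows the type
                } else if (size >= 8) {
                    payload = size - 8;
                } else {
                    return brands; // size 0 (box runs to EOF) or malformed: stop scanning
                }
                if (payload < 0) {
                    return brands; // malformed 64-bit size
                }
                if ("ftyp".equals(type)) {
                    if (payload < 8) {
                        return brands; // too short for major brand + minor version
                    }
                    data.readFully(fourcc);
                    brands.add(new String(fourcc, StandardCharsets.ISO_8859_1)); // major brand
                    data.readInt(); // minor version, ignored
                    for (long left = payload - 8; left >= 4; left -= 4) {
                        data.readFully(fourcc);
                        brands.add(new String(fourcc, StandardCharsets.ISO_8859_1)); // compatible brand
                    }
                    return brands;
                }
                skipFully(data, payload); // not ftyp: skip this box's payload
            }
        } catch (EOFException e) {
            return brands; // stream ended early: return whatever brands were read
        }
    }

    // Maps brands to media types; audio/mp4 for M4A/M4B is an assumption.
    static String classify(List<String> brands) {
        if (brands.contains("M4V ")) return "video/x-m4v";
        if (brands.contains("M4A ") || brands.contains("M4B ")) return "audio/mp4";
        if (brands.contains("qt  ")) return "video/quicktime";
        return null; // unknown: leave the decision to other detectors
    }

    private static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                if (in.read() < 0) {
                    throw new EOFException();
                }
                skipped = 1;
            }
            n -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        // A "free" box, then an ftyp box with major brand "mp42"
        // and compatible brands "M4V " and "isom"
        byte[] sample = {
            0, 0, 0, 8, 'f', 'r', 'e', 'e',
            0, 0, 0, 24, 'f', 't', 'y', 'p', 'm', 'p', '4', '2', 0, 0, 0, 0,
            'M', '4', 'V', ' ', 'i', 's', 'o', 'm'
        };
        List<String> brands = brands(new ByteArrayInputStream(sample));
        System.out.println(brands + " -> " + classify(brands)); // prints "[mp42, M4V , isom] -> video/x-m4v"
    }
}
```

Checking the compatible brands, not just the major brand, is what lets a file whose major brand is a generic `mp42` still be recognised as M4V in this sketch. Only the box headers are read; no sample data is decoded.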
                
> M4V magic detection invalid
> ---------------------------
>
>                 Key: TIKA-851
>                 URL: https://issues.apache.org/jira/browse/TIKA-851
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.0
>            Reporter: Alexander Chow