Posted to dev@tika.apache.org by "Peter May (Created) (JIRA)" <ji...@apache.org> on 2012/03/12 11:59:38 UTC

[jira] [Created] (TIKA-874) Identify FITS (Flexible Image Transport System) files

Identify FITS (Flexible Image Transport System) files
-----------------------------------------------------

                 Key: TIKA-874
                 URL: https://issues.apache.org/jira/browse/TIKA-874
             Project: Tika
          Issue Type: Improvement
          Components: mime
    Affects Versions: 1.1, 1.2
            Reporter: Peter May
            Priority: Minor


Tika does not have a defined signature for application/fits files.  I have created a patch (based on file(1) magic) to address identification of such files, including a simple unit test.

This patch only handles identification, not parsing of FITS files.
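The check that identification relies on can be sketched in plain Java. This is a minimal sketch, not the patch's code. It assumes the signature is the fixed-format first header card that the FITS standard requires: "SIMPLE" left-justified in columns 1-8, "= " in columns 9-10, and the logical value T right-justified in column 30. FitsSignature and looksLikeFits are hypothetical names. In Tika itself, signatures like this are normally declared as <magic> entries in tika-mimetypes.xml rather than written in Java, and the patch's exact match rule may differ (for example, it may match a shorter prefix).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FitsSignature {
    // First 30 bytes of a conforming primary header: keyword "SIMPLE" left-justified
    // in columns 1-8, "= " in columns 9-10, and logical T right-justified in column 30.
    static final byte[] MAGIC =
            String.format("%-8s= %20s", "SIMPLE", "T").getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the stream starts with the FITS primary-header signature. */
    static boolean looksLikeFits(InputStream in) throws IOException {
        byte[] head = new byte[MAGIC.length];
        int read = 0;
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                return false; // stream is shorter than the signature
            }
            read += n;
        }
        return Arrays.equals(head, MAGIC);
    }

    public static void main(String[] args) throws IOException {
        byte[] fits = (String.format("%-8s= %20s", "SIMPLE", "T") + " / conforms to FITS")
                .getBytes(StandardCharsets.US_ASCII);
        byte[] jpeg = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        System.out.println(looksLikeFits(new ByteArrayInputStream(fits))); // true
        System.out.println(looksLikeFits(new ByteArrayInputStream(jpeg))); // false
    }
}
```

Matching all 30 bytes is stricter than matching only the keyword prefix. It also rejects headers that declare SIMPLE = F (files that do not conform to the standard), which may or may not be desirable for identification.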

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (TIKA-874) Identify FITS (Flexible Image Transport System) files

Posted by "Chris A. Mattmann (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-874:
-----------------------------------

    Affects Version/s:     (was: 1.2)
                           (was: 1.1)
        Fix Version/s: 1.2

- Updated the fix version; no affects version, since this is a new feature.
                

[jira] [Resolved] (TIKA-874) Identify FITS (Flexible Image Transport System) files

Posted by "Chris A. Mattmann (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann resolved TIKA-874.
------------------------------------

    Resolution: Fixed

- patch applied in r1299703. Thank you Peter!
                

[jira] [Comment Edited] (TIKA-874) Identify FITS (Flexible Image Transport System) files

Posted by "Rahul Khanna (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509455#comment-13509455 ] 

Rahul Khanna edited comment on TIKA-874 at 12/4/12 3:13 AM:
------------------------------------------------------------

I've created a parser for FITS files that extracts metadata using the nom.tam.fits library available at http://heasarc.gsfc.nasa.gov/docs/heasarc/fits/java/v1.0/ . The code is used in The Australian National University's Data Commons project available at https://github.com/anu-doi/anudc . Code for the parser can be viewed at https://github.com/anu-doi/anudc/blob/master/DcShared/src/main/java/au/edu/anu/dcbag/metadata/FitsParser.java .

I was wondering if the Apache Tika Project is accepting contributions in the form of parsers created by other users such as myself. If yes, how can I submit the code?
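The parser described above uses nom.tam.fits, which is not reproduced here. As a rough, hypothetical illustration of what extracting FITS header metadata involves, the sketch below reads the primary header's 80-character keyword cards by hand until the END card. It assumes fixed-format cards and ignores COMMENT/HISTORY cards, long-string CONTINUE cards and extension HDUs. FitsHeaderReader and its methods are illustrative names, not part of Tika or nom.tam.fits.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FitsHeaderReader {
    static final int CARD = 80;    // each header record ("card") is 80 ASCII characters
    static final int BLOCK = 2880; // headers are padded to 2880-byte blocks (36 cards)

    /** Reads primary-header cards up to END; returns keyword -> value with comments stripped. */
    static Map<String, String> readPrimaryHeader(InputStream in) throws IOException {
        Map<String, String> values = new LinkedHashMap<>();
        byte[] buf = new byte[CARD];
        while (readFully(in, buf)) {
            String card = new String(buf, StandardCharsets.US_ASCII);
            String key = card.substring(0, 8).trim();
            if (key.equals("END")) {
                return values;
            }
            // "= " in columns 9-10 marks a keyword/value card; others (COMMENT, HISTORY) are skipped.
            if (card.startsWith("= ", 8)) {
                values.put(key, parseValue(card.substring(10)));
            }
        }
        throw new IOException("stream ended before the END card");
    }

    /** Extracts the value from columns 11-80, handling quoted strings and "/ comment" suffixes. */
    static String parseValue(String field) {
        String s = field.trim();
        if (s.startsWith("'")) {
            StringBuilder sb = new StringBuilder();
            for (int i = 1; i < s.length(); i++) {
                char c = s.charAt(i);
                if (c != '\'') {
                    sb.append(c);
                } else if (i + 1 < s.length() && s.charAt(i + 1) == '\'') {
                    sb.append('\''); // '' is an escaped quote inside a string
                    i++;
                } else {
                    break; // closing quote
                }
            }
            return sb.toString().replaceAll("\\s+$", ""); // trailing blanks are not significant
        }
        int slash = s.indexOf('/');
        return (slash >= 0 ? s.substring(0, slash) : s).trim();
    }

    /** Fills buf completely; returns false if the stream ends first. */
    static boolean readFully(InputStream in, byte[] buf) throws IOException {
        int read = 0;
        while (read < buf.length) {
            int n = in.read(buf, read, buf.length - read);
            if (n < 0) {
                return false;
            }
            read += n;
        }
        return true;
    }

    /** Pads a card image with blanks to 80 characters. */
    static String pad(String card) {
        return String.format("%-" + CARD + "s", card);
    }

    public static void main(String[] args) throws IOException {
        StringBuilder h = new StringBuilder()
                .append(pad(String.format("%-8s= %20s / conforms to FITS", "SIMPLE", "T")))
                .append(pad(String.format("%-8s= %20s / bits per pixel", "BITPIX", "8")))
                .append(pad(String.format("%-8s= %20s", "NAXIS", "0")))
                .append(pad(String.format("%-8s= %-20s", "OBJECT", "'M31'")))
                .append(pad("END"));
        while (h.length() % BLOCK != 0) {
            h.append(' ');
        }
        byte[] bytes = h.toString().getBytes(StandardCharsets.US_ASCII);
        System.out.println(readPrimaryHeader(new ByteArrayInputStream(bytes)));
        // prints {SIMPLE=T, BITPIX=8, NAXIS=0, OBJECT=M31}
    }
}
```

A library such as nom.tam.fits handles the cases this sketch skips, which is one argument for depending on it rather than hand-parsing.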
                
      was (Author: rahul_k):
    I've created a parser for FITS files that extracts metadata using the nom.tam.fits library available at http://heasarc.gsfc.nasa.gov/docs/heasarc/fits/java/v1.0/ . The code is used in The Australian National University's Data Commons project available at https://github.com/anu-doi/anudc . Code for the parser can be viewed at https://github.com/anu-doi/anudc/blob/master/DcShared/src/main/java/au/edu/anu/dcbag/metadata/FitsParser.java .
                  

[jira] [Issue Comment Edited] (TIKA-874) Identify FITS (Flexible Image Transport System) files

Posted by "Peter May (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227437#comment-13227437 ] 

Peter May edited comment on TIKA-874 at 3/12/12 1:12 PM:
---------------------------------------------------------

This patch identifies FITS files, based on the signature used by the file(1) command and also specified in RFC4047 (http://fits.gsfc.nasa.gov/rfc4047.txt).

It includes a simple unit test (added to TestMimeTypes) using a FITS file created with ImageMagick by converting https://github.com/apache/tika/blob/trunk/tika-parsers/src/test/resources/test-documents/testJPEG.jpg to a FITS image.

Comments welcome.
                
      was (Author: pete.s.may):
    This patch identifies FITS files, based on the signature used by the file(1) command and also specified in RFC4047 (http://fits.gsfc.nasa.gov/rfc4047.txt).

It includes a simple unit test (added to TestMimeTypes) using a FITS file created with ImageMagick by converting https://github.com/apache/tika/blob/trunk/tika-parsers/src/test/resources/test-documents/testJPEG.jpg to a FITS image.

Comments welcome on the best approach forward.
                  

[jira] [Updated] (TIKA-874) Identify FITS (Flexible Image Transport System) files

Posted by "Peter May (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter May updated TIKA-874:
---------------------------

    Attachment:     (was: fits_support.patch)
    

[jira] [Assigned] (TIKA-874) Identify FITS (Flexible Image Transport System) files

Posted by "Chris A. Mattmann (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann reassigned TIKA-874:
--------------------------------------

    Assignee: Chris A. Mattmann
    

[jira] [Commented] (TIKA-874) Identify FITS (Flexible Image Transport System) files

Posted by "Rahul Khanna (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509455#comment-13509455 ] 

Rahul Khanna commented on TIKA-874:
-----------------------------------

I've created a parser for FITS files that extracts metadata using the nom.tam.fits library available at http://heasarc.gsfc.nasa.gov/docs/heasarc/fits/java/v1.0/ . The code is used in The Australian National University's Data Commons project available at https://github.com/anu-doi/anudc . Code for the parser can be viewed at https://github.com/anu-doi/anudc/blob/master/DcShared/src/main/java/au/edu/anu/dcbag/metadata/FitsParser.java .
                

[jira] [Updated] (TIKA-874) Identify FITS (Flexible Image Transport System) files

Posted by "Peter May (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter May updated TIKA-874:
---------------------------

    Attachment: fits_support.patch

This patch identifies FITS files, based on the signature used by the file(1) command and also specified in RFC4047 (http://fits.gsfc.nasa.gov/rfc4047.txt).

It includes a simple unit test (added to TestMimeTypes) using a FITS file available at http://fits.gsfc.nasa.gov/nrao_data/tests/nost_headers/good.fits 

I cannot see any noticeable license restrictions on these sample FITS files; there is a contact email address shown on http://fits.gsfc.nasa.gov/fits_nraodata.html where we might be able to get further advice.

Advice/comments welcome on the best approach forward, especially regarding the unit test sample file.
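On the question of the unit-test sample file, one option is for the test to generate a tiny conforming FITS file itself, so that no third-party sample needs license clearance. This is an assumption on the editor's part, not what the patch did. The sketch builds a header-only primary HDU (SIMPLE, BITPIX, NAXIS = 0, END) padded to one 2880-byte block. MinimalFits is a hypothetical helper name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MinimalFits {
    static final int CARD = 80;    // header card length in characters
    static final int BLOCK = 2880; // FITS logical record (block) size in bytes

    /** Formats one fixed-format card: keyword in cols 1-8, "= " in 9-10, value right-justified to col 30. */
    static String card(String key, String value, String comment) {
        String c = String.format("%-8s= %20s / %s", key, value, comment);
        return String.format("%-" + CARD + "s", c);
    }

    /** Builds a header-only primary HDU (NAXIS = 0, so no data array follows the header). */
    static byte[] minimalFits() {
        StringBuilder h = new StringBuilder();
        h.append(card("SIMPLE", "T", "file conforms to FITS standard"));
        h.append(card("BITPIX", "8", "bits per data value"));
        h.append(card("NAXIS", "0", "no data array"));
        h.append(String.format("%-" + CARD + "s", "END"));
        while (h.length() % BLOCK != 0) {
            h.append(' '); // pad the header with blanks to a full 2880-byte block
        }
        return h.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get(args.length > 0 ? args[0] : "minimal.fits");
        Files.write(out, minimalFits());
        System.out.println("wrote " + Files.size(out) + " bytes to " + out);
    }
}
```

A generated file like this only exercises the magic match; a real-world sample would still give broader coverage if its license permits.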
                

[jira] [Commented] (TIKA-874) Identify FITS (Flexible Image Transport System) files

Posted by "Nick Burch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509691#comment-13509691 ] 

Nick Burch commented on TIKA-874:
---------------------------------

We do, where appropriate. (Sometimes it's better for the parser to live in the same project as the library it depends on, and for Tika to just include both as dependencies.)

Your best bet is probably to start a thread on dev@tika.apache.org, and we can all work out between us if we're best off bringing the code into Tika or if it's best to leave it outside and simply depend on it.
                

[jira] [Issue Comment Edited] (TIKA-874) Identify FITS (Flexible Image Transport System) files

Posted by "Peter May (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227437#comment-13227437 ] 

Peter May edited comment on TIKA-874 at 3/12/12 1:11 PM:
---------------------------------------------------------

This patch identifies FITS files, based on the signature used by the file(1) command and also specified in RFC4047 (http://fits.gsfc.nasa.gov/rfc4047.txt).

It includes a simple unit test (added to TestMimeTypes) using a FITS file created with ImageMagick by converting https://github.com/apache/tika/blob/trunk/tika-parsers/src/test/resources/test-documents/testJPEG.jpg to a FITS image.

Comments welcome on the best approach forward.
                
      was (Author: pete.s.may):
    This patch identifies FITS files, based on the signature used by the file(1) command and also specified in RFC4047 (http://fits.gsfc.nasa.gov/rfc4047.txt).

It includes a simple unit test (added to TestMimeTypes) using a FITS file available at http://fits.gsfc.nasa.gov/nrao_data/tests/nost_headers/good.fits 

I cannot see any noticeable license restrictions on these sample FITS files; there is a contact email address shown on http://fits.gsfc.nasa.gov/fits_nraodata.html where we might be able to get further advice.

Advice/comments welcome on the best approach forward, especially regarding the unit test sample file.
                  

[jira] [Commented] (TIKA-874) Identify FITS (Flexible Image Transport System) files

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510631#comment-13510631 ] 

Chris A. Mattmann commented on TIKA-874:
----------------------------------------

Hi Rahul: I would recommend creating a new issue here on the TIKA JIRA and then, as Nick mentioned, moving the discussion of this new parser to dev@tika and referencing the JIRA issue there.

I for one am happy to help shepherd the parser into Tika if it makes sense, and to help you earn the merit to help shepherd it yourself :) 


                

[jira] [Updated] (TIKA-874) Identify FITS (Flexible Image Transport System) files

Posted by "Peter May (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter May updated TIKA-874:
---------------------------

    Attachment: fits_support.patch
    