Posted to dev@tika.apache.org by "Nick Burch (Created) (JIRA)" <ji...@apache.org> on 2012/01/24 15:40:40 UTC

[jira] [Created] (TIKA-850) Consistent way to supply document passwords to parsers

Consistent way to supply document passwords to parsers
------------------------------------------------------

                 Key: TIKA-850
                 URL: https://issues.apache.org/jira/browse/TIKA-850
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 1.0
            Reporter: Nick Burch


Currently, PDF document passwords are supplied to the parser via a special key on the Metadata object, while the Office Parser has a TODO and only supports the default password.

We should update all the parsers that support encrypted documents (currently PDF, Office OLE2 and Office OOXML) to receive the password in a consistent way.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (TIKA-850) Consistent way to supply document passwords to parsers

Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192184#comment-13192184 ] 

Nick Burch commented on TIKA-850:
---------------------------------

Does anyone have a view on whether the password should be passed in on the Metadata object (as PDF currently supports), or on the ParseContext (as other Parser options are)?
                

[jira] [Commented] (TIKA-850) Consistent way to supply document passwords to parsers

Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196953#comment-13196953 ] 

Nick Burch commented on TIKA-850:
---------------------------------

PasswordProvider added in r1238616, based on the above description. 

The PDFParser has also been updated to use it in preference to the metadata key. Assuming there are no changes suggested in the next few days, I'll roll it out to the POI-based parsers too.
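
The "in preference to the metadata key" behaviour might look something like the sketch below. It is not the PDFParser code from r1238616: Metadata is reduced to a plain Map, the provider is passed in directly rather than fetched from a ParseContext, and the method name choosePassword is hypothetical. The key string is the one quoted for the PDF parser elsewhere in this thread. Falling back to the metadata key when the provider returns null is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class PasswordPreferenceSketch {

    /** Metadata key for the PDF password, as quoted in this thread. */
    static final String PASSWORD = "org.apache.pdfbox.tika.password";

    /** Simplified stand-in: here the metadata is just a Map of strings. */
    interface PasswordProvider {
        String getPassword(Map<String, String> metadata);
    }

    /**
     * Hypothetical helper: ask the provider first, then fall back to the
     * older metadata key. Returns null if neither supplies a password.
     */
    static String choosePassword(PasswordProvider provider,
                                 Map<String, String> metadata) {
        if (provider != null) {
            String password = provider.getPassword(metadata);
            if (password != null) {
                return password;
            }
        }
        // Assumption: the metadata key remains a fallback for older callers.
        return metadata.get(PASSWORD);
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(PASSWORD, "from-metadata");

        // With a provider, its answer wins over the metadata key.
        System.out.println(choosePassword(m -> "from-provider", metadata));
        // Without one, the metadata key is still honoured.
        System.out.println(choosePassword(null, metadata));
    }
}
```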
                

[jira] [Commented] (TIKA-850) Consistent way to supply document passwords to parsers

Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209279#comment-13209279 ] 

Nick Burch commented on TIKA-850:
---------------------------------

I've updated OfficeParser in r1244933 to use the same pattern as PDFParser, with PasswordProvider. 

I believe these are the only two that currently support password-protected files, so I think this is now finished. (Well, until we add more file formats!)
                

[jira] [Resolved] (TIKA-850) Consistent way to supply document passwords to parsers

Posted by "Nick Burch (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch resolved TIKA-850.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 1.1
    

[jira] [Commented] (TIKA-850) Consistent way to supply document passwords to parsers

Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193124#comment-13193124 ] 

Nick Burch commented on TIKA-850:
---------------------------------

Currently, the objects set onto the ParseContext are:
 * Detector.class
 * DocumentSelector.class
 * EmbeddedDocumentExtractor.class
 * Locale.class
 * MimeConfig.class
 * Parser.class

The ones set onto the Metadata for use by parsers are:
 * RESOURCE_NAME_KEY (resourceName)
 * CONTENT_TYPE (Content-Type)
 * PASSWORD (org.apache.pdfbox.tika.password) *PDF Only*
 * TIKA_MIME_FILE (tika.mime.file)
 * MIME_TYPE_MAGIC (mime.type.magic)

                

[jira] [Commented] (TIKA-850) Consistent way to supply document passwords to parsers

Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193131#comment-13193131 ] 

Nick Burch commented on TIKA-850:
---------------------------------

Based on this, I think the best option may be to have a new interface, called something like PasswordProvider, set onto the ParseContext.

PasswordProvider would have a single method, 'String getPassword(Metadata)', which would potentially allow you to look up the password based on the resource name and content type.

We'd probably want a single implementation out of the box, which takes a String on the constructor and always returns that as the password, to make life easy when calling a parser and you already know the password for your file.

Thoughts? Better names? Alternate ways to do it?
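
The proposal above could be sketched roughly as follows. This is an illustration only, not the code that was committed to Tika: Metadata and ParseContext here are minimal stand-ins written for this example so that it runs without Tika on the classpath, and the names FixedPasswordProvider and resolvePassword are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class PasswordProviderSketch {

    /** Stand-in for Tika's Metadata (assumption: plain string keys and values). */
    static final class Metadata {
        private final Map<String, String> values = new HashMap<>();

        void set(String name, String value) {
            values.put(name, value);
        }

        String get(String name) {
            return values.get(name);
        }
    }

    /** Stand-in for Tika's ParseContext (assumption: objects keyed by class). */
    static final class ParseContext {
        private final Map<Class<?>, Object> objects = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            objects.put(key, value);
        }

        <T> T get(Class<T> key) {
            return key.cast(objects.get(key));
        }
    }

    /** The single-method interface described in the comment. */
    interface PasswordProvider {
        String getPassword(Metadata metadata);
    }

    /** Hypothetical out-of-the-box implementation: always returns one password. */
    static final class FixedPasswordProvider implements PasswordProvider {
        private final String password;

        FixedPasswordProvider(String password) {
            this.password = password;
        }

        @Override
        public String getPassword(Metadata metadata) {
            return password;
        }
    }

    /** How a parser might look up the password; null means none was supplied. */
    static String resolvePassword(ParseContext context, Metadata metadata) {
        PasswordProvider provider = context.get(PasswordProvider.class);
        return provider == null ? null : provider.getPassword(metadata);
    }

    public static void main(String[] args) {
        Metadata metadata = new Metadata();
        metadata.set("resourceName", "report.pdf");

        ParseContext context = new ParseContext();
        context.set(PasswordProvider.class, new FixedPasswordProvider("secret"));

        System.out.println(resolvePassword(context, metadata));
    }
}
```

Because the interface has a single method, a caller who needs a different password per file could supply a lambda that inspects the resource name or content type in the Metadata instead of using the fixed implementation.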
                