Posted to dev@tika.apache.org by "Nick Burch (Created) (JIRA)" <ji...@apache.org> on 2012/01/24 15:40:40 UTC
[jira] [Created] (TIKA-850) Consistent way to supply document passwords to parsers
Consistent way to supply document passwords to parsers
------------------------------------------------------
Key: TIKA-850
URL: https://issues.apache.org/jira/browse/TIKA-850
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.0
Reporter: Nick Burch
Currently, PDF document passwords are supplied to the parser via a special key on the Metadata object, while the Office Parser has a TODO and only supports the default password.
We should update all the parsers that support encrypted documents (currently PDF, Office OLE2 and Office OOXML) to receive the password in a consistent way.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TIKA-850) Consistent way to supply document passwords to parsers
Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192184#comment-13192184 ]
Nick Burch commented on TIKA-850:
---------------------------------
Does anyone have a feeling for whether the password should be passed in on the Metadata object (as the PDF parser currently supports) or on the ParseContext (as other Parser options are)?
[jira] [Commented] (TIKA-850) Consistent way to supply document passwords to parsers
Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196953#comment-13196953 ]
Nick Burch commented on TIKA-850:
---------------------------------
PasswordProvider added in r1238616, based on the above description.
The PDFParser has also been updated to use it in preference to the metadata key. Assuming no changes are suggested in the next few days, I'll roll it out to the POI-based parsers too.
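To illustrate what "in preference to the metadata key" could mean on the parser side, here is a minimal sketch. It is not the code from r1238616: the class name PasswordLookupSketch, the method resolvePassword and the use of a plain Map in place of Metadata are all hypothetical, and the fallback to the old key when the provider returns null is an assumption. Only the key string itself comes from this issue.

```java
import java.util.HashMap;
import java.util.Map;

public class PasswordLookupSketch {

    /** The PDF-only metadata key mentioned in this issue. */
    public static final String LEGACY_PASSWORD_KEY = "org.apache.pdfbox.tika.password";

    /** Hypothetical stand-in for the provider; a Map stands in for Metadata. */
    public interface PasswordProvider {
        String getPassword(Map<String, String> metadata);
    }

    /**
     * Assumed lookup order: ask the provider first (if the caller supplied one),
     * then fall back to the old metadata key. Returns null if neither has a password.
     */
    public static String resolvePassword(PasswordProvider provider,
                                         Map<String, String> metadata) {
        if (provider != null) {
            String password = provider.getPassword(metadata);
            if (password != null) {
                return password;
            }
        }
        return metadata.get(LEGACY_PASSWORD_KEY);
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(LEGACY_PASSWORD_KEY, "from-metadata");

        // No provider supplied: the old metadata key is still honoured.
        System.out.println(resolvePassword(null, metadata));

        // Provider supplied: it wins over the metadata key.
        System.out.println(resolvePassword(m -> "from-provider", metadata));
    }
}
```

Keeping the old key as a fallback would let existing callers keep working while new callers move to the provider.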
[jira] [Commented] (TIKA-850) Consistent way to supply document passwords to parsers
Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209279#comment-13209279 ]
Nick Burch commented on TIKA-850:
---------------------------------
I've updated OfficeParser in r1244933 to use the same pattern as PDFParser, with PasswordProvider.
I believe these are the only two that currently support password-protected files, so I think this is now finished. (Well, until we add more file formats!)
[jira] [Resolved] (TIKA-850) Consistent way to supply document passwords to parsers
Posted by "Nick Burch (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Burch resolved TIKA-850.
-----------------------------
Resolution: Fixed
Fix Version/s: 1.1
[jira] [Commented] (TIKA-850) Consistent way to supply document passwords to parsers
Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193124#comment-13193124 ]
Nick Burch commented on TIKA-850:
---------------------------------
Currently, the objects set onto the ParseContext are:
* Detector.class
* DocumentSelector.class
* EmbeddedDocumentExtractor.class
* Locale.class
* MimeConfig.class
* Parser.class
The ones set onto the Metadata for use by parsers are:
* RESOURCE_NAME_KEY (resourceName)
* CONTENT_TYPE (Content-Type)
* PASSWORD (org.apache.pdfbox.tika.password) *PDF Only*
* TIKA_MIME_FILE (tika.mime.file)
* MIME_TYPE_MAGIC (mime.type.magic)
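To make the difference between the two lists concrete, here is a rough sketch of the two styles. The Context class below is a hypothetical stand-in written for illustration, not Tika's ParseContext, and a plain Map stands in for Metadata. The key strings are the ones listed above; the values are made up.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ContextVsMetadataSketch {

    /** Hypothetical stand-in for a class-keyed context: one object per class. */
    public static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        public <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        /** Returns the object registered for the class, or null if none was set. */
        public <T> T get(Class<T> key) {
            return key.cast(entries.get(key));
        }
    }

    public static void main(String[] args) {
        // Context style: typed objects, looked up by their class.
        Context context = new Context();
        context.set(Locale.class, Locale.FRENCH);
        Locale locale = context.get(Locale.class);

        // Metadata style: string values, looked up by string keys.
        Map<String, String> metadata = new HashMap<>();
        metadata.put("resourceName", "report.pdf");
        metadata.put("Content-Type", "application/pdf");
        metadata.put("org.apache.pdfbox.tika.password", "secret");

        System.out.println(locale + " / " + metadata.get("resourceName"));
    }
}
```

The context style can carry an object with behaviour (an interface implementation), while the metadata style can only carry strings.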
[jira] [Commented] (TIKA-850) Consistent way to supply document passwords to parsers
Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193131#comment-13193131 ]
Nick Burch commented on TIKA-850:
---------------------------------
Based on this, I think the best option may be to have a new interface, called something like PasswordProvider, set onto the ParseContext.
PasswordProvider would have a single method, 'String getPassword(Metadata)', which would potentially allow you to look up the password based on the resource name and content type.
We'd probably want a single implementation out of the box, which takes a String in its constructor and always returns that as the password, to make life easy for callers who already know the password for their file.
Thoughts? Better names? Alternate ways to do it?
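A minimal sketch of the shape proposed above, to make the discussion concrete. The Metadata class here is a simplified stand-in rather than Tika's own, and the names FixedPasswordProvider and ByNamePasswordProvider are hypothetical, chosen only for illustration. The "resourceName" key is the RESOURCE_NAME_KEY value from the earlier list.

```java
import java.util.HashMap;
import java.util.Map;

public class PasswordProviderSketch {

    /** Simplified stand-in for Metadata: a string-to-string map. */
    public static final class Metadata {
        private final Map<String, String> values = new HashMap<>();

        public void set(String name, String value) {
            values.put(name, value);
        }

        public String get(String name) {
            return values.get(name);
        }
    }

    /** The proposed interface: a single method that is handed the metadata. */
    public interface PasswordProvider {
        String getPassword(Metadata metadata);
    }

    /** Out-of-the-box case: always returns the String given to the constructor. */
    public static final class FixedPasswordProvider implements PasswordProvider {
        private final String password;

        public FixedPasswordProvider(String password) {
            this.password = password;
        }

        @Override
        public String getPassword(Metadata metadata) {
            return password;
        }
    }

    /** Lookup case: picks a password by resource name, or null if unknown. */
    public static final class ByNamePasswordProvider implements PasswordProvider {
        private final Map<String, String> passwordsByName;

        public ByNamePasswordProvider(Map<String, String> passwordsByName) {
            this.passwordsByName = new HashMap<>(passwordsByName);
        }

        @Override
        public String getPassword(Metadata metadata) {
            return passwordsByName.get(metadata.get("resourceName"));
        }
    }

    public static void main(String[] args) {
        Metadata metadata = new Metadata();
        metadata.set("resourceName", "report.pdf");

        PasswordProvider fixed = new FixedPasswordProvider("secret");
        System.out.println(fixed.getPassword(metadata));

        Map<String, String> known = new HashMap<>();
        known.put("report.pdf", "hunter2");
        PasswordProvider byName = new ByNamePasswordProvider(known);
        System.out.println(byName.getPassword(metadata));
    }
}
```

Passing the Metadata into the method is what makes the second kind of provider possible. The fixed one simply ignores it.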