Posted to dev@nutch.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/05/05 11:32:00 UTC

[jira] [Commented] (NUTCH-1945) Test for XLSX parser

    [ https://issues.apache.org/jira/browse/NUTCH-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099801#comment-17099801 ] 

ASF GitHub Bot commented on NUTCH-1945:
---------------------------------------

sebastian-nagel opened a new pull request #525:
URL: https://github.com/apache/nutch/pull/525


   - add Tika unit test for XLSX files
   - bundle instance variables and utility methods in class TikaParserTest
   - clean up javadoc comments
   
   See the patch attached to [NUTCH-1945](https://issues.apache.org/jira/browse/NUTCH-1945), which has been ported to apply to the current Nutch master.




> Test for XLSX parser
> --------------------
>
>                 Key: NUTCH-1945
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1945
>             Project: Nutch
>          Issue Type: Test
>          Components: parser
>    Affects Versions: 1.10, 2.3.1
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.17
>
>         Attachments: NUTCH-1945-2x.patch
>
>
> Add a test for Excel spreadsheet (xlsx) files: because they are formally also zip files (as are other composite file formats), MIME type detection is crucial for parsing as well, cf. NUTCH-1605 and NUTCH-1925.
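
To illustrate why detection matters here, the following is a minimal sketch using only the JDK's java.util.zip classes. It is not the test from the patch and it does not call Nutch or Tika. The class name XlsxZipMagicDemo and its helper methods are hypothetical, and the archive it builds only mimics the entry names of an XLSX container (it is not a valid workbook). It shows that such a file starts with the same magic bytes as any zip archive, so a detector has to look at the entries inside the container to tell the two apart.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical demo class: an XLSX-like container is a plain zip archive at the byte level. */
final class XlsxZipMagicDemo {

  /** Builds a tiny zip archive with XLSX-like entry names (placeholder content, not a valid workbook). */
  static byte[] buildXlsxLikeContainer() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      for (String name : new String[] {"[Content_Types].xml", "xl/workbook.xml"}) {
        zip.putNextEntry(new ZipEntry(name));
        zip.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return bytes.toByteArray();
  }

  /** Returns true if the data starts with the zip local file header signature "PK\003\004". */
  static boolean hasZipMagic(byte[] data) {
    return data.length >= 4
        && data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4;
  }

  /** Returns true if the zip archive contains an entry with exactly the given name. */
  static boolean hasEntry(byte[] data, String name) {
    try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
      for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
        if (entry.getName().equals(name)) {
          return true;
        }
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return false;
  }

  public static void main(String[] args) {
    byte[] data = buildXlsxLikeContainer();
    // The magic bytes alone only say "zip"; the entry names hint at the spreadsheet format.
    System.out.println("zip magic: " + hasZipMagic(data));
    System.out.println("has xl/workbook.xml: " + hasEntry(data, "xl/workbook.xml"));
  }
}
```

A detector that stops at the magic bytes would presumably report such a file as a generic zip archive, and the spreadsheet parser would never be selected, which is the kind of regression a unit test for XLSX files can catch.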


