Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2018/01/02 17:52:00 UTC
[jira] [Commented] (TIKA-2462) Add a parser for sas7bdat
[ https://issues.apache.org/jira/browse/TIKA-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308429#comment-16308429 ]
Nick Burch commented on TIKA-2462:
----------------------------------
While we wait for the re-license to go through, I've had a look at writing a parser. Outputting as CSV is very easy, as parso has a great class that does all the work. Emitting SAX events for an HTML table will be trickier, as the logic that formats a raw value in a given column as "a string of how it looks in SAS" currently lives in a private method. I've raised [#24|https://github.com/epam/parso/issues/24] to see if that can be refactored out, so that we don't need to duplicate lots of their code.
The Tika questions on column metadata, test files etc. still remain for us though!
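
For illustration only, here is a rough sketch of what "SAX events of an HTML table" could look like once each cell is already available as a display string. It uses only the JDK's org.xml.sax and javax.xml.transform classes; the class name TableSaxSketch and its methods are hypothetical, and nothing here is parso's or Tika's actual API. The hard part described above, turning a raw SAS value into its formatted string, is deliberately assumed to have happened already.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical helper: turns pre-formatted rows into SAX events for a table. */
public class TableSaxSketch {

    /** Emits table/tr/th/td events to any SAX ContentHandler. */
    public static void emitTable(ContentHandler handler,
                                 List<String> columnNames,
                                 List<List<String>> rows) throws SAXException {
        start(handler, "table");
        // Header row: one th per column name.
        start(handler, "tr");
        for (String name : columnNames) {
            cell(handler, "th", name);
        }
        end(handler, "tr");
        // Data rows: values are assumed to be formatted as strings already.
        for (List<String> row : rows) {
            start(handler, "tr");
            for (String value : row) {
                cell(handler, "td", value);
            }
            end(handler, "tr");
        }
        end(handler, "table");
    }

    /** Serialises the events to a string, mainly so the sketch can be inspected. */
    public static String render(List<String> columnNames, List<List<String>> rows) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));
            handler.startDocument();
            emitTable(handler, columnNames, rows);
            handler.endDocument();
            return out.toString();
        } catch (SAXException | TransformerConfigurationException e) {
            throw new IllegalStateException("Could not serialise table", e);
        }
    }

    private static void start(ContentHandler h, String name) throws SAXException {
        h.startElement("", name, name, new AttributesImpl());
    }

    private static void end(ContentHandler h, String name) throws SAXException {
        h.endElement("", name, name);
    }

    private static void cell(ContentHandler h, String name, String text) throws SAXException {
        start(h, name);
        // A missing value becomes an empty cell rather than the text "null".
        char[] chars = (text == null ? "" : text).toCharArray();
        h.characters(chars, 0, chars.length);
        end(h, name);
    }
}
```

Because emitTable only depends on ContentHandler, the same calls could in principle be pointed at whatever handler a parser is given, rather than at the string serialiser used here.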
> Add a parser for sas7bdat
> -------------------------
>
> Key: TIKA-2462
> URL: https://issues.apache.org/jira/browse/TIKA-2462
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
>
> EPAM recently agreed to migrate to Apache 2.0 so that we can incorporate parso into Tika for sas7bdat files: https://github.com/epam/parso/issues/19 !!!
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)