Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2018/01/02 17:52:00 UTC

[jira] [Commented] (TIKA-2462) Add a parser for sas7bdat

    [ https://issues.apache.org/jira/browse/TIKA-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308429#comment-16308429 ] 

Nick Burch commented on TIKA-2462:
----------------------------------

While we wait for the re-licensing to go through, I've had a look at writing a parser. Outputting CSV is very easy, as parso already has a great class that does all the work. Emitting SAX events for an HTML table will be trickier, because the logic that formats a raw value in a given column as "a string of how it looks in SAS" is currently in a private method. I've raised [#24|https://github.com/epam/parso/issues/24] to see if that can be refactored out, so that we don't need to duplicate lots of their code.
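
As a rough sketch of the HTML table side only: the code below uses JDK classes and no parso or Tika calls. The class name TableSaxSketch and its methods are hypothetical, and format() is just a stand-in for the SAS-style value formatting that currently sits in parso's private method. Columns are assumed to arrive as plain names and rows as lists of raw values.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper for illustration; not part of Tika or parso.
class TableSaxSketch {

    // Placeholder (assumption): the real formatting would need the
    // per-column SAS format logic discussed above.
    static String format(Object value) {
        return value == null ? "" : String.valueOf(value);
    }

    // Emits <name>text</name> as three SAX events.
    static void element(ContentHandler h, String name, String text) throws SAXException {
        h.startElement("", name, name, new AttributesImpl());
        char[] chars = text.toCharArray();
        h.characters(chars, 0, chars.length);
        h.endElement("", name, name);
    }

    // Emits one header row of <th> cells, then one <tr> of <td> cells per data row.
    static void writeTable(ContentHandler h, List<String> columns,
                           List<? extends List<?>> rows) throws SAXException {
        AttributesImpl none = new AttributesImpl();
        h.startElement("", "table", "table", none);
        h.startElement("", "tr", "tr", none);
        for (String column : columns) {
            element(h, "th", column);
        }
        h.endElement("", "tr", "tr");
        for (List<?> row : rows) {
            h.startElement("", "tr", "tr", none);
            for (Object value : row) {
                element(h, "td", format(value));
            }
            h.endElement("", "tr", "tr");
        }
        h.endElement("", "table", "table");
    }

    // Serialises the events to a string so the result can be inspected.
    static String toXml(List<String> columns, List<? extends List<?>> rows) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));
            handler.startDocument();
            writeTable(handler, columns, rows);
            handler.endDocument();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In a real parser, writeTable would presumably be handed whatever ContentHandler Tika supplies, and toXml exists only so the events can be checked by eye. The open part is still format(), which is what the parso issue is about.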

The Tika-side questions on column metadata, test files, etc. still remain for us though!

> Add a parser for sas7bdat
> -------------------------
>
>                 Key: TIKA-2462
>                 URL: https://issues.apache.org/jira/browse/TIKA-2462
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>
> EPAM recently agreed to migrate to Apache 2.0 so that we can incorporate parso into Tika for sas7bdat files: https://github.com/epam/parso/issues/19 !!!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)