Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2013/08/20 13:53:52 UTC

[jira] [Commented] (TIKA-1165) Autodetect and parse Asciidoc

    [ https://issues.apache.org/jira/browse/TIKA-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13744887#comment-13744887 ] 

Nick Burch commented on TIKA-1165:
----------------------------------

Firstly, what MIME type would you expect for AsciiDoc?

Secondly, are there any open source Java libraries for parsing AsciiDoc to something like XHTML, and/or for pulling out the metadata?

Thirdly, which parts of the header are required, in case we want to try some magic-based detection?
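
To make the third question concrete, here is a rough sketch of the kind of header check I have in mind. It is only an illustration under stated assumptions: the class and method names are hypothetical and not part of Tika, and it assumes that an AsciiDoc file opens with a one-line document title of the form "= Title", possibly preceded by blank lines, "//" comment lines or ":name: value" attribute entries. Documents without a title, or with an underlined two-line title, would not be caught by it.

```java
public class AsciidocHeaderHeuristic {

    /**
     * Hypothetical helper, not part of Tika: returns true if the first
     * significant line of the text looks like a one-line AsciiDoc
     * document title ("= Title").
     */
    public static boolean looksLikeAsciidoc(String text) {
        for (String line : text.split("\r?\n")) {
            String trimmed = line.trim();
            // Assumption: blank lines, line comments and attribute
            // entries may come before the document title.
            if (trimmed.isEmpty()
                    || trimmed.startsWith("//")
                    || trimmed.matches(":[^:\\s]+:.*")) {
                continue;
            }
            // The first significant line decides: a single '=' followed
            // by a space and a non-empty title.
            return line.startsWith("= ") && !line.substring(2).trim().isEmpty();
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeAsciidoc("= AsciiDoc Syntax Quick Reference\n"));
        System.out.println(looksLikeAsciidoc("Just some plain text\n"));
    }
}
```

A check this loose would also match other plain text that happens to start with "= ", so at best it could serve as a low-priority hint alongside the file extension.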
                
> Autodetect and parse Asciidoc
> -----------------------------
>
>                 Key: TIKA-1165
>                 URL: https://issues.apache.org/jira/browse/TIKA-1165
>             Project: Tika
>          Issue Type: Wish
>          Components: languageidentifier, parser
>    Affects Versions: 1.4
>            Reporter: David Pilato
>            Priority: Trivial
>
> When parsing asciidoc metadata, we currently get the following:
> {noformat}
> Content-Encoding: ISO-8859-1
> Content-Length: 66363
> Content-Type: text/plain; charset=ISO-8859-1
> resourceName: asciidoc.adoc
> {noformat}
> Steps to reproduce:
> {code:title=asciidoc.sh|borderStyle=solid}
> curl https://raw.github.com/asciidoctor/asciidoctor.org/master/docs/asciidoc-syntax-quick-reference.adoc -O -s
> java -jar tika-app-1.4.jar -m asciidoc-syntax-quick-reference.adoc
> {code}
