Posted to dev@lucene.apache.org by "Cassandra Targett (JIRA)" <ji...@apache.org> on 2018/09/05 21:11:00 UTC

[jira] [Created] (SOLR-12746) Ref Guide HTML output should adhere to more standard HTML5

Cassandra Targett created SOLR-12746:
----------------------------------------

             Summary: Ref Guide HTML output should adhere to more standard HTML5
                 Key: SOLR-12746
                 URL: https://issues.apache.org/jira/browse/SOLR-12746
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
          Components: documentation
            Reporter: Cassandra Targett
            Assignee: Cassandra Targett


The default HTML produced by Jekyll/Asciidoctor wraps the content in many extra {{<div>}} tags, which break it up into very small chunks. This is acceptable to a casual website reader as far as it goes, but a browser's Reader view, or any other content extraction system that uses a similar "readability" scoring algorithm, is going to either miss a lot of content or fail to display the page entirely.

To see what I mean, take a page like https://lucene.apache.org/solr/guide/7_4/language-analysis.html and enable Reader View in your browser (I used Firefox; Steve Rowe told me offline that Safari would not even offer the option on the page for him). You will notice a lot of missing content. It's almost as if someone had selected sentences at random.

Asciidoctor has a long-standing issue to provide better, more semantically oriented HTML5 output, but it has not been resolved yet: https://github.com/asciidoctor/asciidoctor/issues/242

Asciidoctor does provide a way to override the default output templates by providing your own in Slim, HAML, ERB, or any other template language supported by Tilt (none of which I know yet). There are some samples available via the Asciidoctor project which we can borrow, but it is not yet known which parts of the output cause the worst of the problems. This issue is to explore how to fix the output to improve this part of the HTML reading experience.
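
As a possible starting point for that exploration, the rough sketch below counts {{<div>}} and {{<p>}} elements in an HTML string and records the deepest {{<div>}} nesting, using only the Python standard library. It is not part of the Ref Guide build. The names DivStats, div_stats, and SAMPLE are hypothetical, and SAMPLE is a hand-written approximation of the wrapper structure described above, not actual Ref Guide output.

```python
from html.parser import HTMLParser


class DivStats(HTMLParser):
    """Count <div> and <p> start tags and track the deepest <div> nesting."""

    def __init__(self):
        super().__init__()
        self.div_count = 0
        self.p_count = 0
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.div_count += 1
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)
        elif tag == "p":
            self.p_count += 1

    def handle_endtag(self, tag):
        # Guard against stray closing tags in malformed input.
        if tag == "div" and self.depth > 0:
            self.depth -= 1


def div_stats(html):
    """Return simple wrapper statistics for an HTML string."""
    parser = DivStats()
    parser.feed(html)
    parser.close()
    return {
        "divs": parser.div_count,
        "paragraphs": parser.p_count,
        "max_div_depth": parser.max_depth,
    }


# Hand-written sample (an assumption, not real Ref Guide output):
# each paragraph sits inside its own <div>, nested in section wrappers.
SAMPLE = (
    '<div class="sect1"><h2>Title</h2><div class="sectionbody">'
    '<div class="paragraph"><p>First.</p></div>'
    '<div class="paragraph"><p>Second.</p></div>'
    "</div></div>"
)

if __name__ == "__main__":
    print(div_stats(SAMPLE))
```

Running the same function over a generated page before and after a template change would give a crude indication of whether the change reduces the wrapping. It does not reproduce any browser's readability scoring.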



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
