Posted to dev@lucene.apache.org by "Cassandra Targett (JIRA)" <ji...@apache.org> on 2018/09/20 20:42:00 UTC

[jira] [Comment Edited] (SOLR-12746) Ref Guide HTML output should adhere to more standard HTML5

    [ https://issues.apache.org/jira/browse/SOLR-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622419#comment-16622419 ] 

Cassandra Targett edited comment on SOLR-12746 at 9/20/18 8:41 PM:
-------------------------------------------------------------------

There is now a branch for this work that is close to being ready to merge: [https://git1-us-west.apache.org/repos/asf?p=lucene-solr.git;a=tree;h=refs/heads/jira/solr-12746;hb=refs/heads/jira/solr-12746]

Some info:
 # These changes require us to add a new {{_templates}} directory to direct Asciidoctor to use different selectors and classes when building the HTML. I started with the templates from [https://github.com/jirutka/asciidoctor-html5s], but modified them in many ways, mostly changing their class names to the ones we were already using, which simplifies the process of fixing our CSS files.
 ** I have not yet dug into adding license info to Solr for use of these (or whether I even need to, since we aren't distributing the templates themselves), but the project uses the MIT license, so it should be fine whatever we end up needing to do (TODO).
 # The Liquid templates used by Jekyll are still there, and have been modified to use {{<nav>}} and {{<article>}} tags instead of divs to identify the sections of the page that are content vs navigational elements.
 # I tried to simplify some of the layers of divs, but there's possibly more that could be done. For example, for a paragraph there used to be about 6 nested divs, like: {{column > post-content > main-content > sect1 > sectionbody > paragraph > p}}, but now it's closer to: {{column > content > sect1 > p}}.
 # I threw in some other CSS changes for stuff that has been bugging me - specifically the padding of 2nd level bullets in the in-page TOC, and changing the 2nd level bullets to use an open circle instead of "-".
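
One way to check the div reduction described above is to count how many {{<div>}} ancestors each {{<p>}} has in a generated page. The following is a minimal sketch using only the Python standard library, not part of the branch. The two sample strings are hypothetical markup built from the class chains quoted above; which elements the new output really uses for {{content}} and {{sect1}} is an assumption here.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DivDepthParser(HTMLParser):
    """Record how many <div> ancestors each <p> element has."""

    def __init__(self):
        super().__init__()
        self.stack = []      # tags that are currently open
        self.p_depths = []   # number of enclosing <div>s, one entry per <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.p_depths.append(self.stack.count("div"))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def max_div_depth(html):
    """Return the largest number of <div> ancestors around any <p>."""
    parser = DivDepthParser()
    parser.feed(html)
    parser.close()
    return max(parser.p_depths, default=0)


# Hypothetical markup following the old class chain (every layer a <div>).
OLD_MARKUP = (
    '<div class="column"><div class="post-content"><div class="main-content">'
    '<div class="sect1"><div class="sectionbody"><div class="paragraph">'
    '<p>Text</p>'
    '</div></div></div></div></div></div>'
)

# Hypothetical markup following the new class chain; the <article> and
# <section> element names are assumptions, not taken from the branch.
NEW_MARKUP = (
    '<div class="column"><article class="content"><section class="sect1">'
    '<p>Text</p>'
    '</section></article></div>'
)

print(max_div_depth(OLD_MARKUP))  # 6
print(max_div_depth(NEW_MARKUP))  # 1
```

Feeding the contents of a built page to {{max_div_depth}} instead of the sample strings would give a rough before/after number for a real Ref Guide page.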

Caveats:
 # The templates require that you have Slim installed locally in order to build the HTML. I've added instructions for this to {{solr-ref-guide/README.txt}} in the branch, but have not updated the Jenkins build script yet (TODO).
 # The Slim engine outputs an error ({{Slim::Engine: Option :asciidoc is invalid}}) during the HTML build for every template (so, 30+ times). I suspect it's related to a part of our Jekyll config that we need to keep. There is supposedly some way to tell Slim to ignore this, but I haven't been able to figure it out yet. I also asked about it on the [Asciidoctor mailing list|http://discuss.asciidoctor.org/Slim-Engine-Option-asciidoc-invalid-with-custom-templates-and-Jekyll-td6477.html], but have not yet had a reply (TODO).


> Ref Guide HTML output should adhere to more standard HTML5
> ----------------------------------------------------------
>
>                 Key: SOLR-12746
>                 URL: https://issues.apache.org/jira/browse/SOLR-12746
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: documentation
>            Reporter: Cassandra Targett
>            Assignee: Cassandra Targett
>            Priority: Major
>
> The default HTML produced by Jekyll/Asciidoctor adds a lot of extra {{<div>}} tags to the content which break up our content into very small chunks. This is acceptable to a casual website reader as far as it goes, but any Reader view in a browser or another type of content extraction system that uses a similar "readability" scoring algorithm is going to either miss a lot of content or fail to display the page entirely.
> To see what I mean, take a page like https://lucene.apache.org/solr/guide/7_4/language-analysis.html and enable Reader View in your browser (I used Firefox; Steve Rowe told me offline Safari would not even offer the option on the page for him). You will notice a lot of missing content. It's almost like someone selected sentences at random.
> Asciidoctor has a long-standing issue to provide a better, more semantic-oriented HTML5 output, but it has not been resolved yet: https://github.com/asciidoctor/asciidoctor/issues/242
> Asciidoctor does provide a way to override the default output templates by providing your own in Slim, HAML, ERB or any other template language supported by Tilt (none of which I know yet). There are some samples available via the Asciidoctor project which we can borrow, but it is not yet known which parts of the output are causing the worst of the problems. This issue is to explore how to fix it to improve this part of the HTML reading experience.


