Posted to dev@sling.apache.org by Jason E Bailey <je...@apache.org> on 2018/10/25 16:57:47 UTC

[whiteboard] introducing the tag modifier

Yeah, I'm really bad at naming bundles.

The new bundle currently provides a new "html5-generator" that will work with the existing rewriter.

It uses the same rules that web browsers do to determine whether a tag in a document needs to be handled or is part of a text area. It then creates an Element object for that section and passes it along when requested. This is a pull-based parser with no structural validation. It won't rewrite your HTML unless you specifically request it.

An example generic usage:
Tag.stream(inputStream, "UTF-8").filter(elem -> elem.getType() == ElementType.START_TAG).count();

or a more complex one:

stream.map(element -> {
        if (element.containsAttribute("href")) {
            String value = element.getAttributeValue("href");
            if (value != null && value.startsWith("/")) {
                element.setAttribute("href", "http://www.apache.org" + value);
            }
        }
        if (element.containsAttribute("src")) {
            String value = element.getAttributeValue("src");
            if (value != null && value.startsWith("/")) {
                element.setAttribute("src", "http://www.apache.org" + value);
            }
        }
        return element;
 }).map(HtmlStreams.TO_HTML).forEach(System.out::print);

This would parse all of your HTML, find href and src attributes whose values are relative (here, starting with "/"), rewrite them as absolute URLs, and then convert the individual elements back to HTML.

- Jason


Re: [whiteboard] introducing the tag modifier

Posted by Jason E Bailey <je...@apache.org>.
With regard to the rewriter. 

In the simplest form, you could have several services that each take a Stream<Element> and return a Stream<Element>, so that they could be chained into a full processing stream which is then used to process the HTML.
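
To make the chaining idea concrete, here is a rough sketch of how services that map a Stream<Element> to a Stream<Element> could be folded into one processing function. Everything in it is an assumption for illustration: the Element class is a minimal stand-in (just a tag name), and ElementProcessor and chain are hypothetical names, not part of the bundle.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ProcessChain {

    // Hypothetical stand-in for the bundle's Element type; it only carries a tag name.
    static final class Element {
        final String name;
        Element(String name) { this.name = name; }
    }

    // Assumed service shape: each service maps a Stream<Element> to a Stream<Element>.
    interface ElementProcessor extends Function<Stream<Element>, Stream<Element>> {}

    // Fold the processors, in list order, into a single stream-to-stream function.
    static Function<Stream<Element>, Stream<Element>> chain(List<ElementProcessor> processors) {
        Function<Stream<Element>, Stream<Element>> result = Function.identity();
        for (ElementProcessor processor : processors) {
            result = result.andThen(processor);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two illustrative processors: one filters, one transforms.
        ElementProcessor dropScripts = s -> s.filter(e -> !e.name.equals("script"));
        ElementProcessor upperCase = s -> s.map(e -> new Element(e.name.toUpperCase(Locale.ROOT)));

        List<String> names = chain(Arrays.asList(dropScripts, upperCase))
                .apply(Stream.of(new Element("a"), new Element("script"), new Element("img")))
                .map(e -> e.name)
                .collect(Collectors.toList());

        System.out.println(names); // prints [A, IMG]
    }
}
```

In an OSGi setting the list of processors would presumably come from registered services in some defined order, but that wiring is left out here.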

The easiest implementation of that would take on a similar structure to the existing rewriter, without having a separate generator and processor, as you mentioned.

However, what would probably be more beneficial is to have the processing of this HTML done in an asynchronous manner, so that a large document could be parsed, processed, and pushed without maintaining state. The state in this case is the full document.
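
As a small illustration of the "no full document in memory" part only, the sketch below shows that a sequential Java stream backed by an iterator pulls, transforms, and writes one token at a time. It is not asynchronous by itself, and the names (LazyPull, pullStream) and the use of plain strings in place of Element objects are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class LazyPull {

    // Wrap an iterator (standing in for the pull parser) so that every pull is logged.
    static Stream<String> pullStream(Iterator<String> source, List<String> log) {
        Iterator<String> logging = new Iterator<String>() {
            @Override public boolean hasNext() { return source.hasNext(); }
            @Override public String next() {
                String token = source.next();
                log.add("parsed " + token);
                return token;
            }
        };
        // Sequential, ordered stream over the iterator; nothing is read until a terminal operation runs.
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(logging, Spliterator.ORDERED), false);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        pullStream(Arrays.asList("<p>", "text", "</p>").iterator(), log)
                .map(String::toUpperCase)
                .forEach(token -> log.add("wrote " + token));
        // The log alternates "parsed ..." and "wrote ..." entries, one token at a time.
        log.forEach(System.out::println);
    }
}
```

Each token is written before the next one is parsed, so the pipeline never needs to hold more than the current token.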

There are a couple of ways that could be handled, but I'm still exploring. I'm diving into async contexts to see if that would be helpful, which might lead to implementing asynchronous servlet support in a more structured format. That might be really useful on a large scale.


- Jason

On Thu, Oct 25, 2018, at 2:33 PM, Daniel Klco wrote:
> Jason,
> 
> This sounds like a great tool to create a new Rewriter. Would you see
> having OSGi Components as a subtype of Consumer being registered to provide
> the Transformers? Is there any reason to have a separate Generator and
> Processor?
> 
> Thanks,
> Dan
> 

Re: [whiteboard] introducing the tag modifier

Posted by Daniel Klco <da...@gmail.com>.
Jason,

This sounds like a great tool to create a new Rewriter. Would you see
having OSGi Components as a subtype of Consumer being registered to provide
the Transformers? Is there any reason to have a separate Generator and
Processor?

Thanks,
Dan
