You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@maven.apache.org by GitBox <gi...@apache.org> on 2020/12/29 00:16:24 UTC

[GitHub] [maven-doxia] bertysentry opened a new pull request #49: [DOXIA-616]

bertysentry opened a new pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49


   Broad update to **doxia-module-markdown**
   
   * Properly exposed language class in fenced code blocks
   * Re-implemented metadata processing
   * Added unit tests
   * Added integration tests (with maven-site-plugin)
   * Fixed *XhtmlBaseParser* and *XhtmlBaseSink* to properly expose attributes of the PRE and CODE elements
   * Fixed *AbstractModuleTest* to use UTF-8 for input files
   * Bonus: Fixed DOXIA-542


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org

[GitHub] [maven-doxia] bertysentry commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549660631



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -74,48 +80,103 @@
 
     /**
      * Regex that identifies a multimarkdown-style metadata section at the start of the document
+     *
+     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
+     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
+     * ignored.
      */
-    private static final String MULTI_MARKDOWN_METADATA_SECTION =
-        "^(((?:[^\\s:][^:]*):(?:.*(?:\r?\n\\p{Blank}+[^\\s].*)*\r?\n))+)(?:\\s*\r?\n)";
+    private static final Pattern METADATA_SECTION_PATTERN = Pattern.compile(
+            "\\A^\\s*"
+            + "(?:title|author|date|address|affiliation|copyright|email|keywords|language|phone|subtitle)"
+            + "\\h*:\\h*\\V*\\h*$\\v+"
+            + "(?:^\\h*[^:\\v]+\\h*:\\h*\\V*\\h*$\\v+)*",
+            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE );
 
     /**
      * Regex that captures the key and value of a multimarkdown-style metadata entry.
      */
-    private static final String MULTI_MARKDOWN_METADATA_ENTRY =
-        "([^\\s:][^:]*):(.*(?:\r?\n\\p{Blank}+[^\\s].*)*)\r?\n";
-
-    /**
-     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
-     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
-     * ignored.
-     */
-    private static final String[] STANDARD_METADATA_KEYS =
-        { "title", "author", "date", "address", "affiliation", "copyright", "email", "keywords", "language", "phone",
-            "subtitle" };
+    private static final Pattern METADATA_ENTRY_PATTERN = Pattern.compile(
+            "^\\h*([^:\\v]+?)\\h*:\\h*(\\V*)\\h*$",
+            Pattern.MULTILINE );
 
     /**
      * <p>getType.</p>
      *
      * @return a int.
      */
+    @Override
     public int getType()
     {
         return TXT_TYPE;
     }
 
+    /**
+     * The parser of the HTML produced by Flexmark, that we will
+     * use to convert this HTML to Sink events
+     */
     @Requirement
     private MarkdownHtmlParser parser;
 
+    /**
+     * Flexmark's Markdown parser (one static instance fits all)
+     */
+    private static final com.vladsch.flexmark.parser.Parser FLEXMARK_PARSER;
+
+    /**
+     * Flexmark's HTML renderer (its output will be re-parsed and converted to Sink events)
+     */
+    private static final HtmlRenderer FLEXMARK_HTML_RENDERER;
+
+    // Initialize the Flexmark parser and renderer, once and for all
+    static
+    {
+        MutableDataSet flexmarkOptions = new MutableDataSet();
+
+        // Emulate Pegdown's behavior

Review comment:
       Good question.
   
   Flexmark's default behavior is [CommonMark 0.28](https://spec.commonmark.org/0.28/).
   
   Unit tests weren't covering edge cases like Pegdown vs CommonMark, so I was concerned to break people's documentation and I kept the Pegdown profile. It's really a question of balance between risk of breaking existing stuff, and moving forward with something cleaner and properly specified.
   
   I'm personally in favor of moving forward with CommonMark (just like Web developers are happy that we finally got rid of Microsoft Internet Explorer... :-D)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] asfgit closed pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

asfgit closed pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org

[GitHub] [maven-doxia] michael-o commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549781997



##########
File path: doxia-core/src/main/java/org/apache/maven/doxia/sink/impl/Xhtml5BaseSink.java
##########
@@ -1905,7 +1905,9 @@ private void inlineSemantics( SinkEventAttributes attributes, String semantic,
     {
         if ( attributes.containsAttribute( SinkEventAttributes.SEMANTICS, semantic ) )
         {
-            writeStartTag( tag );
+            SinkEventAttributes attributesNoSemantics = ( SinkEventAttributes ) attributes.copyAttributes();
+            attributesNoSemantics.removeAttribute( SinkEventAttributes.SEMANTICS );

Review comment:
       Why is this necessary?

##########
File path: doxia-modules/doxia-module-markdown/pom.xml
##########
@@ -72,5 +72,80 @@ under the License.
       <groupId>org.codehaus.plexus</groupId>
       <artifactId>plexus-utils</artifactId>
     </dependency>
+    <dependency>
+      <groupId>commons-io</groupId>
+      <artifactId>commons-io</artifactId>
+    </dependency>
   </dependencies>
+  <build>
+
+    <plugins>
+
+      <!-- install -->
+      <plugin>
+        <artifactId>maven-install-plugin</artifactId>

Review comment:
       Aren't they available when `verify` run because this is run after `install`?

##########
File path: doxia-core/src/main/java/org/apache/maven/doxia/parser/XhtmlBaseParser.java
##########
@@ -509,7 +509,8 @@ else if ( ( parser.getName().equals( HtmlMarkup.CODE.toString() ) )
                 || ( parser.getName().equals( HtmlMarkup.SAMP.toString() ) )
                 || ( parser.getName().equals( HtmlMarkup.TT.toString() ) ) )
         {
-            sink.inline( SinkEventAttributeSet.Semantics.CODE );
+            attribs.addAttributes( SinkEventAttributeSet.Semantics.CODE );

Review comment:
       Interesting, is this a semantical change?

##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -74,48 +80,103 @@
 
     /**
      * Regex that identifies a multimarkdown-style metadata section at the start of the document
+     *
+     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
+     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
+     * ignored.
      */
-    private static final String MULTI_MARKDOWN_METADATA_SECTION =
-        "^(((?:[^\\s:][^:]*):(?:.*(?:\r?\n\\p{Blank}+[^\\s].*)*\r?\n))+)(?:\\s*\r?\n)";
+    private static final Pattern METADATA_SECTION_PATTERN = Pattern.compile(
+            "\\A^\\s*"
+            + "(?:title|author|date|address|affiliation|copyright|email|keywords|language|phone|subtitle)"
+            + "\\h*:\\h*\\V*\\h*$\\v+"
+            + "(?:^\\h*[^:\\v]+\\h*:\\h*\\V*\\h*$\\v+)*",
+            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE );
 
     /**
      * Regex that captures the key and value of a multimarkdown-style metadata entry.
      */
-    private static final String MULTI_MARKDOWN_METADATA_ENTRY =
-        "([^\\s:][^:]*):(.*(?:\r?\n\\p{Blank}+[^\\s].*)*)\r?\n";
-
-    /**
-     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
-     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
-     * ignored.
-     */
-    private static final String[] STANDARD_METADATA_KEYS =
-        { "title", "author", "date", "address", "affiliation", "copyright", "email", "keywords", "language", "phone",
-            "subtitle" };
+    private static final Pattern METADATA_ENTRY_PATTERN = Pattern.compile(
+            "^\\h*([^:\\v]+?)\\h*:\\h*(\\V*)\\h*$",
+            Pattern.MULTILINE );
 
     /**
      * <p>getType.</p>
      *
      * @return a int.
      */
+    @Override
     public int getType()
     {
         return TXT_TYPE;
     }
 
+    /**
+     * The parser of the HTML produced by Flexmark, that we will
+     * use to convert this HTML to Sink events
+     */
     @Requirement
     private MarkdownHtmlParser parser;
 
+    /**
+     * Flexmark's Markdown parser (one static instance fits all)
+     */
+    private static final com.vladsch.flexmark.parser.Parser FLEXMARK_PARSER;
+
+    /**
+     * Flexmark's HTML renderer (its output will be re-parsed and converted to Sink events)
+     */
+    private static final HtmlRenderer FLEXMARK_HTML_RENDERER;
+
+    // Initialize the Flexmark parser and renderer, once and for all
+    static
+    {
+        MutableDataSet flexmarkOptions = new MutableDataSet();
+
+        // Emulate Pegdown's behavior
+        flexmarkOptions.setFrom( ParserEmulationProfile.PEGDOWN );
+
+        // Enable the extensions that we used to have in Pegdown
+        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, Arrays.asList(
+                EscapedCharacterExtension.create(),
+                AbbreviationExtension.create(),
+                AutolinkExtension.create(),
+                DefinitionExtension.create(),
+                TypographicExtension.create(),
+                TablesExtension.create(),
+                WikiLinkExtension.create(),
+                StrikethroughExtension.create()
+        ) );
+
+        // Disable wrong apostrophe replacement
+        flexmarkOptions.set( TypographicExtension.SINGLE_QUOTE_UNMATCHED, "&apos;" );

Review comment:
       Does this address DOXIA-542?

##########
File path: doxia-modules/doxia-module-markdown/src/it/settings.xml
##########
@@ -0,0 +1,55 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<settings>
+  <profiles>
+    <profile>
+      <id>it-repo</id>
+      <activation>
+        <activeByDefault>true</activeByDefault>
+      </activation>
+      <repositories>
+        <repository>
+          <id>1local.central</id>

Review comment:
       Is the preceding `1` intentional?

##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -130,133 +191,98 @@ public void parse( Reader source, Sink sink )
      * @return HTML content generated by flexmark-java
      * @throws IOException passed through
      */
-    String toHtml( Reader source )
+    CharSequence toHtml( Reader source )
         throws IOException
     {
+        // Read the source
         String text = IOUtil.toString( source );
-        MutableDataHolder flexmarkOptions = PegdownOptionsAdapter.flexmarkOptions(
-                Extensions.ALL & ~( Extensions.HARDWRAPS | Extensions.ANCHORLINKS ) ).toMutable();
-        ArrayList<Extension> extensions = new ArrayList<>();
-        for ( Extension extension : flexmarkOptions.get( com.vladsch.flexmark.parser.Parser.EXTENSIONS ) )
-        {
-            extensions.add( extension );
-        }
-
-        extensions.add( FlexmarkDoxiaExtension.create() );
-        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, extensions );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_OPEN_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_CLOSE_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.MAX_TRAILING_BLANK_LINES, -1 );
-
-        com.vladsch.flexmark.parser.Parser parser = com.vladsch.flexmark.parser.Parser.builder( flexmarkOptions )
-                .build();
-        HtmlRenderer renderer = HtmlRenderer.builder( flexmarkOptions )
-                                    .linkResolverFactory( new FlexmarkDoxiaLinkResolver.Factory() )
-                                    .build();
-
 
+        // Now, build the HTML document
         StringBuilder html = new StringBuilder( 1000 );
         html.append( "<html>" );
         html.append( "<head>" );
-        Pattern metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_SECTION, Pattern.MULTILINE );
-        Matcher metadataMatcher = metadataPattern.matcher( text );
+
+        // First, we interpret the "metadata" section of the document and add the corresponding HTML headers
+        Matcher metadataMatcher = METADATA_SECTION_PATTERN.matcher( text );
         boolean haveTitle = false;
         if ( metadataMatcher.find() )
         {
-            metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_ENTRY, Pattern.MULTILINE );
-            Matcher lineMatcher = metadataPattern.matcher( metadataMatcher.group( 1 ) );
-            boolean first = true;
-            while ( lineMatcher.find() )
+            Matcher entryMatcher = METADATA_ENTRY_PATTERN.matcher( metadataMatcher.group( 0 ) );
+            while ( entryMatcher.find() )
             {
-                String key = StringUtils.trimToEmpty( lineMatcher.group( 1 ) );
-                if ( first )
-                {
-                    boolean found = false;
-                    for ( String k : STANDARD_METADATA_KEYS )
-                    {
-                        if ( k.equalsIgnoreCase( key ) )
-                        {
-                            found = true;
-                            break;
-                        }
-                    }
-                    if ( !found )
-                    {
-                        break;
-                    }
-                    first = false;
-                }
-                String value = StringUtils.trimToEmpty( lineMatcher.group( 2 ) );
+                String key = entryMatcher.group( 1 );
+                String value = entryMatcher.group( 2 );
                 if ( "title".equalsIgnoreCase( key ) )
                 {
                     haveTitle = true;
                     html.append( "<title>" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
+                    html.append( HtmlTools.escapeHTML( value, false ) );
                     html.append( "</title>" );
                 }
-                else if ( "author".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'author\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
-                else if ( "date".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'date\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
                 else
                 {
-                    html.append( "<meta name=\'" );
-                    html.append( StringEscapeUtils.escapeXml( key ) );
-                    html.append( "\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
+                    html.append( "<meta name='" );
+                    html.append( HtmlTools.escapeHTML( key, true ) );

Review comment:
       Is this is a superset of `escapeXml`?

##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -130,133 +191,98 @@ public void parse( Reader source, Sink sink )
      * @return HTML content generated by flexmark-java
      * @throws IOException passed through
      */
-    String toHtml( Reader source )
+    CharSequence toHtml( Reader source )
         throws IOException
     {
+        // Read the source
         String text = IOUtil.toString( source );
-        MutableDataHolder flexmarkOptions = PegdownOptionsAdapter.flexmarkOptions(
-                Extensions.ALL & ~( Extensions.HARDWRAPS | Extensions.ANCHORLINKS ) ).toMutable();
-        ArrayList<Extension> extensions = new ArrayList<>();
-        for ( Extension extension : flexmarkOptions.get( com.vladsch.flexmark.parser.Parser.EXTENSIONS ) )
-        {
-            extensions.add( extension );
-        }
-
-        extensions.add( FlexmarkDoxiaExtension.create() );
-        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, extensions );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_OPEN_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_CLOSE_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.MAX_TRAILING_BLANK_LINES, -1 );
-
-        com.vladsch.flexmark.parser.Parser parser = com.vladsch.flexmark.parser.Parser.builder( flexmarkOptions )
-                .build();
-        HtmlRenderer renderer = HtmlRenderer.builder( flexmarkOptions )
-                                    .linkResolverFactory( new FlexmarkDoxiaLinkResolver.Factory() )
-                                    .build();
-
 
+        // Now, build the HTML document
         StringBuilder html = new StringBuilder( 1000 );
         html.append( "<html>" );
         html.append( "<head>" );
-        Pattern metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_SECTION, Pattern.MULTILINE );
-        Matcher metadataMatcher = metadataPattern.matcher( text );
+
+        // First, we interpret the "metadata" section of the document and add the corresponding HTML headers
+        Matcher metadataMatcher = METADATA_SECTION_PATTERN.matcher( text );
         boolean haveTitle = false;
         if ( metadataMatcher.find() )
         {
-            metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_ENTRY, Pattern.MULTILINE );
-            Matcher lineMatcher = metadataPattern.matcher( metadataMatcher.group( 1 ) );
-            boolean first = true;
-            while ( lineMatcher.find() )
+            Matcher entryMatcher = METADATA_ENTRY_PATTERN.matcher( metadataMatcher.group( 0 ) );
+            while ( entryMatcher.find() )
             {
-                String key = StringUtils.trimToEmpty( lineMatcher.group( 1 ) );
-                if ( first )
-                {
-                    boolean found = false;
-                    for ( String k : STANDARD_METADATA_KEYS )
-                    {
-                        if ( k.equalsIgnoreCase( key ) )
-                        {
-                            found = true;
-                            break;
-                        }
-                    }
-                    if ( !found )
-                    {
-                        break;
-                    }
-                    first = false;
-                }
-                String value = StringUtils.trimToEmpty( lineMatcher.group( 2 ) );
+                String key = entryMatcher.group( 1 );
+                String value = entryMatcher.group( 2 );
                 if ( "title".equalsIgnoreCase( key ) )
                 {
                     haveTitle = true;
                     html.append( "<title>" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
+                    html.append( HtmlTools.escapeHTML( value, false ) );
                     html.append( "</title>" );
                 }
-                else if ( "author".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'author\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
-                else if ( "date".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'date\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
                 else
                 {
-                    html.append( "<meta name=\'" );
-                    html.append( StringEscapeUtils.escapeXml( key ) );
-                    html.append( "\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
+                    html.append( "<meta name='" );
+                    html.append( HtmlTools.escapeHTML( key, true ) );
+                    html.append( "' content='" );
+                    html.append( HtmlTools.escapeHTML( value, true ) );
+                    html.append( "' />" );
                 }
             }
-            if ( !first )
-            {
-                text = text.substring( metadataMatcher.end() );
-            }
+
+            // Trim the metadata from the source
+            text = text.substring( metadataMatcher.end( 0 ) );
+
         }
 
-        Node rootNode = parser.parse( text );
-        String markdownHtml = renderer.render( rootNode );
+        // Now is the time to parse the Markdown document
+        // (after we've trimmed out the metadatas, and before we check for its headings)
+        Node documentRoot = FLEXMARK_PARSER.parse( text );
 
-        if ( !haveTitle && rootNode.hasChildren() )
+        // Special trick: if there is no title specified as a metadata in the header, we will use the first
+        // heading as the document title
+        if ( !haveTitle && documentRoot.hasChildren() )
         {
-            // use the first (non-comment) node only if it is a heading
-            Node firstNode = rootNode.getFirstChild();
-            while ( firstNode != null && !( firstNode instanceof Heading ) )
+            // Skip the comment nodes
+            Node firstNode = documentRoot.getFirstChild();
+            while ( firstNode != null && firstNode instanceof HtmlCommentBlock )

Review comment:
       Thid does not retain HTML style commments?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] michael-o commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r550327590



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -130,133 +184,98 @@ public void parse( Reader source, Sink sink )
      * @return HTML content generated by flexmark-java
      * @throws IOException passed through
      */
-    String toHtml( Reader source )
+    CharSequence toHtml( Reader source )
         throws IOException
     {
+        // Read the source
         String text = IOUtil.toString( source );
-        MutableDataHolder flexmarkOptions = PegdownOptionsAdapter.flexmarkOptions(
-                Extensions.ALL & ~( Extensions.HARDWRAPS | Extensions.ANCHORLINKS ) ).toMutable();
-        ArrayList<Extension> extensions = new ArrayList<>();
-        for ( Extension extension : flexmarkOptions.get( com.vladsch.flexmark.parser.Parser.EXTENSIONS ) )
-        {
-            extensions.add( extension );
-        }
-
-        extensions.add( FlexmarkDoxiaExtension.create() );
-        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, extensions );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_OPEN_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_CLOSE_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.MAX_TRAILING_BLANK_LINES, -1 );
-
-        com.vladsch.flexmark.parser.Parser parser = com.vladsch.flexmark.parser.Parser.builder( flexmarkOptions )
-                .build();
-        HtmlRenderer renderer = HtmlRenderer.builder( flexmarkOptions )
-                                    .linkResolverFactory( new FlexmarkDoxiaLinkResolver.Factory() )
-                                    .build();
-
 
+        // Now, build the HTML document
         StringBuilder html = new StringBuilder( 1000 );
         html.append( "<html>" );
         html.append( "<head>" );
-        Pattern metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_SECTION, Pattern.MULTILINE );
-        Matcher metadataMatcher = metadataPattern.matcher( text );
+
+        // First, we interpret the "metadata" section of the document and add the corresponding HTML headers
+        Matcher metadataMatcher = METADATA_SECTION_PATTERN.matcher( text );
         boolean haveTitle = false;
         if ( metadataMatcher.find() )
         {
-            metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_ENTRY, Pattern.MULTILINE );
-            Matcher lineMatcher = metadataPattern.matcher( metadataMatcher.group( 1 ) );
-            boolean first = true;
-            while ( lineMatcher.find() )
+            Matcher entryMatcher = METADATA_ENTRY_PATTERN.matcher( metadataMatcher.group( 0 ) );
+            while ( entryMatcher.find() )
             {
-                String key = StringUtils.trimToEmpty( lineMatcher.group( 1 ) );
-                if ( first )
-                {
-                    boolean found = false;
-                    for ( String k : STANDARD_METADATA_KEYS )
-                    {
-                        if ( k.equalsIgnoreCase( key ) )
-                        {
-                            found = true;
-                            break;
-                        }
-                    }
-                    if ( !found )
-                    {
-                        break;
-                    }
-                    first = false;
-                }
-                String value = StringUtils.trimToEmpty( lineMatcher.group( 2 ) );
+                String key = entryMatcher.group( 1 );
+                String value = entryMatcher.group( 2 );
                 if ( "title".equalsIgnoreCase( key ) )
                 {
                     haveTitle = true;
                     html.append( "<title>" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
+                    html.append( HtmlTools.escapeHTML( value, false ) );
                     html.append( "</title>" );
                 }
-                else if ( "author".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'author\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
-                else if ( "date".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'date\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
                 else
                 {
-                    html.append( "<meta name=\'" );
-                    html.append( StringEscapeUtils.escapeXml( key ) );
-                    html.append( "\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
+                    html.append( "<meta name='" );
+                    html.append( HtmlTools.escapeHTML( key ) );
+                    html.append( "' content='" );
+                    html.append( HtmlTools.escapeHTML( value ) );
+                    html.append( "' />" );
                 }
             }
-            if ( !first )
-            {
-                text = text.substring( metadataMatcher.end() );
-            }
+
+            // Trim the metadata from the source
+            text = text.substring( metadataMatcher.end( 0 ) );
+
         }
 
-        Node rootNode = parser.parse( text );
-        String markdownHtml = renderer.render( rootNode );
+        // Now is the time to parse the Markdown document
+        // (after we've trimmed out the metadatas, and before we check for its headings)
+        Node documentRoot = FLEXMARK_PARSER.parse( text );
 
-        if ( !haveTitle && rootNode.hasChildren() )
+        // Special trick: if there is no title specified as a metadata in the header, we will use the first
+        // heading as the document title
+        if ( !haveTitle && documentRoot.hasChildren() )
         {
-            // use the first (non-comment) node only if it is a heading
-            Node firstNode = rootNode.getFirstChild();
-            while ( firstNode != null && !( firstNode instanceof Heading ) )
+            // Skip the comment nodes
+            Node firstNode = documentRoot.getFirstChild();
+            while ( firstNode != null && firstNode instanceof HtmlCommentBlock )
             {
-                if ( !( firstNode instanceof HtmlCommentBlock ) )
-                {
-                    break;
-                }
                 firstNode = firstNode.getNext();
             }
 
-            if ( firstNode instanceof Heading )
+            // If this first non-comment node is a heading, we use it as the document title
+            if ( firstNode != null && firstNode instanceof Heading )
             {
                 html.append( "<title>" );
                 TextCollectingVisitor collectingVisitor = new TextCollectingVisitor();
                 String headingText = collectingVisitor.collectAndGetText( firstNode );
-                html.append( StringEscapeUtils.escapeXml( headingText ) );
+                html.append( HtmlTools.escapeHTML( headingText, false ) );

Review comment:
       Same here...

##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -130,133 +184,98 @@ public void parse( Reader source, Sink sink )
      * @return HTML content generated by flexmark-java
      * @throws IOException passed through
      */
-    String toHtml( Reader source )
+    CharSequence toHtml( Reader source )
         throws IOException
     {
+        // Read the source
         String text = IOUtil.toString( source );
-        MutableDataHolder flexmarkOptions = PegdownOptionsAdapter.flexmarkOptions(
-                Extensions.ALL & ~( Extensions.HARDWRAPS | Extensions.ANCHORLINKS ) ).toMutable();
-        ArrayList<Extension> extensions = new ArrayList<>();
-        for ( Extension extension : flexmarkOptions.get( com.vladsch.flexmark.parser.Parser.EXTENSIONS ) )
-        {
-            extensions.add( extension );
-        }
-
-        extensions.add( FlexmarkDoxiaExtension.create() );
-        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, extensions );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_OPEN_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_CLOSE_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.MAX_TRAILING_BLANK_LINES, -1 );
-
-        com.vladsch.flexmark.parser.Parser parser = com.vladsch.flexmark.parser.Parser.builder( flexmarkOptions )
-                .build();
-        HtmlRenderer renderer = HtmlRenderer.builder( flexmarkOptions )
-                                    .linkResolverFactory( new FlexmarkDoxiaLinkResolver.Factory() )
-                                    .build();
-
 
+        // Now, build the HTML document
         StringBuilder html = new StringBuilder( 1000 );
         html.append( "<html>" );
         html.append( "<head>" );
-        Pattern metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_SECTION, Pattern.MULTILINE );
-        Matcher metadataMatcher = metadataPattern.matcher( text );
+
+        // First, we interpret the "metadata" section of the document and add the corresponding HTML headers
+        Matcher metadataMatcher = METADATA_SECTION_PATTERN.matcher( text );
         boolean haveTitle = false;
         if ( metadataMatcher.find() )
         {
-            metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_ENTRY, Pattern.MULTILINE );
-            Matcher lineMatcher = metadataPattern.matcher( metadataMatcher.group( 1 ) );
-            boolean first = true;
-            while ( lineMatcher.find() )
+            Matcher entryMatcher = METADATA_ENTRY_PATTERN.matcher( metadataMatcher.group( 0 ) );
+            while ( entryMatcher.find() )
             {
-                String key = StringUtils.trimToEmpty( lineMatcher.group( 1 ) );
-                if ( first )
-                {
-                    boolean found = false;
-                    for ( String k : STANDARD_METADATA_KEYS )
-                    {
-                        if ( k.equalsIgnoreCase( key ) )
-                        {
-                            found = true;
-                            break;
-                        }
-                    }
-                    if ( !found )
-                    {
-                        break;
-                    }
-                    first = false;
-                }
-                String value = StringUtils.trimToEmpty( lineMatcher.group( 2 ) );
+                String key = entryMatcher.group( 1 );
+                String value = entryMatcher.group( 2 );
                 if ( "title".equalsIgnoreCase( key ) )
                 {
                     haveTitle = true;
                     html.append( "<title>" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
+                    html.append( HtmlTools.escapeHTML( value, false ) );

Review comment:
       Why is here a `false` and not just one arg like the rest of the calls?

##########
File path: doxia-modules/doxia-module-markdown/src/test/resources/fenced-code-block.md
##########
@@ -0,0 +1,5 @@
+Below code is Java:
+
+```java
+System.out.println(helloWorld);

Review comment:
       Quotes around literal missing




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549795508



##########
File path: doxia-core/src/main/java/org/apache/maven/doxia/sink/impl/Xhtml5BaseSink.java
##########
@@ -1905,7 +1905,9 @@ private void inlineSemantics( SinkEventAttributes attributes, String semantic,
     {
         if ( attributes.containsAttribute( SinkEventAttributes.SEMANTICS, semantic ) )
         {
-            writeStartTag( tag );
+            SinkEventAttributes attributesNoSemantics = ( SinkEventAttributes ) attributes.copyAttributes();
+            attributesNoSemantics.removeAttribute( SinkEventAttributes.SEMANTICS );

Review comment:
       *inline* Sink events may have attributes that we want to leverage (notably the *Class* attribute).
   
   Example: *inline* Sink event with `Semantics = "code"` and `Class = "language-java"` will be output as `<code class="language-java">`.
   
   Note: We don't want to output `<code semantics="code" class="language-java">` and that's why we need to remove the *SEMANTICS* attribute.
   
   Admittedly, the *inlineSemantics()* function is not the most elegant piece of code but I didn't want to reshuffle the XHTML Sink too much, as it's used in many different places.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] michael-o commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549803784



##########
File path: doxia-modules/doxia-module-markdown/pom.xml
##########
@@ -72,5 +72,80 @@ under the License.
       <groupId>org.codehaus.plexus</groupId>
       <artifactId>plexus-utils</artifactId>
     </dependency>
+    <dependency>
+      <groupId>commons-io</groupId>
+      <artifactId>commons-io</artifactId>
+    </dependency>
   </dependencies>
+  <build>
+
+    <plugins>
+
+      <!-- install -->
+      <plugin>
+        <artifactId>maven-install-plugin</artifactId>

Review comment:
       hm... I need to think about this. It is somewhat hacky, but I do understand your intentions.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r550819334



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -130,133 +184,98 @@ public void parse( Reader source, Sink sink )
      * @return HTML content generated by flexmark-java
      * @throws IOException passed through
      */
-    String toHtml( Reader source )
+    CharSequence toHtml( Reader source )
         throws IOException
     {
+        // Read the source
         String text = IOUtil.toString( source );
-        MutableDataHolder flexmarkOptions = PegdownOptionsAdapter.flexmarkOptions(
-                Extensions.ALL & ~( Extensions.HARDWRAPS | Extensions.ANCHORLINKS ) ).toMutable();
-        ArrayList<Extension> extensions = new ArrayList<>();
-        for ( Extension extension : flexmarkOptions.get( com.vladsch.flexmark.parser.Parser.EXTENSIONS ) )
-        {
-            extensions.add( extension );
-        }
-
-        extensions.add( FlexmarkDoxiaExtension.create() );
-        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, extensions );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_OPEN_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_CLOSE_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.MAX_TRAILING_BLANK_LINES, -1 );
-
-        com.vladsch.flexmark.parser.Parser parser = com.vladsch.flexmark.parser.Parser.builder( flexmarkOptions )
-                .build();
-        HtmlRenderer renderer = HtmlRenderer.builder( flexmarkOptions )
-                                    .linkResolverFactory( new FlexmarkDoxiaLinkResolver.Factory() )
-                                    .build();
-
 
+        // Now, build the HTML document
         StringBuilder html = new StringBuilder( 1000 );
         html.append( "<html>" );
         html.append( "<head>" );
-        Pattern metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_SECTION, Pattern.MULTILINE );
-        Matcher metadataMatcher = metadataPattern.matcher( text );
+
+        // First, we interpret the "metadata" section of the document and add the corresponding HTML headers
+        Matcher metadataMatcher = METADATA_SECTION_PATTERN.matcher( text );
         boolean haveTitle = false;
         if ( metadataMatcher.find() )
         {
-            metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_ENTRY, Pattern.MULTILINE );
-            Matcher lineMatcher = metadataPattern.matcher( metadataMatcher.group( 1 ) );
-            boolean first = true;
-            while ( lineMatcher.find() )
+            Matcher entryMatcher = METADATA_ENTRY_PATTERN.matcher( metadataMatcher.group( 0 ) );
+            while ( entryMatcher.find() )
             {
-                String key = StringUtils.trimToEmpty( lineMatcher.group( 1 ) );
-                if ( first )
-                {
-                    boolean found = false;
-                    for ( String k : STANDARD_METADATA_KEYS )
-                    {
-                        if ( k.equalsIgnoreCase( key ) )
-                        {
-                            found = true;
-                            break;
-                        }
-                    }
-                    if ( !found )
-                    {
-                        break;
-                    }
-                    first = false;
-                }
-                String value = StringUtils.trimToEmpty( lineMatcher.group( 2 ) );
+                String key = entryMatcher.group( 1 );
+                String value = entryMatcher.group( 2 );
                 if ( "title".equalsIgnoreCase( key ) )
                 {
                     haveTitle = true;
                     html.append( "<title>" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
+                    html.append( HtmlTools.escapeHTML( value, false ) );

Review comment:
       Happy New Year! 😉
   
   I understand your concern, but I think that the parameter names are misleading.
   
   More importantly, the *XhtmlBaseSink* **itself** uses `HtmlTools.escapeHTML(text, false)`:
   
   * For example, all XHTML content is inserted with [XhtmlBaseSink.content()](https://github.com/apache/maven-doxia/blob/74c3e876b5c7a044f5a6b7ad7be6ea3ac10a6f49/doxia-core/src/main/java/org/apache/maven/doxia/sink/impl/XhtmlBaseSink.java#L2176)
   * *XhtmlBaseSink.content()* calls [XhtmlBaseSink.escapeHTML()](https://github.com/apache/maven-doxia/blob/74c3e876b5c7a044f5a6b7ad7be6ea3ac10a6f49/doxia-core/src/main/java/org/apache/maven/doxia/sink/impl/XhtmlBaseSink.java#L2201)
   
   and *XhtmlBaseSink.escapeHTML()* is simply:
   
   ```java
       /**
        * Forward to HtmlTools.escapeHTML( text ).
        *
        * @param text the String to escape, may be null
        * @return the text escaped, "" if null String input
        * @see org.apache.maven.doxia.util.HtmlTools#escapeHTML(String)
        */
       protected static String escapeHTML( String text )
       {
           return HtmlTools.escapeHTML( text, false );
       }
   ```
   
   So, calling `HtmlTools.escapeHTML( text, false )` is really the only proper way to produce XHTML content that will be parsed with the XhtmlSink parser. And it is consistent with Doxia's own XhtmlSink.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] michael-o commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549825320



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -74,48 +80,103 @@
 
     /**
      * Regex that identifies a multimarkdown-style metadata section at the start of the document
+     *
+     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
+     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
+     * ignored.
      */
-    private static final String MULTI_MARKDOWN_METADATA_SECTION =
-        "^(((?:[^\\s:][^:]*):(?:.*(?:\r?\n\\p{Blank}+[^\\s].*)*\r?\n))+)(?:\\s*\r?\n)";
+    private static final Pattern METADATA_SECTION_PATTERN = Pattern.compile(
+            "\\A^\\s*"
+            + "(?:title|author|date|address|affiliation|copyright|email|keywords|language|phone|subtitle)"
+            + "\\h*:\\h*\\V*\\h*$\\v+"
+            + "(?:^\\h*[^:\\v]+\\h*:\\h*\\V*\\h*$\\v+)*",
+            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE );
 
     /**
      * Regex that captures the key and value of a multimarkdown-style metadata entry.
      */
-    private static final String MULTI_MARKDOWN_METADATA_ENTRY =
-        "([^\\s:][^:]*):(.*(?:\r?\n\\p{Blank}+[^\\s].*)*)\r?\n";
-
-    /**
-     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
-     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
-     * ignored.
-     */
-    private static final String[] STANDARD_METADATA_KEYS =
-        { "title", "author", "date", "address", "affiliation", "copyright", "email", "keywords", "language", "phone",
-            "subtitle" };
+    private static final Pattern METADATA_ENTRY_PATTERN = Pattern.compile(
+            "^\\h*([^:\\v]+?)\\h*:\\h*(\\V*)\\h*$",
+            Pattern.MULTILINE );
 
     /**
      * <p>getType.</p>
      *
      * @return a int.
      */
+    @Override
     public int getType()
     {
         return TXT_TYPE;
     }
 
+    /**
+     * The parser of the HTML produced by Flexmark, that we will
+     * use to convert this HTML to Sink events
+     */
     @Requirement
     private MarkdownHtmlParser parser;
 
+    /**
+     * Flexmark's Markdown parser (one static instance fits all)
+     */
+    private static final com.vladsch.flexmark.parser.Parser FLEXMARK_PARSER;
+
+    /**
+     * Flexmark's HTML renderer (its output will be re-parsed and converted to Sink events)
+     */
+    private static final HtmlRenderer FLEXMARK_HTML_RENDERER;
+
+    // Initialize the Flexmark parser and renderer, once and for all
+    static
+    {
+        MutableDataSet flexmarkOptions = new MutableDataSet();
+
+        // Emulate Pegdown's behavior

Review comment:
       Can we move to CommonMark instead of sticking to ages?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] michael-o commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r550802165



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -130,133 +184,98 @@ public void parse( Reader source, Sink sink )
      * @return HTML content generated by flexmark-java
      * @throws IOException passed through
      */
-    String toHtml( Reader source )
+    CharSequence toHtml( Reader source )
         throws IOException
     {
+        // Read the source
         String text = IOUtil.toString( source );
-        MutableDataHolder flexmarkOptions = PegdownOptionsAdapter.flexmarkOptions(
-                Extensions.ALL & ~( Extensions.HARDWRAPS | Extensions.ANCHORLINKS ) ).toMutable();
-        ArrayList<Extension> extensions = new ArrayList<>();
-        for ( Extension extension : flexmarkOptions.get( com.vladsch.flexmark.parser.Parser.EXTENSIONS ) )
-        {
-            extensions.add( extension );
-        }
-
-        extensions.add( FlexmarkDoxiaExtension.create() );
-        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, extensions );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_OPEN_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_CLOSE_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.MAX_TRAILING_BLANK_LINES, -1 );
-
-        com.vladsch.flexmark.parser.Parser parser = com.vladsch.flexmark.parser.Parser.builder( flexmarkOptions )
-                .build();
-        HtmlRenderer renderer = HtmlRenderer.builder( flexmarkOptions )
-                                    .linkResolverFactory( new FlexmarkDoxiaLinkResolver.Factory() )
-                                    .build();
-
 
+        // Now, build the HTML document
         StringBuilder html = new StringBuilder( 1000 );
         html.append( "<html>" );
         html.append( "<head>" );
-        Pattern metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_SECTION, Pattern.MULTILINE );
-        Matcher metadataMatcher = metadataPattern.matcher( text );
+
+        // First, we interpret the "metadata" section of the document and add the corresponding HTML headers
+        Matcher metadataMatcher = METADATA_SECTION_PATTERN.matcher( text );
         boolean haveTitle = false;
         if ( metadataMatcher.find() )
         {
-            metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_ENTRY, Pattern.MULTILINE );
-            Matcher lineMatcher = metadataPattern.matcher( metadataMatcher.group( 1 ) );
-            boolean first = true;
-            while ( lineMatcher.find() )
+            Matcher entryMatcher = METADATA_ENTRY_PATTERN.matcher( metadataMatcher.group( 0 ) );
+            while ( entryMatcher.find() )
             {
-                String key = StringUtils.trimToEmpty( lineMatcher.group( 1 ) );
-                if ( first )
-                {
-                    boolean found = false;
-                    for ( String k : STANDARD_METADATA_KEYS )
-                    {
-                        if ( k.equalsIgnoreCase( key ) )
-                        {
-                            found = true;
-                            break;
-                        }
-                    }
-                    if ( !found )
-                    {
-                        break;
-                    }
-                    first = false;
-                }
-                String value = StringUtils.trimToEmpty( lineMatcher.group( 2 ) );
+                String key = entryMatcher.group( 1 );
+                String value = entryMatcher.group( 2 );
                 if ( "title".equalsIgnoreCase( key ) )
                 {
                     haveTitle = true;
                     html.append( "<title>" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
+                    html.append( HtmlTools.escapeHTML( value, false ) );

Review comment:
       I am not really happy with that because we have an XHTML (HTML) sink and `false` means that it uses HTML entities which XML does not have.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] slachiewicz commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

slachiewicz commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549638391



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -74,48 +80,103 @@
 
     /**
      * Regex that identifies a multimarkdown-style metadata section at the start of the document
+     *
+     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
+     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
+     * ignored.
      */
-    private static final String MULTI_MARKDOWN_METADATA_SECTION =
-        "^(((?:[^\\s:][^:]*):(?:.*(?:\r?\n\\p{Blank}+[^\\s].*)*\r?\n))+)(?:\\s*\r?\n)";
+    private static final Pattern METADATA_SECTION_PATTERN = Pattern.compile(
+            "\\A^\\s*"
+            + "(?:title|author|date|address|affiliation|copyright|email|keywords|language|phone|subtitle)"
+            + "\\h*:\\h*\\V*\\h*$\\v+"
+            + "(?:^\\h*[^:\\v]+\\h*:\\h*\\V*\\h*$\\v+)*",
+            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE );
 
     /**
      * Regex that captures the key and value of a multimarkdown-style metadata entry.
      */
-    private static final String MULTI_MARKDOWN_METADATA_ENTRY =
-        "([^\\s:][^:]*):(.*(?:\r?\n\\p{Blank}+[^\\s].*)*)\r?\n";
-
-    /**
-     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
-     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
-     * ignored.
-     */
-    private static final String[] STANDARD_METADATA_KEYS =
-        { "title", "author", "date", "address", "affiliation", "copyright", "email", "keywords", "language", "phone",
-            "subtitle" };
+    private static final Pattern METADATA_ENTRY_PATTERN = Pattern.compile(
+            "^\\h*([^:\\v]+?)\\h*:\\h*(\\V*)\\h*$",
+            Pattern.MULTILINE );
 
     /**
      * <p>getType.</p>
      *
      * @return a int.
      */
+    @Override
     public int getType()
     {
         return TXT_TYPE;
     }
 
+    /**
+     * The parser of the HTML produced by Flexmark, that we will
+     * use to convert this HTML to Sink events
+     */
     @Requirement
     private MarkdownHtmlParser parser;
 
+    /**
+     * Flexmark's Markdown parser (one static instance fits all)
+     */
+    private static final com.vladsch.flexmark.parser.Parser FLEXMARK_PARSER;
+
+    /**
+     * Flexmark's HTML renderer (its output will be re-parsed and converted to Sink events)
+     */
+    private static final HtmlRenderer FLEXMARK_HTML_RENDERER;
+
+    // Initialize the Flexmark parser and renderer, once and for all
+    static
+    {
+        MutableDataSet flexmarkOptions = new MutableDataSet();
+
+        // Emulate Pegdown's behavior

Review comment:
       What is yours opinion about using this emulation? Should we stay with this or switch to some different profile?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#issuecomment-752708690


   Guys, I updated the PR with all suggested changes. Is there anything else I can do? I would like to have this merged so I can submit my PR on DOXIA-542 (apostrophes vs quotes). Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r550357588



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -130,133 +184,98 @@ public void parse( Reader source, Sink sink )
      * @return HTML content generated by flexmark-java
      * @throws IOException passed through
      */
-    String toHtml( Reader source )
+    CharSequence toHtml( Reader source )
         throws IOException
     {
+        // Read the source
         String text = IOUtil.toString( source );
-        MutableDataHolder flexmarkOptions = PegdownOptionsAdapter.flexmarkOptions(
-                Extensions.ALL & ~( Extensions.HARDWRAPS | Extensions.ANCHORLINKS ) ).toMutable();
-        ArrayList<Extension> extensions = new ArrayList<>();
-        for ( Extension extension : flexmarkOptions.get( com.vladsch.flexmark.parser.Parser.EXTENSIONS ) )
-        {
-            extensions.add( extension );
-        }
-
-        extensions.add( FlexmarkDoxiaExtension.create() );
-        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, extensions );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_OPEN_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_CLOSE_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.MAX_TRAILING_BLANK_LINES, -1 );
-
-        com.vladsch.flexmark.parser.Parser parser = com.vladsch.flexmark.parser.Parser.builder( flexmarkOptions )
-                .build();
-        HtmlRenderer renderer = HtmlRenderer.builder( flexmarkOptions )
-                                    .linkResolverFactory( new FlexmarkDoxiaLinkResolver.Factory() )
-                                    .build();
-
 
+        // Now, build the HTML document
         StringBuilder html = new StringBuilder( 1000 );
         html.append( "<html>" );
         html.append( "<head>" );
-        Pattern metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_SECTION, Pattern.MULTILINE );
-        Matcher metadataMatcher = metadataPattern.matcher( text );
+
+        // First, we interpret the "metadata" section of the document and add the corresponding HTML headers
+        Matcher metadataMatcher = METADATA_SECTION_PATTERN.matcher( text );
         boolean haveTitle = false;
         if ( metadataMatcher.find() )
         {
-            metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_ENTRY, Pattern.MULTILINE );
-            Matcher lineMatcher = metadataPattern.matcher( metadataMatcher.group( 1 ) );
-            boolean first = true;
-            while ( lineMatcher.find() )
+            Matcher entryMatcher = METADATA_ENTRY_PATTERN.matcher( metadataMatcher.group( 0 ) );
+            while ( entryMatcher.find() )
             {
-                String key = StringUtils.trimToEmpty( lineMatcher.group( 1 ) );
-                if ( first )
-                {
-                    boolean found = false;
-                    for ( String k : STANDARD_METADATA_KEYS )
-                    {
-                        if ( k.equalsIgnoreCase( key ) )
-                        {
-                            found = true;
-                            break;
-                        }
-                    }
-                    if ( !found )
-                    {
-                        break;
-                    }
-                    first = false;
-                }
-                String value = StringUtils.trimToEmpty( lineMatcher.group( 2 ) );
+                String key = entryMatcher.group( 1 );
+                String value = entryMatcher.group( 2 );
                 if ( "title".equalsIgnoreCase( key ) )
                 {
                     haveTitle = true;
                     html.append( "<title>" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
+                    html.append( HtmlTools.escapeHTML( value, false ) );

Review comment:
       The only difference between *false* and *true* in *HtmlTools.escapeHTML()* is the handling of the apostrophe (should it be replaced with `&apos;` or not). As we're inside a `<title>` element, there is no need to replace the apostrophe.
   
   However, on line 218 and 221, we're inside the attribute of an element, i.e. inside quotes, and that's why we need to escape the apostrophe here.
   
   Note that the presence (and absence) of apostrophe in the title and metadatas is covered with unit tests (see *testMetadataSinkEvent()* in **MarkdownParserTest.java**)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] slachiewicz commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

slachiewicz commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549680295



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -74,48 +80,103 @@
 
     /**
      * Regex that identifies a multimarkdown-style metadata section at the start of the document
+     *
+     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
+     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
+     * ignored.
      */
-    private static final String MULTI_MARKDOWN_METADATA_SECTION =
-        "^(((?:[^\\s:][^:]*):(?:.*(?:\r?\n\\p{Blank}+[^\\s].*)*\r?\n))+)(?:\\s*\r?\n)";
+    private static final Pattern METADATA_SECTION_PATTERN = Pattern.compile(
+            "\\A^\\s*"
+            + "(?:title|author|date|address|affiliation|copyright|email|keywords|language|phone|subtitle)"
+            + "\\h*:\\h*\\V*\\h*$\\v+"
+            + "(?:^\\h*[^:\\v]+\\h*:\\h*\\V*\\h*$\\v+)*",
+            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE );
 
     /**
      * Regex that captures the key and value of a multimarkdown-style metadata entry.
      */
-    private static final String MULTI_MARKDOWN_METADATA_ENTRY =
-        "([^\\s:][^:]*):(.*(?:\r?\n\\p{Blank}+[^\\s].*)*)\r?\n";
-
-    /**
-     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
-     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
-     * ignored.
-     */
-    private static final String[] STANDARD_METADATA_KEYS =
-        { "title", "author", "date", "address", "affiliation", "copyright", "email", "keywords", "language", "phone",
-            "subtitle" };
+    private static final Pattern METADATA_ENTRY_PATTERN = Pattern.compile(
+            "^\\h*([^:\\v]+?)\\h*:\\h*(\\V*)\\h*$",
+            Pattern.MULTILINE );
 
     /**
      * <p>getType.</p>
      *
      * @return a int.
      */
+    @Override
     public int getType()
     {
         return TXT_TYPE;
     }
 
+    /**
+     * The parser of the HTML produced by Flexmark, that we will
+     * use to convert this HTML to Sink events
+     */
     @Requirement
     private MarkdownHtmlParser parser;
 
+    /**
+     * Flexmark's Markdown parser (one static instance fits all)
+     */
+    private static final com.vladsch.flexmark.parser.Parser FLEXMARK_PARSER;
+
+    /**
+     * Flexmark's HTML renderer (its output will be re-parsed and converted to Sink events)
+     */
+    private static final HtmlRenderer FLEXMARK_HTML_RENDERER;
+
+    // Initialize the Flexmark parser and renderer, once and for all
+    static
+    {
+        MutableDataSet flexmarkOptions = new MutableDataSet();
+
+        // Emulate Pegdown's behavior

Review comment:
       Let's try after this PR merge.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r550878356



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -130,133 +184,98 @@ public void parse( Reader source, Sink sink )
      * @return HTML content generated by flexmark-java
      * @throws IOException passed through
      */
-    String toHtml( Reader source )
+    CharSequence toHtml( Reader source )
         throws IOException
     {
+        // Read the source
         String text = IOUtil.toString( source );
-        MutableDataHolder flexmarkOptions = PegdownOptionsAdapter.flexmarkOptions(
-                Extensions.ALL & ~( Extensions.HARDWRAPS | Extensions.ANCHORLINKS ) ).toMutable();
-        ArrayList<Extension> extensions = new ArrayList<>();
-        for ( Extension extension : flexmarkOptions.get( com.vladsch.flexmark.parser.Parser.EXTENSIONS ) )
-        {
-            extensions.add( extension );
-        }
-
-        extensions.add( FlexmarkDoxiaExtension.create() );
-        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, extensions );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_OPEN_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_CLOSE_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.MAX_TRAILING_BLANK_LINES, -1 );
-
-        com.vladsch.flexmark.parser.Parser parser = com.vladsch.flexmark.parser.Parser.builder( flexmarkOptions )
-                .build();
-        HtmlRenderer renderer = HtmlRenderer.builder( flexmarkOptions )
-                                    .linkResolverFactory( new FlexmarkDoxiaLinkResolver.Factory() )
-                                    .build();
-
 
+        // Now, build the HTML document
         StringBuilder html = new StringBuilder( 1000 );
         html.append( "<html>" );
         html.append( "<head>" );
-        Pattern metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_SECTION, Pattern.MULTILINE );
-        Matcher metadataMatcher = metadataPattern.matcher( text );
+
+        // First, we interpret the "metadata" section of the document and add the corresponding HTML headers
+        Matcher metadataMatcher = METADATA_SECTION_PATTERN.matcher( text );
         boolean haveTitle = false;
         if ( metadataMatcher.find() )
         {
-            metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_ENTRY, Pattern.MULTILINE );
-            Matcher lineMatcher = metadataPattern.matcher( metadataMatcher.group( 1 ) );
-            boolean first = true;
-            while ( lineMatcher.find() )
+            Matcher entryMatcher = METADATA_ENTRY_PATTERN.matcher( metadataMatcher.group( 0 ) );
+            while ( entryMatcher.find() )
             {
-                String key = StringUtils.trimToEmpty( lineMatcher.group( 1 ) );
-                if ( first )
-                {
-                    boolean found = false;
-                    for ( String k : STANDARD_METADATA_KEYS )
-                    {
-                        if ( k.equalsIgnoreCase( key ) )
-                        {
-                            found = true;
-                            break;
-                        }
-                    }
-                    if ( !found )
-                    {
-                        break;
-                    }
-                    first = false;
-                }
-                String value = StringUtils.trimToEmpty( lineMatcher.group( 2 ) );
+                String key = entryMatcher.group( 1 );
+                String value = entryMatcher.group( 2 );
                 if ( "title".equalsIgnoreCase( key ) )
                 {
                     haveTitle = true;
                     html.append( "<title>" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
+                    html.append( HtmlTools.escapeHTML( value, false ) );

Review comment:
       I hear you! I guess *escapeHTML()* should be renamed *escapeXHTML()*, as it's what it actually does. And the *xml* param should really be just about apostrophes. Anyway, that's how it is now...
   
   Thank you for the review, @michael-o !




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549813424



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -130,133 +191,98 @@ public void parse( Reader source, Sink sink )
      * @return HTML content generated by flexmark-java
      * @throws IOException passed through
      */
-    String toHtml( Reader source )
+    CharSequence toHtml( Reader source )
         throws IOException
     {
+        // Read the source
         String text = IOUtil.toString( source );
-        MutableDataHolder flexmarkOptions = PegdownOptionsAdapter.flexmarkOptions(
-                Extensions.ALL & ~( Extensions.HARDWRAPS | Extensions.ANCHORLINKS ) ).toMutable();
-        ArrayList<Extension> extensions = new ArrayList<>();
-        for ( Extension extension : flexmarkOptions.get( com.vladsch.flexmark.parser.Parser.EXTENSIONS ) )
-        {
-            extensions.add( extension );
-        }
-
-        extensions.add( FlexmarkDoxiaExtension.create() );
-        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, extensions );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_OPEN_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_CLOSE_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.MAX_TRAILING_BLANK_LINES, -1 );
-
-        com.vladsch.flexmark.parser.Parser parser = com.vladsch.flexmark.parser.Parser.builder( flexmarkOptions )
-                .build();
-        HtmlRenderer renderer = HtmlRenderer.builder( flexmarkOptions )
-                                    .linkResolverFactory( new FlexmarkDoxiaLinkResolver.Factory() )
-                                    .build();
-
 
+        // Now, build the HTML document
         StringBuilder html = new StringBuilder( 1000 );
         html.append( "<html>" );
         html.append( "<head>" );
-        Pattern metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_SECTION, Pattern.MULTILINE );
-        Matcher metadataMatcher = metadataPattern.matcher( text );
+
+        // First, we interpret the "metadata" section of the document and add the corresponding HTML headers
+        Matcher metadataMatcher = METADATA_SECTION_PATTERN.matcher( text );
         boolean haveTitle = false;
         if ( metadataMatcher.find() )
         {
-            metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_ENTRY, Pattern.MULTILINE );
-            Matcher lineMatcher = metadataPattern.matcher( metadataMatcher.group( 1 ) );
-            boolean first = true;
-            while ( lineMatcher.find() )
+            Matcher entryMatcher = METADATA_ENTRY_PATTERN.matcher( metadataMatcher.group( 0 ) );
+            while ( entryMatcher.find() )
             {
-                String key = StringUtils.trimToEmpty( lineMatcher.group( 1 ) );
-                if ( first )
-                {
-                    boolean found = false;
-                    for ( String k : STANDARD_METADATA_KEYS )
-                    {
-                        if ( k.equalsIgnoreCase( key ) )
-                        {
-                            found = true;
-                            break;
-                        }
-                    }
-                    if ( !found )
-                    {
-                        break;
-                    }
-                    first = false;
-                }
-                String value = StringUtils.trimToEmpty( lineMatcher.group( 2 ) );
+                String key = entryMatcher.group( 1 );
+                String value = entryMatcher.group( 2 );
                 if ( "title".equalsIgnoreCase( key ) )
                 {
                     haveTitle = true;
                     html.append( "<title>" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
+                    html.append( HtmlTools.escapeHTML( value, false ) );
                     html.append( "</title>" );
                 }
-                else if ( "author".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'author\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
-                else if ( "date".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'date\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
                 else
                 {
-                    html.append( "<meta name=\'" );
-                    html.append( StringEscapeUtils.escapeXml( key ) );
-                    html.append( "\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
+                    html.append( "<meta name='" );
+                    html.append( HtmlTools.escapeHTML( key, true ) );

Review comment:
       *escapeXml()* is deprecated (as well as the entire *StringEscapeUtils* class), that's why I replaced it with its equivalent in this very project. It's supposed to do exactly the same.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#issuecomment-753544218


   Thank you @michael-o , I will look into MSITE tomorrow and submit the PR for DOXIA-542.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549872226



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -74,48 +80,103 @@
 
     /**
      * Regex that identifies a multimarkdown-style metadata section at the start of the document
+     *
+     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
+     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
+     * ignored.
      */
-    private static final String MULTI_MARKDOWN_METADATA_SECTION =
-        "^(((?:[^\\s:][^:]*):(?:.*(?:\r?\n\\p{Blank}+[^\\s].*)*\r?\n))+)(?:\\s*\r?\n)";
+    private static final Pattern METADATA_SECTION_PATTERN = Pattern.compile(
+            "\\A^\\s*"
+            + "(?:title|author|date|address|affiliation|copyright|email|keywords|language|phone|subtitle)"
+            + "\\h*:\\h*\\V*\\h*$\\v+"
+            + "(?:^\\h*[^:\\v]+\\h*:\\h*\\V*\\h*$\\v+)*",
+            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE );
 
     /**
      * Regex that captures the key and value of a multimarkdown-style metadata entry.
      */
-    private static final String MULTI_MARKDOWN_METADATA_ENTRY =
-        "([^\\s:][^:]*):(.*(?:\r?\n\\p{Blank}+[^\\s].*)*)\r?\n";
-
-    /**
-     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
-     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
-     * ignored.
-     */
-    private static final String[] STANDARD_METADATA_KEYS =
-        { "title", "author", "date", "address", "affiliation", "copyright", "email", "keywords", "language", "phone",
-            "subtitle" };
+    private static final Pattern METADATA_ENTRY_PATTERN = Pattern.compile(
+            "^\\h*([^:\\v]+?)\\h*:\\h*(\\V*)\\h*$",
+            Pattern.MULTILINE );
 
     /**
      * <p>getType.</p>
      *
      * @return a int.
      */
+    @Override
     public int getType()
     {
         return TXT_TYPE;
     }
 
+    /**
+     * The parser of the HTML produced by Flexmark, that we will
+     * use to convert this HTML to Sink events
+     */
     @Requirement
     private MarkdownHtmlParser parser;
 
+    /**
+     * Flexmark's Markdown parser (one static instance fits all)
+     */
+    private static final com.vladsch.flexmark.parser.Parser FLEXMARK_PARSER;
+
+    /**
+     * Flexmark's HTML renderer (its output will be re-parsed and converted to Sink events)
+     */
+    private static final HtmlRenderer FLEXMARK_HTML_RENDERER;
+
+    // Initialize the Flexmark parser and renderer, once and for all
+    static
+    {
+        MutableDataSet flexmarkOptions = new MutableDataSet();
+
+        // Emulate Pegdown's behavior

Review comment:
       I'm removing the Pegdown emulation profile option, but I will keep all of the extensions that used to be enabled. We don't want to lose features.
   
   Also, I'm thinking of enabling the [YAML Extension](https://github.com/vsch/flexmark-java/wiki/Extensions#yaml-front-matter), which is very similar to what we're doing to process metadata, but that will be for another issue.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] michael-o commented on pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#issuecomment-753470356


   Tested, when I run the general IT one site is missing:
   ```
   192.168.1.52 - - [02/Jan/2021 13:52:48] "GET /general/target/site/quotes.html HTTP/1.1" 404 -
   ```
   ```
   mosipov@bsd1srv:/usr/home/mosipov/var/Projekte/maven-doxia/doxia-modules/doxia-module-markdown/target/it/general/target/site (DOXIA-616 *)
   $ tree
   .
   ├── css
   │   ├── maven-base.css
   │   ├── maven-theme.css
   │   ├── print.css
   │   └── site.css
   ├── images
   │   ├── collapsed.gif
   │   ├── expanded.gif
   │   ├── external.png
   │   ├── icon_error_sml.gif
   │   ├── icon_info_sml.gif
   │   ├── icon_success_sml.gif
   │   ├── icon_warning_sml.gif
   │   ├── logos
   │   │   ├── build-by-maven-black.png
   │   │   ├── build-by-maven-white.png
   │   │   └── maven-feather.png
   │   └── newwindow.png
   ├── index.html
   └── metadata.html
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r550358090



##########
File path: doxia-modules/doxia-module-markdown/src/test/resources/fenced-code-block.md
##########
@@ -0,0 +1,5 @@
+Below code is Java:
+
+```java
+System.out.println(helloWorld);

Review comment:
       Nah, it's on purpose. This is just a test resource that is translated as Sink events. But the presence of quotes around `helloWorld` breaks the *text* Sink event in several fragment, which is annoying for my test.
   
   But we can consider *helloWorld* is a *String* variable defined somewhere else.
   
   Again, this is just an example for the unit tests. We don't need valid Java code here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549817492



##########
File path: doxia-modules/doxia-module-markdown/pom.xml
##########
@@ -72,5 +72,80 @@ under the License.
       <groupId>org.codehaus.plexus</groupId>
       <artifactId>plexus-utils</artifactId>
     </dependency>
+    <dependency>
+      <groupId>commons-io</groupId>
+      <artifactId>commons-io</artifactId>
+    </dependency>
   </dependencies>
+  <build>
+
+    <plugins>
+
+      <!-- install -->
+      <plugin>
+        <artifactId>maven-install-plugin</artifactId>

Review comment:
       It is hacky, I agree, especially as it specifies other artifacts by their path relative to this module. At the same time, it's really the purpose of **maven-install-plugin**: install artifacts in a Maven repository, which is what we're doing here so that **maven-invoker-plugin** has all it needs to run integration tests.
   
   I'm open to alternatives (as that would be useful for other multi-module projects!).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r550357704



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -130,133 +184,98 @@ public void parse( Reader source, Sink sink )
      * @return HTML content generated by flexmark-java
      * @throws IOException passed through
      */
-    String toHtml( Reader source )
+    CharSequence toHtml( Reader source )
         throws IOException
     {
+        // Read the source
         String text = IOUtil.toString( source );
-        MutableDataHolder flexmarkOptions = PegdownOptionsAdapter.flexmarkOptions(
-                Extensions.ALL & ~( Extensions.HARDWRAPS | Extensions.ANCHORLINKS ) ).toMutable();
-        ArrayList<Extension> extensions = new ArrayList<>();
-        for ( Extension extension : flexmarkOptions.get( com.vladsch.flexmark.parser.Parser.EXTENSIONS ) )
-        {
-            extensions.add( extension );
-        }
-
-        extensions.add( FlexmarkDoxiaExtension.create() );
-        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, extensions );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_OPEN_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_CLOSE_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.MAX_TRAILING_BLANK_LINES, -1 );
-
-        com.vladsch.flexmark.parser.Parser parser = com.vladsch.flexmark.parser.Parser.builder( flexmarkOptions )
-                .build();
-        HtmlRenderer renderer = HtmlRenderer.builder( flexmarkOptions )
-                                    .linkResolverFactory( new FlexmarkDoxiaLinkResolver.Factory() )
-                                    .build();
-
 
+        // Now, build the HTML document
         StringBuilder html = new StringBuilder( 1000 );
         html.append( "<html>" );
         html.append( "<head>" );
-        Pattern metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_SECTION, Pattern.MULTILINE );
-        Matcher metadataMatcher = metadataPattern.matcher( text );
+
+        // First, we interpret the "metadata" section of the document and add the corresponding HTML headers
+        Matcher metadataMatcher = METADATA_SECTION_PATTERN.matcher( text );
         boolean haveTitle = false;
         if ( metadataMatcher.find() )
         {
-            metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_ENTRY, Pattern.MULTILINE );
-            Matcher lineMatcher = metadataPattern.matcher( metadataMatcher.group( 1 ) );
-            boolean first = true;
-            while ( lineMatcher.find() )
+            Matcher entryMatcher = METADATA_ENTRY_PATTERN.matcher( metadataMatcher.group( 0 ) );
+            while ( entryMatcher.find() )
             {
-                String key = StringUtils.trimToEmpty( lineMatcher.group( 1 ) );
-                if ( first )
-                {
-                    boolean found = false;
-                    for ( String k : STANDARD_METADATA_KEYS )
-                    {
-                        if ( k.equalsIgnoreCase( key ) )
-                        {
-                            found = true;
-                            break;
-                        }
-                    }
-                    if ( !found )
-                    {
-                        break;
-                    }
-                    first = false;
-                }
-                String value = StringUtils.trimToEmpty( lineMatcher.group( 2 ) );
+                String key = entryMatcher.group( 1 );
+                String value = entryMatcher.group( 2 );
                 if ( "title".equalsIgnoreCase( key ) )
                 {
                     haveTitle = true;
                     html.append( "<title>" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
+                    html.append( HtmlTools.escapeHTML( value, false ) );
                     html.append( "</title>" );
                 }
-                else if ( "author".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'author\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
-                else if ( "date".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'date\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
                 else
                 {
-                    html.append( "<meta name=\'" );
-                    html.append( StringEscapeUtils.escapeXml( key ) );
-                    html.append( "\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
+                    html.append( "<meta name='" );
+                    html.append( HtmlTools.escapeHTML( key ) );
+                    html.append( "' content='" );
+                    html.append( HtmlTools.escapeHTML( value ) );
+                    html.append( "' />" );
                 }
             }
-            if ( !first )
-            {
-                text = text.substring( metadataMatcher.end() );
-            }
+
+            // Trim the metadata from the source
+            text = text.substring( metadataMatcher.end( 0 ) );
+
         }
 
-        Node rootNode = parser.parse( text );
-        String markdownHtml = renderer.render( rootNode );
+        // Now is the time to parse the Markdown document
+        // (after we've trimmed out the metadatas, and before we check for its headings)
+        Node documentRoot = FLEXMARK_PARSER.parse( text );
 
-        if ( !haveTitle && rootNode.hasChildren() )
+        // Special trick: if there is no title specified as a metadata in the header, we will use the first
+        // heading as the document title
+        if ( !haveTitle && documentRoot.hasChildren() )
         {
-            // use the first (non-comment) node only if it is a heading
-            Node firstNode = rootNode.getFirstChild();
-            while ( firstNode != null && !( firstNode instanceof Heading ) )
+            // Skip the comment nodes
+            Node firstNode = documentRoot.getFirstChild();
+            while ( firstNode != null && firstNode instanceof HtmlCommentBlock )
             {
-                if ( !( firstNode instanceof HtmlCommentBlock ) )
-                {
-                    break;
-                }
                 firstNode = firstNode.getNext();
             }
 
-            if ( firstNode instanceof Heading )
+            // If this first non-comment node is a heading, we use it as the document title
+            if ( firstNode != null && firstNode instanceof Heading )
             {
                 html.append( "<title>" );
                 TextCollectingVisitor collectingVisitor = new TextCollectingVisitor();
                 String headingText = collectingVisitor.collectAndGetText( firstNode );
-                html.append( StringEscapeUtils.escapeXml( headingText ) );
+                html.append( HtmlTools.escapeHTML( headingText, false ) );

Review comment:
       Same answer as above.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] michael-o commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549814887



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -74,48 +80,103 @@
 
     /**
      * Regex that identifies a multimarkdown-style metadata section at the start of the document
+     *
+     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
+     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
+     * ignored.
      */
-    private static final String MULTI_MARKDOWN_METADATA_SECTION =
-        "^(((?:[^\\s:][^:]*):(?:.*(?:\r?\n\\p{Blank}+[^\\s].*)*\r?\n))+)(?:\\s*\r?\n)";
+    private static final Pattern METADATA_SECTION_PATTERN = Pattern.compile(
+            "\\A^\\s*"
+            + "(?:title|author|date|address|affiliation|copyright|email|keywords|language|phone|subtitle)"
+            + "\\h*:\\h*\\V*\\h*$\\v+"
+            + "(?:^\\h*[^:\\v]+\\h*:\\h*\\V*\\h*$\\v+)*",
+            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE );
 
     /**
      * Regex that captures the key and value of a multimarkdown-style metadata entry.
      */
-    private static final String MULTI_MARKDOWN_METADATA_ENTRY =
-        "([^\\s:][^:]*):(.*(?:\r?\n\\p{Blank}+[^\\s].*)*)\r?\n";
-
-    /**
-     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
-     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
-     * ignored.
-     */
-    private static final String[] STANDARD_METADATA_KEYS =
-        { "title", "author", "date", "address", "affiliation", "copyright", "email", "keywords", "language", "phone",
-            "subtitle" };
+    private static final Pattern METADATA_ENTRY_PATTERN = Pattern.compile(
+            "^\\h*([^:\\v]+?)\\h*:\\h*(\\V*)\\h*$",
+            Pattern.MULTILINE );
 
     /**
      * <p>getType.</p>
      *
      * @return a int.
      */
+    @Override
     public int getType()
     {
         return TXT_TYPE;
     }
 
+    /**
+     * The parser of the HTML produced by Flexmark, that we will
+     * use to convert this HTML to Sink events
+     */
     @Requirement
     private MarkdownHtmlParser parser;
 
+    /**
+     * Flexmark's Markdown parser (one static instance fits all)
+     */
+    private static final com.vladsch.flexmark.parser.Parser FLEXMARK_PARSER;
+
+    /**
+     * Flexmark's HTML renderer (its output will be re-parsed and converted to Sink events)
+     */
+    private static final HtmlRenderer FLEXMARK_HTML_RENDERER;
+
+    // Initialize the Flexmark parser and renderer, once and for all
+    static
+    {
+        MutableDataSet flexmarkOptions = new MutableDataSet();
+
+        // Emulate Pegdown's behavior
+        flexmarkOptions.setFrom( ParserEmulationProfile.PEGDOWN );
+
+        // Enable the extensions that we used to have in Pegdown
+        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, Arrays.asList(
+                EscapedCharacterExtension.create(),
+                AbbreviationExtension.create(),
+                AutolinkExtension.create(),
+                DefinitionExtension.create(),
+                TypographicExtension.create(),
+                TablesExtension.create(),
+                WikiLinkExtension.create(),
+                StrikethroughExtension.create()
+        ) );
+
+        // Disable wrong apostrophe replacement
+        flexmarkOptions.set( TypographicExtension.SINGLE_QUOTE_UNMATCHED, "&apos;" );

Review comment:
       I'd prefer this to be separate for documentation reasons.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] michael-o commented on pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#issuecomment-752758230


   OK, let me go through again and test it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549525703



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -24,27 +24,33 @@
 import com.vladsch.flexmark.util.ast.Node;
 import com.vladsch.flexmark.ast.util.TextCollectingVisitor;
 import com.vladsch.flexmark.html.HtmlRenderer;
-import com.vladsch.flexmark.profiles.pegdown.Extensions;
-import com.vladsch.flexmark.profiles.pegdown.PegdownOptionsAdapter;
-import com.vladsch.flexmark.util.builder.Extension;
-import com.vladsch.flexmark.util.options.MutableDataHolder;
-import org.apache.commons.lang3.StringEscapeUtils;
-import org.apache.commons.lang3.StringUtils;
+import com.vladsch.flexmark.parser.ParserEmulationProfile;
+import com.vladsch.flexmark.util.options.MutableDataSet;
+import com.vladsch.flexmark.ext.escaped.character.EscapedCharacterExtension;
+import com.vladsch.flexmark.ext.abbreviation.AbbreviationExtension;
+import com.vladsch.flexmark.ext.autolink.AutolinkExtension;
+import com.vladsch.flexmark.ext.definition.DefinitionExtension;
+import com.vladsch.flexmark.ext.typographic.TypographicExtension;
+import com.vladsch.flexmark.ext.tables.TablesExtension;
+import com.vladsch.flexmark.ext.wikilink.WikiLinkExtension;
+import com.vladsch.flexmark.ext.gfm.strikethrough.StrikethroughExtension;
+
+import org.apache.commons.io.input.CharSequenceReader;

Review comment:
       I took the liberty to include Commons IO to convert a *StringBuilder* to a *Reader* without going through a *String*. Let me know if that is overkill and you prefer to not to use Commons IO. Thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] michael-o commented on pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#issuecomment-753464394


   I don't have any objection anymore. Let me run the tests and then I will merge.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] michael-o commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549815838



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -74,48 +80,103 @@
 
     /**
      * Regex that identifies a multimarkdown-style metadata section at the start of the document
+     *
+     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
+     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
+     * ignored.
      */
-    private static final String MULTI_MARKDOWN_METADATA_SECTION =
-        "^(((?:[^\\s:][^:]*):(?:.*(?:\r?\n\\p{Blank}+[^\\s].*)*\r?\n))+)(?:\\s*\r?\n)";
+    private static final Pattern METADATA_SECTION_PATTERN = Pattern.compile(
+            "\\A^\\s*"
+            + "(?:title|author|date|address|affiliation|copyright|email|keywords|language|phone|subtitle)"
+            + "\\h*:\\h*\\V*\\h*$\\v+"
+            + "(?:^\\h*[^:\\v]+\\h*:\\h*\\V*\\h*$\\v+)*",
+            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE );
 
     /**
      * Regex that captures the key and value of a multimarkdown-style metadata entry.
      */
-    private static final String MULTI_MARKDOWN_METADATA_ENTRY =
-        "([^\\s:][^:]*):(.*(?:\r?\n\\p{Blank}+[^\\s].*)*)\r?\n";
-
-    /**
-     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
-     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
-     * ignored.
-     */
-    private static final String[] STANDARD_METADATA_KEYS =
-        { "title", "author", "date", "address", "affiliation", "copyright", "email", "keywords", "language", "phone",
-            "subtitle" };
+    private static final Pattern METADATA_ENTRY_PATTERN = Pattern.compile(
+            "^\\h*([^:\\v]+?)\\h*:\\h*(\\V*)\\h*$",
+            Pattern.MULTILINE );
 
     /**
      * <p>getType.</p>
      *
      * @return a int.
      */
+    @Override
     public int getType()
     {
         return TXT_TYPE;
     }
 
+    /**
+     * The parser of the HTML produced by Flexmark, that we will
+     * use to convert this HTML to Sink events
+     */
     @Requirement
     private MarkdownHtmlParser parser;
 
+    /**
+     * Flexmark's Markdown parser (one static instance fits all)
+     */
+    private static final com.vladsch.flexmark.parser.Parser FLEXMARK_PARSER;
+
+    /**
+     * Flexmark's HTML renderer (its output will be re-parsed and converted to Sink events)
+     */
+    private static final HtmlRenderer FLEXMARK_HTML_RENDERER;
+
+    // Initialize the Flexmark parser and renderer, once and for all
+    static
+    {
+        MutableDataSet flexmarkOptions = new MutableDataSet();
+
+        // Emulate Pegdown's behavior
+        flexmarkOptions.setFrom( ParserEmulationProfile.PEGDOWN );
+
+        // Enable the extensions that we used to have in Pegdown
+        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, Arrays.asList(
+                EscapedCharacterExtension.create(),
+                AbbreviationExtension.create(),
+                AutolinkExtension.create(),
+                DefinitionExtension.create(),
+                TypographicExtension.create(),
+                TablesExtension.create(),
+                WikiLinkExtension.create(),
+                StrikethroughExtension.create()
+        ) );
+
+        // Disable wrong apostrophe replacement
+        flexmarkOptions.set( TypographicExtension.SINGLE_QUOTE_UNMATCHED, "&apos;" );

Review comment:
       Let's finish this first and then put 542 on top.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] michael-o commented on pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#issuecomment-753532426


   The changeset seems to be fine. There is now expected fallout in MSITE:
   ```
   [INFO] Building: doxia-formats/pom.xml
   [INFO] run post-build script verify.groovy
   [INFO]   The post-build script did not succeed. assert content.contains( '<div class="source"><pre class="prettyprint linenums">code block' )
          |       |
          |       false
          <!DOCTYPE html>
          <!--
           | Generated by Apache Maven Doxia Site Renderer 1.9.2 from src/site/markdown/markdown.md at 2021-01-02
           | Rendered using Apache Maven Fluido Skin 1.8
          -->
          <html xmlns="http://www.w3.org/1999/xhtml" lang="en">
            <head>
              <meta charset="UTF-8" />
              <meta name="viewport" content="width=device-width, initial-scale=1" />
              <meta name="generator" content="Apache Maven Doxia Site Renderer 1.9.2" />
              <title>Doxia formats tests &#x2013; Markdown Format works</title>
              <link rel="stylesheet" href="./css/apache-maven-fluido-1.8.min.css" />
              <link rel="stylesheet" href="./css/site.css" />
              <link rel="stylesheet" href="./css/print.css" media="print" />
              <script src="./js/apache-maven-fluido-1.8.min.js"></script>
            </head>
            <body class="topBarDisabled">
              <div class="container-fluid">
                <header>
                  <div id="banner">
                    <div class="pull-left"><div id="bannerLeft"><h2>Doxia formats tests</h2>
          </div>
          </div>
                    <div class="pull-right"></div>
                    <div class="clear"><hr/></div>
                  </div>
   
                  <div id="breadcrumbs">
                    <ul class="breadcrumb">
                  <li id="publishDate">Last Published: 2021-01-02<span class="divider">|</span>
          </li>
                    <li id="projectVersion">Version: 1.0-SNAPSHOT</li>
                    </ul>
                  </div>
                </header>
                <div class="row-fluid">
                  <header id="leftColumn" class="span2">
                    <nav class="well sidebar-nav">
            <ul class="nav nav-list">
            </ul>
                    </nav>
                    <div class="well sidebar-nav">
                      <hr />
                      <div id="poweredBy">
                        <div class="clear"></div>
                        <div class="clear"></div>
                        <div class="clear"></div>
          <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy"><img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png" /></a>
                      </div>
                    </div>
                  </header>
                  <main id="bodyColumn"  class="span10" >
          <!---
          Licensed to the Apache Software Foundation (ASF) under one
          or more contributor license agreements.  See the NOTICE file
          distributed with this work for additional information
          regarding copyright ownership.  The ASF licenses this file
          to you under the Apache License, Version 2.0 (the
          "License"); you may not use this file except in compliance
          with the License.  You may obtain a copy of the License at
   
            http://www.apache.org/licenses/LICENSE-2.0
   
          Unless required by applicable law or agreed to in writing,
          software distributed under the License is distributed on an
          "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
          KIND, either express or implied.  See the License for the
          specific language governing permissions and limitations
          under the License.
          -->
          <section>
          <h2><a name="Markdown_Format_works"></a>Markdown Format works</h2>
          <p>But &#x2018;quotes&#x2019; and &#x201c;double quotes&#x201d; were stripped from HTML result with DOXIA 1.3:
          see <a class="externalLink" href="https://issues.apache.org/jira/browse/DOXIA-473">DOXIA-473</a>.</p>
          <ul>
          <li><a href="#Markdown_Format_works">Markdown Format works</a>
          <ul>
          <li><a href="#Subsection">Subsection</a></li></ul></li></ul>
          <section>
          <h3><a name="Subsection"></a>Subsection</h3>
          <p>Missing <code>monospaced</code> support <a class="externalLink" href="https://issues.apache.org/jira/browse/DOXIA-597">DOXIA-597</a></p>
          <p>Missing code block color <a class="externalLink" href="https://issues.apache.org/jira/browse/DOXIA-571">DOXIA-571</a>:</p>
   
          <div class="source"><pre class="prettyprint linenums"><code>code block
          Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
          Maven home: D:\apache-maven-3.6.3\apache-maven\bin\..
          Java version: 1.8.0_232, vendor: AdoptOpenJDK, runtime: C:\Program Files\AdoptOpenJDK\jdk-8.0.232.09-hotspot\jre
          Default locale: en_US, platform encoding: Cp1250
          OS name: &quot;windows 10&quot;, version: &quot;10.0&quot;, arch: &quot;amd64&quot;, family: &quot;windows&quot;
          </code></pre></div></section></section>
                  </main>
                </div>
              </div>
              <hr/>
              <footer>
                <div class="container-fluid">
                  <div class="row-fluid">
                      <p>Copyright &#169;      2021..</p>
                  </div>
                </div>
              </footer>
            </body>
          </html>
   [INFO]           doxia-formats/pom.xml ............................ FAILED (5.3 s)
   ```
   
   Can you have a look and create a followup PR in MSITE?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#issuecomment-752216466


   @michael-o I removed code in reference to DOXIA-542 (quote vs apostrophe) and associated tests. I removed the `true` flag in *HtmlTools.escapeHTML()* calls as suggested. Please review the updated code. Thank you!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] michael-o commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549763572



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -24,27 +24,33 @@
 import com.vladsch.flexmark.util.ast.Node;
 import com.vladsch.flexmark.ast.util.TextCollectingVisitor;
 import com.vladsch.flexmark.html.HtmlRenderer;
-import com.vladsch.flexmark.profiles.pegdown.Extensions;
-import com.vladsch.flexmark.profiles.pegdown.PegdownOptionsAdapter;
-import com.vladsch.flexmark.util.builder.Extension;
-import com.vladsch.flexmark.util.options.MutableDataHolder;
-import org.apache.commons.lang3.StringEscapeUtils;
-import org.apache.commons.lang3.StringUtils;
+import com.vladsch.flexmark.parser.ParserEmulationProfile;
+import com.vladsch.flexmark.util.options.MutableDataSet;
+import com.vladsch.flexmark.ext.escaped.character.EscapedCharacterExtension;
+import com.vladsch.flexmark.ext.abbreviation.AbbreviationExtension;
+import com.vladsch.flexmark.ext.autolink.AutolinkExtension;
+import com.vladsch.flexmark.ext.definition.DefinitionExtension;
+import com.vladsch.flexmark.ext.typographic.TypographicExtension;
+import com.vladsch.flexmark.ext.tables.TablesExtension;
+import com.vladsch.flexmark.ext.wikilink.WikiLinkExtension;
+import com.vladsch.flexmark.ext.gfm.strikethrough.StrikethroughExtension;
+
+import org.apache.commons.io.input.CharSequenceReader;

Review comment:
       I am fine with that. WE love Apache dogfood.

##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -24,27 +24,33 @@
 import com.vladsch.flexmark.util.ast.Node;
 import com.vladsch.flexmark.ast.util.TextCollectingVisitor;
 import com.vladsch.flexmark.html.HtmlRenderer;
-import com.vladsch.flexmark.profiles.pegdown.Extensions;
-import com.vladsch.flexmark.profiles.pegdown.PegdownOptionsAdapter;
-import com.vladsch.flexmark.util.builder.Extension;
-import com.vladsch.flexmark.util.options.MutableDataHolder;
-import org.apache.commons.lang3.StringEscapeUtils;
-import org.apache.commons.lang3.StringUtils;
+import com.vladsch.flexmark.parser.ParserEmulationProfile;
+import com.vladsch.flexmark.util.options.MutableDataSet;
+import com.vladsch.flexmark.ext.escaped.character.EscapedCharacterExtension;
+import com.vladsch.flexmark.ext.abbreviation.AbbreviationExtension;
+import com.vladsch.flexmark.ext.autolink.AutolinkExtension;
+import com.vladsch.flexmark.ext.definition.DefinitionExtension;
+import com.vladsch.flexmark.ext.typographic.TypographicExtension;
+import com.vladsch.flexmark.ext.tables.TablesExtension;
+import com.vladsch.flexmark.ext.wikilink.WikiLinkExtension;
+import com.vladsch.flexmark.ext.gfm.strikethrough.StrikethroughExtension;
+
+import org.apache.commons.io.input.CharSequenceReader;

Review comment:
       I am fine with that. We love Apache dogfood.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] michael-o commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549815437



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -130,133 +191,98 @@ public void parse( Reader source, Sink sink )
      * @return HTML content generated by flexmark-java
      * @throws IOException passed through
      */
-    String toHtml( Reader source )
+    CharSequence toHtml( Reader source )
         throws IOException
     {
+        // Read the source
         String text = IOUtil.toString( source );
-        MutableDataHolder flexmarkOptions = PegdownOptionsAdapter.flexmarkOptions(
-                Extensions.ALL & ~( Extensions.HARDWRAPS | Extensions.ANCHORLINKS ) ).toMutable();
-        ArrayList<Extension> extensions = new ArrayList<>();
-        for ( Extension extension : flexmarkOptions.get( com.vladsch.flexmark.parser.Parser.EXTENSIONS ) )
-        {
-            extensions.add( extension );
-        }
-
-        extensions.add( FlexmarkDoxiaExtension.create() );
-        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, extensions );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_OPEN_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_CLOSE_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.MAX_TRAILING_BLANK_LINES, -1 );
-
-        com.vladsch.flexmark.parser.Parser parser = com.vladsch.flexmark.parser.Parser.builder( flexmarkOptions )
-                .build();
-        HtmlRenderer renderer = HtmlRenderer.builder( flexmarkOptions )
-                                    .linkResolverFactory( new FlexmarkDoxiaLinkResolver.Factory() )
-                                    .build();
-
 
+        // Now, build the HTML document
         StringBuilder html = new StringBuilder( 1000 );
         html.append( "<html>" );
         html.append( "<head>" );
-        Pattern metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_SECTION, Pattern.MULTILINE );
-        Matcher metadataMatcher = metadataPattern.matcher( text );
+
+        // First, we interpret the "metadata" section of the document and add the corresponding HTML headers
+        Matcher metadataMatcher = METADATA_SECTION_PATTERN.matcher( text );
         boolean haveTitle = false;
         if ( metadataMatcher.find() )
         {
-            metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_ENTRY, Pattern.MULTILINE );
-            Matcher lineMatcher = metadataPattern.matcher( metadataMatcher.group( 1 ) );
-            boolean first = true;
-            while ( lineMatcher.find() )
+            Matcher entryMatcher = METADATA_ENTRY_PATTERN.matcher( metadataMatcher.group( 0 ) );
+            while ( entryMatcher.find() )
             {
-                String key = StringUtils.trimToEmpty( lineMatcher.group( 1 ) );
-                if ( first )
-                {
-                    boolean found = false;
-                    for ( String k : STANDARD_METADATA_KEYS )
-                    {
-                        if ( k.equalsIgnoreCase( key ) )
-                        {
-                            found = true;
-                            break;
-                        }
-                    }
-                    if ( !found )
-                    {
-                        break;
-                    }
-                    first = false;
-                }
-                String value = StringUtils.trimToEmpty( lineMatcher.group( 2 ) );
+                String key = entryMatcher.group( 1 );
+                String value = entryMatcher.group( 2 );
                 if ( "title".equalsIgnoreCase( key ) )
                 {
                     haveTitle = true;
                     html.append( "<title>" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
+                    html.append( HtmlTools.escapeHTML( value, false ) );
                     html.append( "</title>" );
                 }
-                else if ( "author".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'author\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
-                else if ( "date".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'date\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
                 else
                 {
-                    html.append( "<meta name=\'" );
-                    html.append( StringEscapeUtils.escapeXml( key ) );
-                    html.append( "\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
+                    html.append( "<meta name='" );
+                    html.append( HtmlTools.escapeHTML( key, true ) );

Review comment:
       I understand that, but the Sink says XHTML, so we need at least XML escapes. Just checked the code, all is fine. It runs in XML mode. You can drop the `true`. This is the default value.

##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -130,133 +191,98 @@ public void parse( Reader source, Sink sink )
      * @return HTML content generated by flexmark-java
      * @throws IOException passed through
      */
-    String toHtml( Reader source )
+    CharSequence toHtml( Reader source )
         throws IOException
     {
+        // Read the source
         String text = IOUtil.toString( source );
-        MutableDataHolder flexmarkOptions = PegdownOptionsAdapter.flexmarkOptions(
-                Extensions.ALL & ~( Extensions.HARDWRAPS | Extensions.ANCHORLINKS ) ).toMutable();
-        ArrayList<Extension> extensions = new ArrayList<>();
-        for ( Extension extension : flexmarkOptions.get( com.vladsch.flexmark.parser.Parser.EXTENSIONS ) )
-        {
-            extensions.add( extension );
-        }
-
-        extensions.add( FlexmarkDoxiaExtension.create() );
-        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, extensions );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_OPEN_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_CLOSE_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.MAX_TRAILING_BLANK_LINES, -1 );
-
-        com.vladsch.flexmark.parser.Parser parser = com.vladsch.flexmark.parser.Parser.builder( flexmarkOptions )
-                .build();
-        HtmlRenderer renderer = HtmlRenderer.builder( flexmarkOptions )
-                                    .linkResolverFactory( new FlexmarkDoxiaLinkResolver.Factory() )
-                                    .build();
-
 
+        // Now, build the HTML document
         StringBuilder html = new StringBuilder( 1000 );
         html.append( "<html>" );
         html.append( "<head>" );
-        Pattern metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_SECTION, Pattern.MULTILINE );
-        Matcher metadataMatcher = metadataPattern.matcher( text );
+
+        // First, we interpret the "metadata" section of the document and add the corresponding HTML headers
+        Matcher metadataMatcher = METADATA_SECTION_PATTERN.matcher( text );
         boolean haveTitle = false;
         if ( metadataMatcher.find() )
         {
-            metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_ENTRY, Pattern.MULTILINE );
-            Matcher lineMatcher = metadataPattern.matcher( metadataMatcher.group( 1 ) );
-            boolean first = true;
-            while ( lineMatcher.find() )
+            Matcher entryMatcher = METADATA_ENTRY_PATTERN.matcher( metadataMatcher.group( 0 ) );
+            while ( entryMatcher.find() )
             {
-                String key = StringUtils.trimToEmpty( lineMatcher.group( 1 ) );
-                if ( first )
-                {
-                    boolean found = false;
-                    for ( String k : STANDARD_METADATA_KEYS )
-                    {
-                        if ( k.equalsIgnoreCase( key ) )
-                        {
-                            found = true;
-                            break;
-                        }
-                    }
-                    if ( !found )
-                    {
-                        break;
-                    }
-                    first = false;
-                }
-                String value = StringUtils.trimToEmpty( lineMatcher.group( 2 ) );
+                String key = entryMatcher.group( 1 );
+                String value = entryMatcher.group( 2 );
                 if ( "title".equalsIgnoreCase( key ) )
                 {
                     haveTitle = true;
                     html.append( "<title>" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
+                    html.append( HtmlTools.escapeHTML( value, false ) );
                     html.append( "</title>" );
                 }
-                else if ( "author".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'author\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
-                else if ( "date".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'date\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
                 else
                 {
-                    html.append( "<meta name=\'" );
-                    html.append( StringEscapeUtils.escapeXml( key ) );
-                    html.append( "\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
+                    html.append( "<meta name='" );
+                    html.append( HtmlTools.escapeHTML( key, true ) );
+                    html.append( "' content='" );
+                    html.append( HtmlTools.escapeHTML( value, true ) );
+                    html.append( "' />" );
                 }
             }
-            if ( !first )
-            {
-                text = text.substring( metadataMatcher.end() );
-            }
+
+            // Trim the metadata from the source
+            text = text.substring( metadataMatcher.end( 0 ) );
+
         }
 
-        Node rootNode = parser.parse( text );
-        String markdownHtml = renderer.render( rootNode );
+        // Now is the time to parse the Markdown document
+        // (after we've trimmed out the metadatas, and before we check for its headings)
+        Node documentRoot = FLEXMARK_PARSER.parse( text );
 
-        if ( !haveTitle && rootNode.hasChildren() )
+        // Special trick: if there is no title specified as a metadata in the header, we will use the first
+        // heading as the document title
+        if ( !haveTitle && documentRoot.hasChildren() )
         {
-            // use the first (non-comment) node only if it is a heading
-            Node firstNode = rootNode.getFirstChild();
-            while ( firstNode != null && !( firstNode instanceof Heading ) )
+            // Skip the comment nodes
+            Node firstNode = documentRoot.getFirstChild();
+            while ( firstNode != null && firstNode instanceof HtmlCommentBlock )

Review comment:
       OK.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549814339



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -130,133 +191,98 @@ public void parse( Reader source, Sink sink )
      * @return HTML content generated by flexmark-java
      * @throws IOException passed through
      */
-    String toHtml( Reader source )
+    CharSequence toHtml( Reader source )
         throws IOException
     {
+        // Read the source
         String text = IOUtil.toString( source );
-        MutableDataHolder flexmarkOptions = PegdownOptionsAdapter.flexmarkOptions(
-                Extensions.ALL & ~( Extensions.HARDWRAPS | Extensions.ANCHORLINKS ) ).toMutable();
-        ArrayList<Extension> extensions = new ArrayList<>();
-        for ( Extension extension : flexmarkOptions.get( com.vladsch.flexmark.parser.Parser.EXTENSIONS ) )
-        {
-            extensions.add( extension );
-        }
-
-        extensions.add( FlexmarkDoxiaExtension.create() );
-        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, extensions );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_OPEN_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_CLOSE_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.MAX_TRAILING_BLANK_LINES, -1 );
-
-        com.vladsch.flexmark.parser.Parser parser = com.vladsch.flexmark.parser.Parser.builder( flexmarkOptions )
-                .build();
-        HtmlRenderer renderer = HtmlRenderer.builder( flexmarkOptions )
-                                    .linkResolverFactory( new FlexmarkDoxiaLinkResolver.Factory() )
-                                    .build();
-
 
+        // Now, build the HTML document
         StringBuilder html = new StringBuilder( 1000 );
         html.append( "<html>" );
         html.append( "<head>" );
-        Pattern metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_SECTION, Pattern.MULTILINE );
-        Matcher metadataMatcher = metadataPattern.matcher( text );
+
+        // First, we interpret the "metadata" section of the document and add the corresponding HTML headers
+        Matcher metadataMatcher = METADATA_SECTION_PATTERN.matcher( text );
         boolean haveTitle = false;
         if ( metadataMatcher.find() )
         {
-            metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_ENTRY, Pattern.MULTILINE );
-            Matcher lineMatcher = metadataPattern.matcher( metadataMatcher.group( 1 ) );
-            boolean first = true;
-            while ( lineMatcher.find() )
+            Matcher entryMatcher = METADATA_ENTRY_PATTERN.matcher( metadataMatcher.group( 0 ) );
+            while ( entryMatcher.find() )
             {
-                String key = StringUtils.trimToEmpty( lineMatcher.group( 1 ) );
-                if ( first )
-                {
-                    boolean found = false;
-                    for ( String k : STANDARD_METADATA_KEYS )
-                    {
-                        if ( k.equalsIgnoreCase( key ) )
-                        {
-                            found = true;
-                            break;
-                        }
-                    }
-                    if ( !found )
-                    {
-                        break;
-                    }
-                    first = false;
-                }
-                String value = StringUtils.trimToEmpty( lineMatcher.group( 2 ) );
+                String key = entryMatcher.group( 1 );
+                String value = entryMatcher.group( 2 );
                 if ( "title".equalsIgnoreCase( key ) )
                 {
                     haveTitle = true;
                     html.append( "<title>" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
+                    html.append( HtmlTools.escapeHTML( value, false ) );
                     html.append( "</title>" );
                 }
-                else if ( "author".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'author\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
-                else if ( "date".equalsIgnoreCase( key ) )
-                {
-                    html.append( "<meta name=\'date\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
-                }
                 else
                 {
-                    html.append( "<meta name=\'" );
-                    html.append( StringEscapeUtils.escapeXml( key ) );
-                    html.append( "\' content=\'" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
-                    html.append( "\' />" );
+                    html.append( "<meta name='" );
+                    html.append( HtmlTools.escapeHTML( key, true ) );
+                    html.append( "' content='" );
+                    html.append( HtmlTools.escapeHTML( value, true ) );
+                    html.append( "' />" );
                 }
             }
-            if ( !first )
-            {
-                text = text.substring( metadataMatcher.end() );
-            }
+
+            // Trim the metadata from the source
+            text = text.substring( metadataMatcher.end( 0 ) );
+
         }
 
-        Node rootNode = parser.parse( text );
-        String markdownHtml = renderer.render( rootNode );
+        // Now is the time to parse the Markdown document
+        // (after we've trimmed out the metadatas, and before we check for its headings)
+        Node documentRoot = FLEXMARK_PARSER.parse( text );
 
-        if ( !haveTitle && rootNode.hasChildren() )
+        // Special trick: if there is no title specified as a metadata in the header, we will use the first
+        // heading as the document title
+        if ( !haveTitle && documentRoot.hasChildren() )
         {
-            // use the first (non-comment) node only if it is a heading
-            Node firstNode = rootNode.getFirstChild();
-            while ( firstNode != null && !( firstNode instanceof Heading ) )
+            // Skip the comment nodes
+            Node firstNode = documentRoot.getFirstChild();
+            while ( firstNode != null && firstNode instanceof HtmlCommentBlock )

Review comment:
       This section is about retrieving the a potential title in the document. If the first meaningful (non-comment) node is a *Heading*, then it is considered as the title. Here, we're skipping comment nodes to check the first non-comment node.
   
   Comment nodes are preserved in the document that is produced.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] slachiewicz commented on pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

slachiewicz commented on pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#issuecomment-752052491


   I'm ok with this changes to be merged. We can test it with our maven website and compare differences.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] michael-o commented on pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#issuecomment-753532760


   Please also proceed with your other PRs.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549812852



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -74,48 +80,103 @@
 
     /**
      * Regex that identifies a multimarkdown-style metadata section at the start of the document
+     *
+     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
+     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
+     * ignored.
      */
-    private static final String MULTI_MARKDOWN_METADATA_SECTION =
-        "^(((?:[^\\s:][^:]*):(?:.*(?:\r?\n\\p{Blank}+[^\\s].*)*\r?\n))+)(?:\\s*\r?\n)";
+    private static final Pattern METADATA_SECTION_PATTERN = Pattern.compile(
+            "\\A^\\s*"
+            + "(?:title|author|date|address|affiliation|copyright|email|keywords|language|phone|subtitle)"
+            + "\\h*:\\h*\\V*\\h*$\\v+"
+            + "(?:^\\h*[^:\\v]+\\h*:\\h*\\V*\\h*$\\v+)*",
+            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE );
 
     /**
      * Regex that captures the key and value of a multimarkdown-style metadata entry.
      */
-    private static final String MULTI_MARKDOWN_METADATA_ENTRY =
-        "([^\\s:][^:]*):(.*(?:\r?\n\\p{Blank}+[^\\s].*)*)\r?\n";
-
-    /**
-     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
-     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
-     * ignored.
-     */
-    private static final String[] STANDARD_METADATA_KEYS =
-        { "title", "author", "date", "address", "affiliation", "copyright", "email", "keywords", "language", "phone",
-            "subtitle" };
+    private static final Pattern METADATA_ENTRY_PATTERN = Pattern.compile(
+            "^\\h*([^:\\v]+?)\\h*:\\h*(\\V*)\\h*$",
+            Pattern.MULTILINE );
 
     /**
      * <p>getType.</p>
      *
      * @return a int.
      */
+    @Override
     public int getType()
     {
         return TXT_TYPE;
     }
 
+    /**
+     * The parser of the HTML produced by Flexmark, that we will
+     * use to convert this HTML to Sink events
+     */
     @Requirement
     private MarkdownHtmlParser parser;
 
+    /**
+     * Flexmark's Markdown parser (one static instance fits all)
+     */
+    private static final com.vladsch.flexmark.parser.Parser FLEXMARK_PARSER;
+
+    /**
+     * Flexmark's HTML renderer (its output will be re-parsed and converted to Sink events)
+     */
+    private static final HtmlRenderer FLEXMARK_HTML_RENDERER;
+
+    // Initialize the Flexmark parser and renderer, once and for all
+    static
+    {
+        MutableDataSet flexmarkOptions = new MutableDataSet();
+
+        // Emulate Pegdown's behavior
+        flexmarkOptions.setFrom( ParserEmulationProfile.PEGDOWN );
+
+        // Enable the extensions that we used to have in Pegdown
+        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, Arrays.asList(
+                EscapedCharacterExtension.create(),
+                AbbreviationExtension.create(),
+                AutolinkExtension.create(),
+                DefinitionExtension.create(),
+                TypographicExtension.create(),
+                TablesExtension.create(),
+                WikiLinkExtension.create(),
+                StrikethroughExtension.create()
+        ) );
+
+        // Disable wrong apostrophe replacement
+        flexmarkOptions.set( TypographicExtension.SINGLE_QUOTE_UNMATCHED, "&apos;" );

Review comment:
       Indeed, it does! :-)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#issuecomment-752034897


   @michael-o:
   > Massive...will take a day or two
   
    Yeah, sorry for the massive PR. One thing lead to another...
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549861851



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -74,48 +80,103 @@
 
     /**
      * Regex that identifies a multimarkdown-style metadata section at the start of the document
+     *
+     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
+     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
+     * ignored.
      */
-    private static final String MULTI_MARKDOWN_METADATA_SECTION =
-        "^(((?:[^\\s:][^:]*):(?:.*(?:\r?\n\\p{Blank}+[^\\s].*)*\r?\n))+)(?:\\s*\r?\n)";
+    private static final Pattern METADATA_SECTION_PATTERN = Pattern.compile(
+            "\\A^\\s*"
+            + "(?:title|author|date|address|affiliation|copyright|email|keywords|language|phone|subtitle)"
+            + "\\h*:\\h*\\V*\\h*$\\v+"
+            + "(?:^\\h*[^:\\v]+\\h*:\\h*\\V*\\h*$\\v+)*",
+            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE );
 
     /**
      * Regex that captures the key and value of a multimarkdown-style metadata entry.
      */
-    private static final String MULTI_MARKDOWN_METADATA_ENTRY =
-        "([^\\s:][^:]*):(.*(?:\r?\n\\p{Blank}+[^\\s].*)*)\r?\n";
-
-    /**
-     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
-     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
-     * ignored.
-     */
-    private static final String[] STANDARD_METADATA_KEYS =
-        { "title", "author", "date", "address", "affiliation", "copyright", "email", "keywords", "language", "phone",
-            "subtitle" };
+    private static final Pattern METADATA_ENTRY_PATTERN = Pattern.compile(
+            "^\\h*([^:\\v]+?)\\h*:\\h*(\\V*)\\h*$",
+            Pattern.MULTILINE );
 
     /**
      * <p>getType.</p>
      *
      * @return a int.
      */
+    @Override
     public int getType()
     {
         return TXT_TYPE;
     }
 
+    /**
+     * The parser of the HTML produced by Flexmark, that we will
+     * use to convert this HTML to Sink events
+     */
     @Requirement
     private MarkdownHtmlParser parser;
 
+    /**
+     * Flexmark's Markdown parser (one static instance fits all)
+     */
+    private static final com.vladsch.flexmark.parser.Parser FLEXMARK_PARSER;
+
+    /**
+     * Flexmark's HTML renderer (its output will be re-parsed and converted to Sink events)
+     */
+    private static final HtmlRenderer FLEXMARK_HTML_RENDERER;
+
+    // Initialize the Flexmark parser and renderer, once and for all
+    static
+    {
+        MutableDataSet flexmarkOptions = new MutableDataSet();
+
+        // Emulate Pegdown's behavior

Review comment:
       Okay, you're right! Guys, fasten your seatbelts, let's do this! :-)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549810335



##########
File path: doxia-modules/doxia-module-markdown/src/it/settings.xml
##########
@@ -0,0 +1,55 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<settings>
+  <profiles>
+    <profile>
+      <id>it-repo</id>
+      <activation>
+        <activeByDefault>true</activeByDefault>
+      </activation>
+      <repositories>
+        <repository>
+          <id>1local.central</id>

Review comment:
       Ha! Good question. To be honest, I copy-pasted this **maven-invoker-plugin** configuration from another project. I remember struggling with the order of repositories and someone suggested prefixing the repository with "1" so that it comes before "all". But I'm not quite sure...
   
   If you recommend to remove "1", let me know.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] michael-o commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549819570



##########
File path: doxia-modules/doxia-module-markdown/pom.xml
##########
@@ -72,5 +72,80 @@ under the License.
       <groupId>org.codehaus.plexus</groupId>
       <artifactId>plexus-utils</artifactId>
     </dependency>
+    <dependency>
+      <groupId>commons-io</groupId>
+      <artifactId>commons-io</artifactId>
+    </dependency>
   </dependencies>
+  <build>
+
+    <plugins>
+
+      <!-- install -->
+      <plugin>
+        <artifactId>maven-install-plugin</artifactId>

Review comment:
       @rfscholte @khmarbaise Do you have an opinion here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549793311



##########
File path: doxia-core/src/main/java/org/apache/maven/doxia/parser/XhtmlBaseParser.java
##########
@@ -509,7 +509,8 @@ else if ( ( parser.getName().equals( HtmlMarkup.CODE.toString() ) )
                 || ( parser.getName().equals( HtmlMarkup.SAMP.toString() ) )
                 || ( parser.getName().equals( HtmlMarkup.TT.toString() ) ) )
         {
-            sink.inline( SinkEventAttributeSet.Semantics.CODE );
+            attribs.addAttributes( SinkEventAttributeSet.Semantics.CODE );

Review comment:
       Before this change, an *inline* event was added with the *CODE* semantic, and getting rid of all potential attributes of the element being parsed.
   
   After this change, we add the *inline* event, with the *CODE* semantic **in addition** to the attributes of the element being parsed.
   
   So, `<code class="language-java">` becomes an *inline* event, with `Semantics = "code"` and `Class = "language-java"`.
   
   Note this reflects what was already done in **Xhtml5BaseParser.java**. I think it was an omission.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] michael-o commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r550878919



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -130,133 +184,98 @@ public void parse( Reader source, Sink sink )
      * @return HTML content generated by flexmark-java
      * @throws IOException passed through
      */
-    String toHtml( Reader source )
+    CharSequence toHtml( Reader source )
         throws IOException
     {
+        // Read the source
         String text = IOUtil.toString( source );
-        MutableDataHolder flexmarkOptions = PegdownOptionsAdapter.flexmarkOptions(
-                Extensions.ALL & ~( Extensions.HARDWRAPS | Extensions.ANCHORLINKS ) ).toMutable();
-        ArrayList<Extension> extensions = new ArrayList<>();
-        for ( Extension extension : flexmarkOptions.get( com.vladsch.flexmark.parser.Parser.EXTENSIONS ) )
-        {
-            extensions.add( extension );
-        }
-
-        extensions.add( FlexmarkDoxiaExtension.create() );
-        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, extensions );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_OPEN_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_CLOSE_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.MAX_TRAILING_BLANK_LINES, -1 );
-
-        com.vladsch.flexmark.parser.Parser parser = com.vladsch.flexmark.parser.Parser.builder( flexmarkOptions )
-                .build();
-        HtmlRenderer renderer = HtmlRenderer.builder( flexmarkOptions )
-                                    .linkResolverFactory( new FlexmarkDoxiaLinkResolver.Factory() )
-                                    .build();
-
 
+        // Now, build the HTML document
         StringBuilder html = new StringBuilder( 1000 );
         html.append( "<html>" );
         html.append( "<head>" );
-        Pattern metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_SECTION, Pattern.MULTILINE );
-        Matcher metadataMatcher = metadataPattern.matcher( text );
+
+        // First, we interpret the "metadata" section of the document and add the corresponding HTML headers
+        Matcher metadataMatcher = METADATA_SECTION_PATTERN.matcher( text );
         boolean haveTitle = false;
         if ( metadataMatcher.find() )
         {
-            metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_ENTRY, Pattern.MULTILINE );
-            Matcher lineMatcher = metadataPattern.matcher( metadataMatcher.group( 1 ) );
-            boolean first = true;
-            while ( lineMatcher.find() )
+            Matcher entryMatcher = METADATA_ENTRY_PATTERN.matcher( metadataMatcher.group( 0 ) );
+            while ( entryMatcher.find() )
             {
-                String key = StringUtils.trimToEmpty( lineMatcher.group( 1 ) );
-                if ( first )
-                {
-                    boolean found = false;
-                    for ( String k : STANDARD_METADATA_KEYS )
-                    {
-                        if ( k.equalsIgnoreCase( key ) )
-                        {
-                            found = true;
-                            break;
-                        }
-                    }
-                    if ( !found )
-                    {
-                        break;
-                    }
-                    first = false;
-                }
-                String value = StringUtils.trimToEmpty( lineMatcher.group( 2 ) );
+                String key = entryMatcher.group( 1 );
+                String value = entryMatcher.group( 2 );
                 if ( "title".equalsIgnoreCase( key ) )
                 {
                     haveTitle = true;
                     html.append( "<title>" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
+                    html.append( HtmlTools.escapeHTML( value, false ) );

Review comment:
       Even more than that. HTML5 requires so little escaping because we use UTF-8 now. I see little reason to replace one or two byte characters in UTF-8 with 5 bytes of entities. Insane. This has been done 10-15 years ago, but is obsolete now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] michael-o commented on pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#issuecomment-752015029


   Massive...will take a day or two


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] michael-o commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r550874186



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -130,133 +184,98 @@ public void parse( Reader source, Sink sink )
      * @return HTML content generated by flexmark-java
      * @throws IOException passed through
      */
-    String toHtml( Reader source )
+    CharSequence toHtml( Reader source )
         throws IOException
     {
+        // Read the source
         String text = IOUtil.toString( source );
-        MutableDataHolder flexmarkOptions = PegdownOptionsAdapter.flexmarkOptions(
-                Extensions.ALL & ~( Extensions.HARDWRAPS | Extensions.ANCHORLINKS ) ).toMutable();
-        ArrayList<Extension> extensions = new ArrayList<>();
-        for ( Extension extension : flexmarkOptions.get( com.vladsch.flexmark.parser.Parser.EXTENSIONS ) )
-        {
-            extensions.add( extension );
-        }
-
-        extensions.add( FlexmarkDoxiaExtension.create() );
-        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, extensions );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_OPEN_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.HTML_BLOCK_CLOSE_TAG_EOL, false );
-        flexmarkOptions.set( HtmlRenderer.MAX_TRAILING_BLANK_LINES, -1 );
-
-        com.vladsch.flexmark.parser.Parser parser = com.vladsch.flexmark.parser.Parser.builder( flexmarkOptions )
-                .build();
-        HtmlRenderer renderer = HtmlRenderer.builder( flexmarkOptions )
-                                    .linkResolverFactory( new FlexmarkDoxiaLinkResolver.Factory() )
-                                    .build();
-
 
+        // Now, build the HTML document
         StringBuilder html = new StringBuilder( 1000 );
         html.append( "<html>" );
         html.append( "<head>" );
-        Pattern metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_SECTION, Pattern.MULTILINE );
-        Matcher metadataMatcher = metadataPattern.matcher( text );
+
+        // First, we interpret the "metadata" section of the document and add the corresponding HTML headers
+        Matcher metadataMatcher = METADATA_SECTION_PATTERN.matcher( text );
         boolean haveTitle = false;
         if ( metadataMatcher.find() )
         {
-            metadataPattern = Pattern.compile( MULTI_MARKDOWN_METADATA_ENTRY, Pattern.MULTILINE );
-            Matcher lineMatcher = metadataPattern.matcher( metadataMatcher.group( 1 ) );
-            boolean first = true;
-            while ( lineMatcher.find() )
+            Matcher entryMatcher = METADATA_ENTRY_PATTERN.matcher( metadataMatcher.group( 0 ) );
+            while ( entryMatcher.find() )
             {
-                String key = StringUtils.trimToEmpty( lineMatcher.group( 1 ) );
-                if ( first )
-                {
-                    boolean found = false;
-                    for ( String k : STANDARD_METADATA_KEYS )
-                    {
-                        if ( k.equalsIgnoreCase( key ) )
-                        {
-                            found = true;
-                            break;
-                        }
-                    }
-                    if ( !found )
-                    {
-                        break;
-                    }
-                    first = false;
-                }
-                String value = StringUtils.trimToEmpty( lineMatcher.group( 2 ) );
+                String key = entryMatcher.group( 1 );
+                String value = entryMatcher.group( 2 );
                 if ( "title".equalsIgnoreCase( key ) )
                 {
                     haveTitle = true;
                     html.append( "<title>" );
-                    html.append( StringEscapeUtils.escapeXml( value ) );
+                    html.append( HtmlTools.escapeHTML( value, false ) );

Review comment:
       I am sick and tired of the codebase we have. Hacks after hacks with escaping rules w/o any consistency. BUT you are right, lets keep it consistent even if it is wrong.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] michael-o commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549810871



##########
File path: doxia-modules/doxia-module-markdown/src/it/settings.xml
##########
@@ -0,0 +1,55 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<settings>
+  <profiles>
+    <profile>
+      <id>it-repo</id>
+      <activation>
+        <activeByDefault>true</activeByDefault>
+      </activation>
+      <repositories>
+        <repository>
+          <id>1local.central</id>

Review comment:
       Leave as-is.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549798515



##########
File path: doxia-modules/doxia-module-markdown/pom.xml
##########
@@ -72,5 +72,80 @@ under the License.
       <groupId>org.codehaus.plexus</groupId>
       <artifactId>plexus-utils</artifactId>
     </dependency>
+    <dependency>
+      <groupId>commons-io</groupId>
+      <artifactId>commons-io</artifactId>
+    </dependency>
   </dependencies>
+  <build>
+
+    <plugins>
+
+      <!-- install -->
+      <plugin>
+        <artifactId>maven-install-plugin</artifactId>

Review comment:
       I wanted to make sure `mvn verify` runs successfully on the reactor, without the need to run `mvn install` before.
   
   Also, the artifacts produced by other modules are not retrieved by **maven-invoker-plugin** (even with its `<extraArtifacts>` configuration property). That would be a nice enhancement on this plugin! ;-)
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#issuecomment-753483742


   > Tested, when I run the general IT one site is missing:
   > 
   > ```
   > 192.168.1.52 - - [02/Jan/2021 13:52:48] "GET /general/target/site/quotes.html HTTP/1.1" 404 -
   > ```
   
   Ha! Good catch! I forgot to remove the reference to **quotes.html** in **site.xml**, that's all. I'll fix this. Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#issuecomment-753484119


   Reference to **quotes.html** has been removed. You shouldn't get a 404 for this now. Thank you!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] michael-o commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

michael-o commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549763825



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -74,48 +80,103 @@
 
     /**
      * Regex that identifies a multimarkdown-style metadata section at the start of the document
+     *
+     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
+     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
+     * ignored.
      */
-    private static final String MULTI_MARKDOWN_METADATA_SECTION =
-        "^(((?:[^\\s:][^:]*):(?:.*(?:\r?\n\\p{Blank}+[^\\s].*)*\r?\n))+)(?:\\s*\r?\n)";
+    private static final Pattern METADATA_SECTION_PATTERN = Pattern.compile(
+            "\\A^\\s*"
+            + "(?:title|author|date|address|affiliation|copyright|email|keywords|language|phone|subtitle)"
+            + "\\h*:\\h*\\V*\\h*$\\v+"
+            + "(?:^\\h*[^:\\v]+\\h*:\\h*\\V*\\h*$\\v+)*",
+            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE );
 
     /**
      * Regex that captures the key and value of a multimarkdown-style metadata entry.
      */
-    private static final String MULTI_MARKDOWN_METADATA_ENTRY =
-        "([^\\s:][^:]*):(.*(?:\r?\n\\p{Blank}+[^\\s].*)*)\r?\n";
-
-    /**
-     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
-     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
-     * ignored.
-     */
-    private static final String[] STANDARD_METADATA_KEYS =
-        { "title", "author", "date", "address", "affiliation", "copyright", "email", "keywords", "language", "phone",
-            "subtitle" };
+    private static final Pattern METADATA_ENTRY_PATTERN = Pattern.compile(
+            "^\\h*([^:\\v]+?)\\h*:\\h*(\\V*)\\h*$",
+            Pattern.MULTILINE );
 
     /**
      * <p>getType.</p>
      *
      * @return a int.
      */
+    @Override
     public int getType()
     {
         return TXT_TYPE;
     }
 
+    /**
+     * The parser of the HTML produced by Flexmark, that we will
+     * use to convert this HTML to Sink events
+     */
     @Requirement
     private MarkdownHtmlParser parser;
 
+    /**
+     * Flexmark's Markdown parser (one static instance fits all)
+     */
+    private static final com.vladsch.flexmark.parser.Parser FLEXMARK_PARSER;
+
+    /**
+     * Flexmark's HTML renderer (its output will be re-parsed and converted to Sink events)
+     */
+    private static final HtmlRenderer FLEXMARK_HTML_RENDERER;
+
+    // Initialize the Flexmark parser and renderer, once and for all
+    static
+    {
+        MutableDataSet flexmarkOptions = new MutableDataSet();
+
+        // Emulate Pegdown's behavior

Review comment:
       i would favor move to CommonMark and move master to 1.10.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [maven-doxia] bertysentry commented on a change in pull request #49: [DOXIA-616]

Posted by GitBox <gi...@apache.org>.

bertysentry commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549815433



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -74,48 +80,103 @@
 
     /**
      * Regex that identifies a multimarkdown-style metadata section at the start of the document
+     *
+     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
+     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
+     * ignored.
      */
-    private static final String MULTI_MARKDOWN_METADATA_SECTION =
-        "^(((?:[^\\s:][^:]*):(?:.*(?:\r?\n\\p{Blank}+[^\\s].*)*\r?\n))+)(?:\\s*\r?\n)";
+    private static final Pattern METADATA_SECTION_PATTERN = Pattern.compile(
+            "\\A^\\s*"
+            + "(?:title|author|date|address|affiliation|copyright|email|keywords|language|phone|subtitle)"
+            + "\\h*:\\h*\\V*\\h*$\\v+"
+            + "(?:^\\h*[^:\\v]+\\h*:\\h*\\V*\\h*$\\v+)*",
+            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE );
 
     /**
      * Regex that captures the key and value of a multimarkdown-style metadata entry.
      */
-    private static final String MULTI_MARKDOWN_METADATA_ENTRY =
-        "([^\\s:][^:]*):(.*(?:\r?\n\\p{Blank}+[^\\s].*)*)\r?\n";
-
-    /**
-     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
-     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
-     * ignored.
-     */
-    private static final String[] STANDARD_METADATA_KEYS =
-        { "title", "author", "date", "address", "affiliation", "copyright", "email", "keywords", "language", "phone",
-            "subtitle" };
+    private static final Pattern METADATA_ENTRY_PATTERN = Pattern.compile(
+            "^\\h*([^:\\v]+?)\\h*:\\h*(\\V*)\\h*$",
+            Pattern.MULTILINE );
 
     /**
      * <p>getType.</p>
      *
      * @return a int.
      */
+    @Override
     public int getType()
     {
         return TXT_TYPE;
     }
 
+    /**
+     * The parser of the HTML produced by Flexmark, that we will
+     * use to convert this HTML to Sink events
+     */
     @Requirement
     private MarkdownHtmlParser parser;
 
+    /**
+     * Flexmark's Markdown parser (one static instance fits all)
+     */
+    private static final com.vladsch.flexmark.parser.Parser FLEXMARK_PARSER;
+
+    /**
+     * Flexmark's HTML renderer (its output will be re-parsed and converted to Sink events)
+     */
+    private static final HtmlRenderer FLEXMARK_HTML_RENDERER;
+
+    // Initialize the Flexmark parser and renderer, once and for all
+    static
+    {
+        MutableDataSet flexmarkOptions = new MutableDataSet();
+
+        // Emulate Pegdown's behavior
+        flexmarkOptions.setFrom( ParserEmulationProfile.PEGDOWN );
+
+        // Enable the extensions that we used to have in Pegdown
+        flexmarkOptions.set( com.vladsch.flexmark.parser.Parser.EXTENSIONS, Arrays.asList(
+                EscapedCharacterExtension.create(),
+                AbbreviationExtension.create(),
+                AutolinkExtension.create(),
+                DefinitionExtension.create(),
+                TypographicExtension.create(),
+                TablesExtension.create(),
+                WikiLinkExtension.create(),
+                StrikethroughExtension.create()
+        ) );
+
+        // Disable wrong apostrophe replacement
+        flexmarkOptions.set( TypographicExtension.SINGLE_QUOTE_UNMATCHED, "&apos;" );

Review comment:
       Understood!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org