Posted to issues@opennlp.apache.org by "Prokopis Prokopidis (Created) (JIRA)" <ji...@apache.org> on 2012/04/02 14:59:24 UTC

[jira] [Created] (OPENNLP-489) endMarker never checked when parsing wikinews page

endMarker never checked when parsing wikinews page
--------------------------------------------------

                 Key: OPENNLP-489
                 URL: https://issues.apache.org/jira/browse/OPENNLP-489
             Project: OpenNLP
          Issue Type: Bug
          Components: Wikinews Importer
            Reporter: Prokopis Prokopidis
            Priority: Minor


Hi,

I am testing the Wikinews Importer; thanks for making it available.

I think that in the following code of WikinewsConverter.java

int cutIndex = -1;

for (String endMarker : endOfArtilceMarkers) {
  int endMarkerIndex = pageText.indexOf(endMarker);
  if (endMarkerIndex != -1) {
    cutIndex = endMarkerIndex;
    break;
  }
}
         
if (cutIndex == -1)
  cutIndex = pageText.length();

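if one end marker (endMarker1) has already been detected, the loop breaks, so another marker (endMarker2) from the endOfArtilceMarkers list is never checked, even if it appears before endMarker1 in the wiki text. In other words, the text is cut at the first marker in list order rather than at the earliest marker in the page. Perhaps this check could be rewritten as follows: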

int cutIndex = pageText.length();

// Check every marker and keep the earliest position found in the page
for (String endMarker : endOfArtilceMarkers) {
  int endMarkerIndex = pageText.indexOf(endMarker);
  if (endMarkerIndex != -1 && endMarkerIndex < cutIndex) {
    cutIndex = endMarkerIndex;
  }
}
if (cutIndex < pageText.length()) {
  pageText = pageText.substring(0, cutIndex);
}
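To make the difference concrete, here is a minimal self-contained sketch that wraps the current loop and the proposed loop in two static methods. The class name EndMarkerCut, the method names, the marker list, and the sample page text are hypothetical, chosen only for illustration; they are not taken from WikinewsConverter.java.

```java
import java.util.Arrays;
import java.util.List;

public class EndMarkerCut {

  // Current behaviour: stops at the first marker *in list order* that occurs
  // anywhere in the text, even if another marker occurs earlier in the text.
  static String cutFirstInList(String pageText, List<String> markers) {
    int cutIndex = -1;
    for (String endMarker : markers) {
      int endMarkerIndex = pageText.indexOf(endMarker);
      if (endMarkerIndex != -1) {
        cutIndex = endMarkerIndex;
        break;
      }
    }
    if (cutIndex == -1) {
      cutIndex = pageText.length();
    }
    return pageText.substring(0, cutIndex);
  }

  // Proposed behaviour: checks every marker and cuts at the earliest position.
  // substring(0, length()) returns the whole text, so no extra guard is needed.
  static String cutAtEarliest(String pageText, List<String> markers) {
    int cutIndex = pageText.length();
    for (String endMarker : markers) {
      int endMarkerIndex = pageText.indexOf(endMarker);
      if (endMarkerIndex != -1 && endMarkerIndex < cutIndex) {
        cutIndex = endMarkerIndex;
      }
    }
    return pageText.substring(0, cutIndex);
  }

  public static void main(String[] args) {
    // Hypothetical marker list and page text, for illustration only.
    List<String> markers = Arrays.asList("==Sources==", "==See also==");
    String page = "Article body.\n==See also==\nRelated.\n==Sources==\nRefs.\n";

    System.out.println(cutFirstInList(page, markers));
    System.out.println(cutAtEarliest(page, markers));
  }
}
```

With this list order, the current loop keeps the "==See also==" section ("Article body.\n==See also==\nRelated.\n"), while the proposed loop cuts right before it ("Article body.\n").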


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-489) endMarker never checked when parsing wikinews page

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244286#comment-13244286 ] 

Joern Kottmann commented on OPENNLP-489:
----------------------------------------

You create a .patch file. Many people name these files after the JIRA issue, in this case OPENNLP-489.patch (for the first patch). If several patches are created for one issue, many people simply number them, like OPENNLP-489-2.patch.
The file simply contains the content you posted at the end of your comment. If you used a command such as svn diff to create it, just redirect the output to a file.
Alternatively, Eclipse can write the patch directly to a file.

I can't get the patch in your comment to apply; could you please attach a patch file instead?

                

[jira] [Commented] (OPENNLP-489) endMarker never checked when parsing wikinews page

Posted by "Prokopis Prokopidis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244259#comment-13244259 ] 

Prokopis Prokopidis commented on OPENNLP-489:
---------------------------------------------

Hi Joern.

If you decide to accept this as a correction, could you please check the
following patch? I've never created one for an Apache project :-). Do I
just paste it as a new comment in JIRA?

Best,

P.

Index: WikinewsConverter.java
===================================================================
--- WikinewsConverter.java    (revision 1308376)
+++ WikinewsConverter.java    (working copy)
@@ -86,23 +86,19 @@
          if (page.getText().toLowerCase().contains("{publish}")) {

            String pageText = page.getText();
-
-          int cutIndex = -1;
-
+
+          int cutIndex = pageText.length();
+
            for (String endMarker : endOfArtilceMarkers) {
-
              int endMarkerIndex = pageText.indexOf(endMarker);
-            if (endMarkerIndex != -1) {
+            if (endMarkerIndex != -1 && endMarkerIndex < cutIndex) {
                cutIndex = endMarkerIndex;
-              break;
-            }
+              }
            }
+          if (cutIndex < pageText.length()) {
+            pageText = pageText.substring(0, cutIndex);
+          }

-          if (cutIndex == -1)
-            cutIndex = pageText.length();
-
-          pageText = pageText.substring(0, cutIndex);
-
            WikinewsWikiModel wikiModel = new 
WikinewsWikiModel("http://en.wikinews.org/wiki/${image}",
"http://en.wikinews.org/wiki/${title}");




                

[jira] [Updated] (OPENNLP-489) endMarker never checked when parsing wikinews page

Posted by "Prokopis Prokopidis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prokopis Prokopidis updated OPENNLP-489:
----------------------------------------

    Attachment: OPENNLP-489.patch

Joern, I hope this is OK now.
                

[jira] [Resolved] (OPENNLP-489) endMarker never checked when parsing wikinews page

Posted by "Joern Kottmann (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann resolved OPENNLP-489.
------------------------------------

    Resolution: Fixed
      Assignee: Joern Kottmann

The patch is applied now; thanks for fixing this.
                

[jira] [Closed] (OPENNLP-489) endMarker never checked when parsing wikinews page

Posted by "Joern Kottmann (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann closed OPENNLP-489.
----------------------------------

    

[jira] [Commented] (OPENNLP-489) endMarker never checked when parsing wikinews page

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244159#comment-13244159 ] 

Joern Kottmann commented on OPENNLP-489:
----------------------------------------

Thanks for reporting this.

Can you provide us with a link to an article where this happens?
Would you mind making a patch?
                

[jira] [Commented] (OPENNLP-489) endMarker never checked when parsing wikinews page

Posted by "Prokopis Prokopidis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244229#comment-13244229 ] 

Prokopis Prokopidis commented on OPENNLP-489:
---------------------------------------------

Check this article: http://en.wikinews.org/wiki/NYC_Transit_asks_members_to_ratify_new_contracts. Unlike all(?) recent articles, it has no {{haveyoursay}}.

If we want to add "==See also==" to the endOfArtilceMarkers list, then with the current code we'll have to make sure it comes before "==Sources==" in the list. Otherwise it will never get a chance to be checked, because "==Sources==" is checked first and the break is reached.
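With the earliest-index loop proposed in this issue, that ordering constraint disappears: every marker is checked and the smallest index wins, so the list order no longer matters. The following is a minimal sketch of that property using Java 8 streams as an equivalent formulation of the loop; the class name MarkerOrderCheck, the method name, the page text, and both marker lists are hypothetical and only illustrate the idea.

```java
import java.util.Arrays;
import java.util.List;

public class MarkerOrderCheck {

  // Stream-based equivalent of the proposed loop: take the smallest
  // non-negative indexOf result, or the text length if no marker occurs.
  static int cutIndex(String pageText, List<String> markers) {
    return markers.stream()
        .mapToInt(pageText::indexOf)
        .filter(i -> i >= 0)
        .min()
        .orElse(pageText.length());
  }

  public static void main(String[] args) {
    // Hypothetical page layout: "==See also==" precedes "==Sources==".
    String page = "Body text.\n==See also==\n* Related story\n==Sources==\n* Source\n";

    List<String> seeAlsoFirst = Arrays.asList("{{haveyoursay}}", "==See also==", "==Sources==");
    List<String> sourcesFirst = Arrays.asList("{{haveyoursay}}", "==Sources==", "==See also==");

    // Both orderings yield the same cut position (the start of "==See also==")
    System.out.println(cutIndex(page, seeAlsoFirst) == cutIndex(page, sourcesFirst));
  }
}
```

Running main prints true, since both lists resolve to the index of "==See also==".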
                