Posted to issues@opennlp.apache.org by "Prokopis Prokopidis (Created) (JIRA)" <ji...@apache.org> on 2012/04/02 14:59:24 UTC
[jira] [Created] (OPENNLP-489) endMarker never checked when parsing wikinews page
endMarker never checked when parsing wikinews page
--------------------------------------------------
Key: OPENNLP-489
URL: https://issues.apache.org/jira/browse/OPENNLP-489
Project: OpenNLP
Issue Type: Bug
Components: Wikinews Importer
Reporter: Prokopis Prokopidis
Priority: Minor
Hi,
I am testing the Wikinews Importer, thanks for making it available.
I think that in the following code of WikinewsConverter.java
    int cutIndex = -1;
    for (String endMarker : endOfArtilceMarkers) {
        int endMarkerIndex = pageText.indexOf(endMarker);
        if (endMarkerIndex != -1) {
            cutIndex = endMarkerIndex;
            break;
        }
    }
    if (cutIndex == -1)
        cutIndex = pageText.length();
once an endMarker1 has been detected, the loop breaks, so another endMarker2 from the endOfArtilceMarkers list will never be checked, even if it appears before endMarker1 in the wiki text. Perhaps this check could be rewritten like this:
    int cutIndex = pageText.length();
    for (String endMarker : endOfArtilceMarkers) {
        int endMarkerIndex = pageText.indexOf(endMarker);
        if (endMarkerIndex != -1 && endMarkerIndex < cutIndex) {
            cutIndex = endMarkerIndex;
        }
    }
    if (cutIndex < pageText.length()) {
        pageText = pageText.substring(0, cutIndex);
    }
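For anyone who wants to try the proposed check outside the importer, here is a minimal standalone sketch. The class name EarliestMarkerCut, the method cutAtEarliestMarker and the sample page text are made up for illustration and are not part of WikinewsConverter.java; only the loop follows the rewrite suggested above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of OpenNLP.
public class EarliestMarkerCut {

    // Returns pageText truncated at the earliest occurrence (by position in
    // the text) of any marker, or pageText unchanged if no marker occurs.
    static String cutAtEarliestMarker(String pageText, List<String> markers) {
        int cutIndex = pageText.length();
        for (String marker : markers) {
            int markerIndex = pageText.indexOf(marker);
            // Keep the smallest index seen so far, regardless of list order.
            if (markerIndex != -1 && markerIndex < cutIndex) {
                cutIndex = markerIndex;
            }
        }
        return pageText.substring(0, cutIndex);
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch.
        List<String> markers = Arrays.asList("==Sources==", "==See also==");
        String page = "Article body. ==See also== links ==Sources== refs";
        System.out.println("[" + cutAtEarliestMarker(page, markers) + "]");
    }
}
```

With this version the order of the markers in the list should no longer matter, because every marker is looked up and the smallest index wins.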
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (OPENNLP-489) endMarker never checked when parsing wikinews page
Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244286#comment-13244286 ]
Joern Kottmann commented on OPENNLP-489:
----------------------------------------
You create a .patch file. Many people name these files after the JIRA issue, in this case e.g. OPENNLP-489.patch for the first patch. If multiple patches are created for one issue, many just number them, like OPENNLP-489-2.patch.
The file simply contains the content you posted at the end of your comment. If you generated it on the command line, just redirect the output to a file.
Or Eclipse can write the patch directly to a file.
I can't get the patch from the comment to apply; can you attach a patch file instead, please?
[jira] [Commented] (OPENNLP-489) endMarker never checked when parsing wikinews page
Posted by "Prokopis Prokopidis (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244259#comment-13244259 ]
Prokopis Prokopidis commented on OPENNLP-489:
---------------------------------------------
Hi Joern.
If you decide to accept this as a correction, can you please check the
following patch? I've never created one for an Apache project :-). Do I
just paste it as a new comment in Jira?
Best,
P.
Index: WikinewsConverter.java
===================================================================
--- WikinewsConverter.java (revision 1308376)
+++ WikinewsConverter.java (working copy)
@@ -86,23 +86,19 @@
if (page.getText().toLowerCase().contains("{publish}")) {
String pageText = page.getText();
-
- int cutIndex = -1;
-
+
+ int cutIndex = pageText.length();
+
for (String endMarker : endOfArtilceMarkers) {
-
int endMarkerIndex = pageText.indexOf(endMarker);
- if (endMarkerIndex != -1) {
+ if (endMarkerIndex != -1 && endMarkerIndex < cutIndex) {
cutIndex = endMarkerIndex;
- break;
- }
+ }
}
+ if (cutIndex < pageText.length()) {
+ pageText = pageText.substring(0, cutIndex);
+ }
- if (cutIndex == -1)
- cutIndex = pageText.length();
-
- pageText = pageText.substring(0, cutIndex);
-
WikinewsWikiModel wikiModel = new
WikinewsWikiModel("http://en.wikinews.org/wiki/${image}",
"http://en.wikinews.org/wiki/${title}");
[jira] [Updated] (OPENNLP-489) endMarker never checked when parsing wikinews page
Posted by "Prokopis Prokopidis (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prokopis Prokopidis updated OPENNLP-489:
----------------------------------------
Attachment: OPENNLP-489.patch
Joern, hope this is OK now.
[jira] [Resolved] (OPENNLP-489) endMarker never checked when parsing wikinews page
Posted by "Joern Kottmann (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joern Kottmann resolved OPENNLP-489.
------------------------------------
Resolution: Fixed
Assignee: Joern Kottmann
Patch is applied now, thanks for fixing this.
[jira] [Closed] (OPENNLP-489) endMarker never checked when parsing wikinews page
Posted by "Joern Kottmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joern Kottmann closed OPENNLP-489.
----------------------------------
[jira] [Commented] (OPENNLP-489) endMarker never checked when parsing wikinews page
Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244159#comment-13244159 ]
Joern Kottmann commented on OPENNLP-489:
----------------------------------------
Thanks for reporting this.
Can you provide us with a link to an article where this happens?
Would you mind making a patch?
[jira] [Commented] (OPENNLP-489) endMarker never checked when parsing wikinews page
Posted by "Prokopis Prokopidis (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244229#comment-13244229 ]
Prokopis Prokopidis commented on OPENNLP-489:
---------------------------------------------
Check this article: http://en.wikinews.org/wiki/NYC_Transit_asks_members_to_ratify_new_contracts. There's no {{haveyoursay}} like in all(?) recent articles.
If we want to add "==See also==" to the endOfArticleMarkers list, we'll have to be sure that we add it before "==Sources==" in the list. Otherwise it will never get the chance to be checked, because "==Sources==" is checked first and the break is reached.
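To make the ordering problem concrete, here is a small sketch that mirrors the current loop with its break and runs it on the same text with the two markers in either list order. The class name FirstListedMarkerCut, the method cutAtFirstListedMarker and the sample text are invented for this illustration; only the two marker strings come from the comment above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of OpenNLP.
public class FirstListedMarkerCut {

    // Mirrors the loop with break: the first marker in *list* order that
    // occurs anywhere in the text decides the cut position.
    static String cutAtFirstListedMarker(String pageText, List<String> markers) {
        int cutIndex = -1;
        for (String marker : markers) {
            int markerIndex = pageText.indexOf(marker);
            if (markerIndex != -1) {
                cutIndex = markerIndex;
                break; // later markers in the list are never looked up
            }
        }
        if (cutIndex == -1) {
            cutIndex = pageText.length();
        }
        return pageText.substring(0, cutIndex);
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch.
        String page = "Body. ==See also== links ==Sources== refs";
        List<String> sourcesFirst = Arrays.asList("==Sources==", "==See also==");
        List<String> seeAlsoFirst = Arrays.asList("==See also==", "==Sources==");
        // Same text, different list order, different cut position.
        System.out.println("[" + cutAtFirstListedMarker(page, sourcesFirst) + "]");
        System.out.println("[" + cutAtFirstListedMarker(page, seeAlsoFirst) + "]");
    }
}
```

With "==Sources==" listed first, the "==See also==" section stays in the output; with "==See also==" listed first, it is cut off.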