Posted to commits@lucene.apache.org by sa...@apache.org on 2014/01/10 08:29:34 UTC
svn commit: r1557046 [1/3] - in /lucene/dev/branches/branch_4x: ./ lucene/
lucene/analysis/
lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/
lucene/analysis/common/src/test/org/apache/lucene/analysis/core/
Author: sarowe
Date: Fri Jan 10 07:29:34 2014
New Revision: 1557046
URL: http://svn.apache.org/r1557046
Log:
LUCENE-5391: UAX29URLEmailTokenizer should not tokenize no-scheme domain-only URLs that are followed by an alphanumeric character (merged trunk r1557042)
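The log entry states the rule but not what it looks like in practice. Below is a toy sketch of that rule, assuming a simplified regex in place of the real grammar. The actual fix lives in UAX29URLEmailTokenizerImpl.jflex, whose diff is not part of this message. The class name, method name, pattern, and the tiny TLD list are all hypothetical. The point is the trailing negative lookahead, which rejects a scheme-less, domain-only match when a letter or digit follows it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical toy illustration of the LUCENE-5391 rule; NOT Lucene's
 * implementation (the real tokenizer is generated from a JFlex grammar).
 * A scheme-less, domain-only URL is accepted only when it is not
 * immediately followed by an alphanumeric character.
 */
final class DomainOnlyUrlSketch {

    // Hypothetical, deliberately tiny TLD list (the real tokenizer uses a generated list).
    private static final String TLDS = "com|org|net|co|uk";

    // (?<![\p{Alnum}.])  : do not start a match in the middle of a word or domain
    // label(.label)*.tld  : a simplified scheme-less, domain-only URL
    // (?![\p{Alnum}])     : the rule from the log: reject if an alphanumeric follows
    private static final Pattern DOMAIN_ONLY = Pattern.compile(
        "(?<![\\p{Alnum}.])[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)*\\.(?:" + TLDS + ")(?![\\p{Alnum}])");

    /** Returns every domain-only URL the toy pattern accepts in {@code text}. */
    static List<String> domainOnlyUrls(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DOMAIN_ONLY.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }
}
```

With this sketch, "visit example.com today" yields "example.com", while "example.comfoo" yields nothing. Dropping the trailing lookahead would make the same pattern extract "example.com" from "example.comfoo", which is the kind of split the log entry says the tokenizer should not produce.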
Modified:
lucene/dev/branches/branch_4x/ (props changed)
lucene/dev/branches/branch_4x/lucene/ (props changed)
lucene/dev/branches/branch_4x/lucene/CHANGES.txt (contents, props changed)
lucene/dev/branches/branch_4x/lucene/analysis/ (props changed)
lucene/dev/branches/branch_4x/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizer.java
lucene/dev/branches/branch_4x/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizerImpl.java
lucene/dev/branches/branch_4x/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizerImpl.jflex
lucene/dev/branches/branch_4x/lucene/analysis/common/src/test/org/apache/lucene/analysis/core/TestUAX29URLEmailAnalyzer.java
Modified: lucene/dev/branches/branch_4x/lucene/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/CHANGES.txt?rev=1557046&r1=1557045&r2=1557046&view=diff
==============================================================================
--- lucene/dev/branches/branch_4x/lucene/CHANGES.txt (original)
+++ lucene/dev/branches/branch_4x/lucene/CHANGES.txt Fri Jan 10 07:29:34 2014
@@ -86,6 +86,10 @@ Bug fixes
* LUCENE-5361: Fixed handling of query boosts in FastVectorHighlighter.
(Nik Everett via Adrien Grand)
+* LUCENE-5391: UAX29URLEmailTokenizer should not tokenize no-scheme
+ domain-only URLs that are followed by an alphanumeric character.
+ (Chris Geeringh, Steve Rowe)
+
API Changes
* LUCENE-5339: The facet module was simplified/reworked to make the
Modified: lucene/dev/branches/branch_4x/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizer.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizer.java?rev=1557046&r1=1557045&r2=1557046&view=diff
==============================================================================
--- lucene/dev/branches/branch_4x/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizer.java (original)
+++ lucene/dev/branches/branch_4x/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizer.java Fri Jan 10 07:29:34 2014
@@ -33,7 +33,7 @@ import org.apache.lucene.util.Version;
/**
* This class implements Word Break rules from the Unicode Text Segmentation
- * algorithm, as specified in
+ * algorithm, as specified in `
* <a href="http://unicode.org/reports/tr29/">Unicode Standard Annex #29</a>
* URLs and email addresses are also tokenized according to the relevant RFCs.
* <p/>