Posted to commits@lucene.apache.org by sa...@apache.org on 2011/01/18 16:13:34 UTC

svn commit: r1060407 - in /lucene/dev/branches/branch_3x: dev-tools/eclipse/ lucene/contrib/benchmark/ lucene/contrib/benchmark/lib/

Author: sarowe
Date: Tue Jan 18 15:13:34 2011
New Revision: 1060407

URL: http://svn.apache.org/viewvc?rev=1060407&view=rev
Log:
Upgraded xerces-2.9.1-patched-XERCESJ-1257.jar (committed as part of LUCENE-1591) to xercesImpl-2.10.0.jar (which contains the fix for XERCESJ-1257) and also upgraded xml-apis-2.9.0.jar to xml-apis-2.10.0.jar.

Added:
    lucene/dev/branches/branch_3x/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar   (with props)
    lucene/dev/branches/branch_3x/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar   (with props)
Removed:
    lucene/dev/branches/branch_3x/lucene/contrib/benchmark/lib/xerces-2.9.1-patched-XERCESJ-1257.jar
    lucene/dev/branches/branch_3x/lucene/contrib/benchmark/lib/xml-apis-2.9.0.jar
Modified:
    lucene/dev/branches/branch_3x/dev-tools/eclipse/dot.classpath
    lucene/dev/branches/branch_3x/lucene/contrib/benchmark/CHANGES.txt
    lucene/dev/branches/branch_3x/lucene/contrib/benchmark/README.enwiki

Modified: lucene/dev/branches/branch_3x/dev-tools/eclipse/dot.classpath
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/dev-tools/eclipse/dot.classpath?rev=1060407&r1=1060406&r2=1060407&view=diff
==============================================================================
--- lucene/dev/branches/branch_3x/dev-tools/eclipse/dot.classpath (original)
+++ lucene/dev/branches/branch_3x/dev-tools/eclipse/dot.classpath Tue Jan 18 15:13:34 2011
@@ -82,8 +82,8 @@
 	<classpathentry kind="lib" path="lucene/contrib/benchmark/lib/commons-compress-1.0.jar"/>
 	<classpathentry kind="lib" path="lucene/contrib/benchmark/lib/commons-digester-1.7.jar"/>
 	<classpathentry kind="lib" path="lucene/contrib/benchmark/lib/commons-logging-1.0.4.jar"/>
-	<classpathentry kind="lib" path="lucene/contrib/benchmark/lib/xerces-2.9.1-patched-XERCESJ-1257.jar"/>
-	<classpathentry kind="lib" path="lucene/contrib/benchmark/lib/xml-apis-2.9.0.jar"/>
+	<classpathentry kind="lib" path="lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar"/>
+	<classpathentry kind="lib" path="lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar"/>
 	<classpathentry kind="lib" path="lucene/contrib/db/bdb/lib/db-4.7.25.jar"/>
 	<classpathentry kind="lib" path="lucene/contrib/db/bdb-je/lib/je-3.3.93.jar"/>
 	<classpathentry kind="lib" path="lucene/contrib/icu/lib/icu4j-4_6.jar"/>

Modified: lucene/dev/branches/branch_3x/lucene/contrib/benchmark/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/benchmark/CHANGES.txt?rev=1060407&r1=1060406&r2=1060407&view=diff
==============================================================================
--- lucene/dev/branches/branch_3x/lucene/contrib/benchmark/CHANGES.txt (original)
+++ lucene/dev/branches/branch_3x/lucene/contrib/benchmark/CHANGES.txt Tue Jan 18 15:13:34 2011
@@ -2,6 +2,15 @@ Lucene Benchmark Contrib Change Log
 
 The Benchmark contrib package contains code for benchmarking Lucene in a variety of ways.
 
+1/18/2011
+  The locally built patched version of the Xerces-J jar introduced
+  as part of LUCENE-1591 is no longer required, because Xerces
+  2.10.0, which contains a fix for XERCESJ-1257 (see
+  http://svn.apache.org/viewvc?view=revision&revision=554069),
+  was released last year.  Upgraded
+  xerces-2.9.1-patched-XERCESJ-1257.jar and xml-apis-2.9.0.jar
+  to xercesImpl-2.10.0.jar and xml-apis-2.10.0.jar. (Steven Rowe)
+
 4/27/2010
   LUCENE-2416: WriteLineDocTask now supports multi-threading. Also, 
   StringBufferReader was renamed to StringBuilderReader and works on 

Modified: lucene/dev/branches/branch_3x/lucene/contrib/benchmark/README.enwiki
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/benchmark/README.enwiki?rev=1060407&r1=1060406&r2=1060407&view=diff
==============================================================================
--- lucene/dev/branches/branch_3x/lucene/contrib/benchmark/README.enwiki (original)
+++ lucene/dev/branches/branch_3x/lucene/contrib/benchmark/README.enwiki Tue Jan 18 15:13:34 2011
@@ -20,50 +20,3 @@ After that, ant enwiki should process th
 test. Ant targets get-enwiki, expand-enwiki, and extract-enwiki can
 also be used to download, decompress, and extract (to individual files
 in work/enwiki) the dataset, respectively.
-
-NOTE: This bug in Xerces:
-
-  https://issues.apache.org/jira/browse/XERCESJ-1257
-
-which is still present as of 2.9.1, causes an exception like this when
-processing Wikipedia's XML:
-
-Caused by: org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
-	at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
-	at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
-	at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
-	at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
-	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
-	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
-	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
-	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
-	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
-	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
-	at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:77)
-	... 1 more
-
-The original poster in the Xerces bug provided this patch:
-
---- UTF8Reader.java	2006-11-23 00:36:53.000000000 +0100
-+++ /home/rainman/lucene/xerces-2_9_0/src/org/apache/xerces/impl/io/UTF8Reader.java	2008-04-04 00:40:58.000000000 +0200
-@@ -534,6 +534,16 @@
-                     invalidByte(4, 4, b2);
-                 }
- 
-+                // check if output buffer is large enough to hold 2 surrogate chars
-+                if( out + 1 >= offset + length ){
-+                    fBuffer[0] = (byte)b0;
-+                    fBuffer[1] = (byte)b1;
-+                    fBuffer[2] = (byte)b2;
-+                    fBuffer[3] = (byte)b3;
-+                    fOffset = 4;
-+                    return out - offset;
-+		}
-+
-                 // decode bytes into surrogate characters
-                 int uuuuu = ((b0 << 2) & 0x001C) | ((b1 >> 4) & 0x0003);
-                 if (uuuuu > 0x10) {
-
-which I've applied to Xerces 2.9.1 sources, and committed under
-lib/xerces-2.9.1-patched-XERCESJ-1257.jar.  Once XERCESJ-1257 is fixed
-we can upgrade to a standard Xerces release.

Added: lucene/dev/branches/branch_3x/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar?rev=1060407&view=auto
==============================================================================
Binary file - no diff available.

Added: lucene/dev/branches/branch_3x/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar?rev=1060407&view=auto
==============================================================================
Binary file - no diff available.
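
Illustration (not part of the commit): the README note removed above describes
XERCESJ-1257 as a failure while decoding a 4-byte UTF-8 sequence, and the quoted
patch adds a check that the output buffer can hold 2 surrogate chars. The sketch
below shows why such a sequence needs two Java chars and what a room check of
that kind looks like. It is a minimal standalone example, not Xerces code: the
class name SurrogatePairSketch and the methods decodeFourByte and writeIfRoom
are hypothetical, and input validation is deliberately omitted.

```java
import java.nio.charset.StandardCharsets;

public class SurrogatePairSketch {

    /**
     * Decodes one 4-byte UTF-8 sequence (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx)
     * into a code point. Hypothetical helper; performs no validation.
     */
    public static int decodeFourByte(int b0, int b1, int b2, int b3) {
        return ((b0 & 0x07) << 18)
             | ((b1 & 0x3F) << 12)
             | ((b2 & 0x3F) << 6)
             |  (b3 & 0x3F);
    }

    /**
     * Writes the code point at out[pos] only if all of its UTF-16 chars fit.
     * Returns the number of chars written, or 0 when there is not enough room,
     * in which case the caller would have to keep the bytes for the next read.
     */
    public static int writeIfRoom(int codePoint, char[] out, int pos) {
        int needed = Character.charCount(codePoint); // 2 for supplementary characters
        if (pos + needed > out.length) {
            return 0;
        }
        return Character.toChars(codePoint, out, pos);
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the Basic Multilingual Plane, so it takes 4 bytes in UTF-8.
        byte[] utf8 = "\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        int cp = decodeFourByte(utf8[0] & 0xFF, utf8[1] & 0xFF,
                                utf8[2] & 0xFF, utf8[3] & 0xFF);
        System.out.printf("U+%X needs %d chars%n", cp, Character.charCount(cp));

        char[] tooSmall = new char[1];
        System.out.println("written into 1-char buffer: " + writeIfRoom(cp, tooSmall, 0));
    }
}
```

A reader whose output buffer has only one free slot when it reaches such a
sequence cannot emit half of the pair, which is the situation the room check
in the quoted patch guards against.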