Posted to commits@lucene.apache.org by sa...@apache.org on 2011/01/18 16:13:34 UTC
svn commit: r1060407 - in /lucene/dev/branches/branch_3x: dev-tools/eclipse/
lucene/contrib/benchmark/ lucene/contrib/benchmark/lib/
Author: sarowe
Date: Tue Jan 18 15:13:34 2011
New Revision: 1060407
URL: http://svn.apache.org/viewvc?rev=1060407&view=rev
Log:
Upgraded xerces-2.9.1-patched-XERCESJ-1257.jar (committed as part of LUCENE-1591) to xercesImpl-2.10.0.jar (which contains the fix for XERCESJ-1257) and also upgraded xml-apis-2.9.0.jar to xml-apis-2.10.0.jar.
Added:
lucene/dev/branches/branch_3x/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar (with props)
lucene/dev/branches/branch_3x/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar (with props)
Removed:
lucene/dev/branches/branch_3x/lucene/contrib/benchmark/lib/xerces-2.9.1-patched-XERCESJ-1257.jar
lucene/dev/branches/branch_3x/lucene/contrib/benchmark/lib/xml-apis-2.9.0.jar
Modified:
lucene/dev/branches/branch_3x/dev-tools/eclipse/dot.classpath
lucene/dev/branches/branch_3x/lucene/contrib/benchmark/CHANGES.txt
lucene/dev/branches/branch_3x/lucene/contrib/benchmark/README.enwiki
Modified: lucene/dev/branches/branch_3x/dev-tools/eclipse/dot.classpath
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/dev-tools/eclipse/dot.classpath?rev=1060407&r1=1060406&r2=1060407&view=diff
==============================================================================
--- lucene/dev/branches/branch_3x/dev-tools/eclipse/dot.classpath (original)
+++ lucene/dev/branches/branch_3x/dev-tools/eclipse/dot.classpath Tue Jan 18 15:13:34 2011
@@ -82,8 +82,8 @@
<classpathentry kind="lib" path="lucene/contrib/benchmark/lib/commons-compress-1.0.jar"/>
<classpathentry kind="lib" path="lucene/contrib/benchmark/lib/commons-digester-1.7.jar"/>
<classpathentry kind="lib" path="lucene/contrib/benchmark/lib/commons-logging-1.0.4.jar"/>
- <classpathentry kind="lib" path="lucene/contrib/benchmark/lib/xerces-2.9.1-patched-XERCESJ-1257.jar"/>
- <classpathentry kind="lib" path="lucene/contrib/benchmark/lib/xml-apis-2.9.0.jar"/>
+ <classpathentry kind="lib" path="lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar"/>
+ <classpathentry kind="lib" path="lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar"/>
<classpathentry kind="lib" path="lucene/contrib/db/bdb/lib/db-4.7.25.jar"/>
<classpathentry kind="lib" path="lucene/contrib/db/bdb-je/lib/je-3.3.93.jar"/>
<classpathentry kind="lib" path="lucene/contrib/icu/lib/icu4j-4_6.jar"/>
Modified: lucene/dev/branches/branch_3x/lucene/contrib/benchmark/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/benchmark/CHANGES.txt?rev=1060407&r1=1060406&r2=1060407&view=diff
==============================================================================
--- lucene/dev/branches/branch_3x/lucene/contrib/benchmark/CHANGES.txt (original)
+++ lucene/dev/branches/branch_3x/lucene/contrib/benchmark/CHANGES.txt Tue Jan 18 15:13:34 2011
@@ -2,6 +2,15 @@ Lucene Benchmark Contrib Change Log
The Benchmark contrib package contains code for benchmarking Lucene in a variety of ways.
+1/18/2011
+ The locally built patched version of the Xerces-J jar introduced
+ as part of LUCENE-1591 is no longer required, because Xerces
+ 2.10.0, which contains a fix for XERCESJ-1257 (see
+ http://svn.apache.org/viewvc?view=revision&revision=554069),
+ was released last year. Upgraded
+ xerces-2.9.1-patched-XERCESJ-1257.jar and xml-apis-2.9.0.jar
+ to xercesImpl-2.10.0.jar and xml-apis-2.10.0.jar. (Steven Rowe)
+
4/27/2010
LUCENE-2416: WriteLineDocTask now supports multi-threading. Also,
StringBufferReader was renamed to StringBuilderReader and works on
Modified: lucene/dev/branches/branch_3x/lucene/contrib/benchmark/README.enwiki
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/benchmark/README.enwiki?rev=1060407&r1=1060406&r2=1060407&view=diff
==============================================================================
--- lucene/dev/branches/branch_3x/lucene/contrib/benchmark/README.enwiki (original)
+++ lucene/dev/branches/branch_3x/lucene/contrib/benchmark/README.enwiki Tue Jan 18 15:13:34 2011
@@ -20,50 +20,3 @@ After that, ant enwiki should process th
test. Ant targets get-enwiki, expand-enwiki, and extract-enwiki can
also be used to download, decompress, and extract (to individual files
in work/enwiki) the dataset, respectively.
-
-NOTE: This bug in Xerces:
-
- https://issues.apache.org/jira/browse/XERCESJ-1257
-
-which is still present as of 2.9.1, causes an exception like this when
-processing Wikipedia's XML:
-
-Caused by: org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
- at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
- at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
- at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
- at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
- at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
- at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
- at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
- at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
- at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
- at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
- at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:77)
- ... 1 more
-
-The original poster in the Xerces bug provided this patch:
-
---- UTF8Reader.java 2006-11-23 00:36:53.000000000 +0100
-+++ /home/rainman/lucene/xerces-2_9_0/src/org/apache/xerces/impl/io/UTF8Reader.java 2008-04-04 00:40:58.000000000 +0200
-@@ -534,6 +534,16 @@
- invalidByte(4, 4, b2);
- }
-
-+ // check if output buffer is large enough to hold 2 surrogate chars
-+ if( out + 1 >= offset + length ){
-+ fBuffer[0] = (byte)b0;
-+ fBuffer[1] = (byte)b1;
-+ fBuffer[2] = (byte)b2;
-+ fBuffer[3] = (byte)b3;
-+ fOffset = 4;
-+ return out - offset;
-+ }
-+
- // decode bytes into surrogate characters
- int uuuuu = ((b0 << 2) & 0x001C) | ((b1 >> 4) & 0x0003);
- if (uuuuu > 0x10) {
-
-which I've applied to Xerces 2.9.1 sources, and committed under
-lib/xerces-2.9.1-patched-XERCESJ-1257.jar. Once XERCESJ-1257 is fixed
-we can upgrade to a standard Xerces release.
Added: lucene/dev/branches/branch_3x/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar?rev=1060407&view=auto
==============================================================================
Binary file - no diff available.
Added: lucene/dev/branches/branch_3x/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar?rev=1060407&view=auto
==============================================================================
Binary file - no diff available.
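
For context on the README note removed above: the quoted UTF8Reader patch checks that the output buffer has room for two chars before decoding a 4-byte UTF-8 sequence, because such a sequence always becomes a surrogate pair in Java. The sketch below is independent of Xerces and only illustrates that fact with the JDK; the class and method names (SurrogateDemo, decode4, toUtf16) are hypothetical, and the decoder assumes well-formed input rather than validating it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Lucene or Xerces.
class SurrogateDemo {

    // Decode one 4-byte UTF-8 sequence (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx)
    // into a Unicode code point. Assumes the input is well formed.
    static int decode4(byte[] b) {
        return ((b[0] & 0x07) << 18)
             | ((b[1] & 0x3F) << 12)
             | ((b[2] & 0x3F) << 6)
             |  (b[3] & 0x3F);
    }

    // Convert the decoded code point to UTF-16 code units. For any code
    // point above U+FFFF this yields a high and a low surrogate.
    static char[] toUtf16(byte[] b) {
        return Character.toChars(decode4(b));
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) encoded as UTF-8.
        byte[] clef = {(byte) 0xF0, (byte) 0x9D, (byte) 0x84, (byte) 0x9E};
        char[] units = toUtf16(clef);
        System.out.printf("U+%X -> %d chars%n", decode4(clef), units.length);

        // Cross-check against the JDK's own UTF-8 decoder.
        String viaJdk = new String(clef, StandardCharsets.UTF_8);
        System.out.println(viaJdk.length() == units.length);
    }
}
```

A reader that has only one free slot left in its char buffer when it meets such a sequence cannot emit half of the pair, which is the situation the removed patch handled by stashing the four bytes and returning early.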