Posted to commits@lucene.apache.org by sa...@apache.org on 2010/10/11 04:32:00 UTC
svn commit: r1021234 - in /lucene/dev/trunk/lucene/contrib/benchmark:
CHANGES.txt README.enwiki lib/xerces-2.9.1-patched-XERCESJ-1257.jar
lib/xercesImpl-2.10.0.jar lib/xml-apis-2.10.0.jar lib/xml-apis-2.9.0.jar
sortBench.py
Author: sarowe
Date: Mon Oct 11 02:31:59 2010
New Revision: 1021234
URL: http://svn.apache.org/viewvc?rev=1021234&view=rev
Log:
Upgraded xerces-2.9.1-patched-XERCESJ-1257.jar (committed as part of LUCENE-1591) to xercesImpl-2.10.0.jar (which contains the fix for XERCESJ-1257) and also upgraded xml-apis-2.9.0.jar to xml-apis-2.10.0.jar.
Added:
lucene/dev/trunk/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar (with props)
lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar (with props)
Removed:
lucene/dev/trunk/lucene/contrib/benchmark/lib/xerces-2.9.1-patched-XERCESJ-1257.jar
lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.9.0.jar
Modified:
lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt
lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki
lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py
Modified: lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt?rev=1021234&r1=1021233&r2=1021234&view=diff
==============================================================================
--- lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt (original)
+++ lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt Mon Oct 11 02:31:59 2010
@@ -2,6 +2,15 @@ Lucene Benchmark Contrib Change Log
The Benchmark contrib package contains code for benchmarking Lucene in a variety of ways.
+10/10/2010
+ The locally built patched version of the Xerces-J jar introduced
+ as part of LUCENE-1591 is no longer required, because Xerces
+ 2.10.0, which contains a fix for XERCESJ-1257 (see
+ http://svn.apache.org/viewvc?view=revision&revision=554069),
+ was released earlier this year. Upgraded
+ xerces-2.9.1-patched-XERCESJ-1257.jar and xml-apis-2.9.0.jar
+ to xercesImpl-2.10.0.jar and xml-apis-2.10.0.jar. (Steven Rowe)
+
8/2/2010
LUCENE-2582: You can now specify the default codec to use for
writing new segments by adding default.codec = Pulsing (for
Modified: lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki?rev=1021234&r1=1021233&r2=1021234&view=diff
==============================================================================
--- lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki (original)
+++ lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki Mon Oct 11 02:31:59 2010
@@ -20,50 +20,3 @@ After that, ant enwiki should process th
test. Ant targets get-enwiki, expand-enwiki, and extract-enwiki can
also be used to download, decompress, and extract (to individual files
in work/enwiki) the dataset, respectively.
-
-NOTE: This bug in Xerces:
-
- https://issues.apache.org/jira/browse/XERCESJ-1257
-
-which is still present as of 2.9.1, causes an exception like this when
-processing Wikipedia's XML:
-
-Caused by: org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
- at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
- at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
- at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
- at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
- at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
- at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
- at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
- at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
- at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
- at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
- at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:77)
- ... 1 more
-
-The original poster in the Xerces bug provided this patch:
-
---- UTF8Reader.java 2006-11-23 00:36:53.000000000 +0100
-+++ /home/rainman/lucene/xerces-2_9_0/src/org/apache/xerces/impl/io/UTF8Reader.java 2008-04-04 00:40:58.000000000 +0200
-@@ -534,6 +534,16 @@
- invalidByte(4, 4, b2);
- }
-
-+ // check if output buffer is large enough to hold 2 surrogate chars
-+ if( out + 1 >= offset + length ){
-+ fBuffer[0] = (byte)b0;
-+ fBuffer[1] = (byte)b1;
-+ fBuffer[2] = (byte)b2;
-+ fBuffer[3] = (byte)b3;
-+ fOffset = 4;
-+ return out - offset;
-+ }
-+
- // decode bytes into surrogate characters
- int uuuuu = ((b0 << 2) & 0x001C) | ((b1 >> 4) & 0x0003);
- if (uuuuu > 0x10) {
-
-which I've applied to Xerces 2.9.1 sources, and committed under
-lib/xerces-2.9.1-patched-XERCESJ-1257.jar. Once XERCESJ-1257 is fixed
-we can upgrade to a standard Xerces release.
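The removed README note quotes the XERCESJ-1257 patch without spelling out what it guards against.

A 4-byte UTF-8 sequence encodes a supplementary character (for example U+1F600). When decoded to UTF-16, that character becomes two chars: a high surrogate and a low surrogate. The patch makes UTF8Reader's read() return early when the caller's buffer has room for only one more char. In that case it stashes the four bytes in fBuffer for the next call instead of writing half of the pair.

The Python below is a minimal illustration of that rule, not Xerces's implementation. The following are assumptions of this sketch:
- The names `surrogate_pair`, `utf8_len` and `read_chunks` are hypothetical.
- The `uuuuu` expression and range check follow the quoted context lines. The high/low surrogate formulas are the standard UTF-16 ones, written in the same bit-mask style, and are not quoted from Xerces.
- Because the sketch decodes an in-memory byte string, it simply leaves its position on the lead byte rather than copying bytes into a side buffer.

```python
from typing import List, Tuple


def surrogate_pair(b0: int, b1: int, b2: int, b3: int) -> Tuple[int, int]:
    """Decode one 4-byte UTF-8 sequence into a UTF-16 (high, low) surrogate pair."""
    # Same expression and range check as the quoted UTF8Reader context lines:
    # the top five bits (plane number) of the 21-bit code point.
    uuuuu = ((b0 << 2) & 0x001C) | ((b1 >> 4) & 0x0003)
    if uuuuu > 0x10:  # would exceed U+10FFFF
        raise ValueError("invalid supplementary plane: %#x" % uuuuu)
    wwww = uuuuu - 1  # plane number minus one, as stored in the high surrogate
    # Standard UTF-16 surrogate formulas (continuation-byte validation and
    # overlong checks are omitted in this sketch).
    hs = 0xD800 | ((wwww << 6) & 0x03C0) | ((b1 << 2) & 0x003C) | ((b2 >> 4) & 0x0003)
    ls = 0xDC00 | ((b2 << 6) & 0x03C0) | (b3 & 0x003F)
    return hs, ls


def utf8_len(lead: int) -> int:
    """Length in bytes of the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1
    if (lead & 0xE0) == 0xC0:
        return 2
    if (lead & 0xF0) == 0xE0:
        return 3
    if (lead & 0xF8) == 0xF0:
        return 4
    raise ValueError("invalid lead byte: %#x" % lead)


def read_chunks(data: bytes, length: int) -> List[List[int]]:
    """Simulate repeated read(buf, 0, length) calls on a UTF-8 to UTF-16 reader,
    applying the patch's rule: never hand back half of a surrogate pair."""
    if length < 2:
        raise ValueError("length must leave room for a full surrogate pair")
    chunks: List[List[int]] = []
    pos = 0
    while pos < len(data):
        out: List[int] = []  # UTF-16 code units returned by this read() call
        while pos < len(data) and len(out) < length:
            n = utf8_len(data[pos])
            seq = data[pos:pos + n]
            if len(seq) < n:
                raise ValueError("truncated UTF-8 sequence at end of input")
            if n == 4:
                # The patched check: out + 1 >= offset + length means only one
                # slot is left, so stop and keep the 4 bytes for the next call
                # (Xerces copies them into fBuffer; here pos simply stays put).
                if len(out) + 1 >= length:
                    break
                out.extend(surrogate_pair(*seq))
            else:
                out.append(ord(seq.decode("utf-8")))
            pos += n
        chunks.append(out)
    return chunks
```

For example, `read_chunks("ab\U0001F600c".encode("utf-8"), 3)` is equal to `[[0x61, 0x62], [0xD83D, 0xDE00, 0x63]]`. The first simulated read() stops after two chars rather than returning a lone high surrogate 0xD83D.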
Added: lucene/dev/trunk/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar?rev=1021234&view=auto
==============================================================================
Binary file - no diff available.
Propchange: lucene/dev/trunk/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar?rev=1021234&view=auto
==============================================================================
Binary file - no diff available.
Propchange: lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Modified: lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py?rev=1021234&r1=1021233&r2=1021234&view=diff
==============================================================================
--- lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py (original)
+++ lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py Mon Oct 11 02:31:59 2010
@@ -227,7 +227,7 @@ content.source=org.apache.lucene.benchma
print ' mkdir %s' % LOG_DIR
os.makedirs(LOG_DIR)
- command = '%s -classpath ../../build/classes/java:../../build/classes/demo:../../build/contrib/highlighter/classes/java:lib/commons-digester-1.7.jar:lib/commons-collections-3.1.jar:lib/commons-compress-1.0.jar:lib/commons-logging-1.0.4.jar:lib/commons-beanutils-1.7.0.jar:lib/xerces-2.9.0.jar:lib/xml-apis-2.9.0.jar:../../build/contrib/benchmark/classes/java org.apache.lucene.benchmark.byTask.Benchmark %s > "%s" 2>&1' % (JAVA_COMMAND, algFile, fullLogFileName)
+ command = '%s -classpath ../../build/classes/java:../../build/classes/demo:../../build/contrib/highlighter/classes/java:lib/commons-digester-1.7.jar:lib/commons-collections-3.1.jar:lib/commons-compress-1.0.jar:lib/commons-logging-1.0.4.jar:lib/commons-beanutils-1.7.0.jar:lib/xerces-2.10.0.jar:lib/xml-apis-2.10.0.jar:../../build/contrib/benchmark/classes/java org.apache.lucene.benchmark.byTask.Benchmark %s > "%s" 2>&1' % (JAVA_COMMAND, algFile, fullLogFileName)
if DEBUG:
print 'command=%s' % command
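In the sortBench.py hunk above, the new classpath names lib/xerces-2.10.0.jar, while the jar added by this commit is lib/xercesImpl-2.10.0.jar. The JVM does not report classpath entries that do not exist, so a mismatch like this is easy to miss.

Below is a minimal sketch of a guard. It assumes the script is run from contrib/benchmark, as the relative paths in the command suggest. `build_classpath` is a hypothetical helper, not something in sortBench.py. The sketch is written for Python 3, whereas sortBench.py itself uses Python 2 print statements.

```python
import os
from typing import List


def build_classpath(entries: List[str], base_dir: str = ".") -> str:
    """Join classpath entries, failing fast if a .jar entry is missing.

    Hypothetical helper, not part of sortBench.py. Directory entries are
    passed through unchecked; only entries ending in ".jar" are verified
    relative to base_dir.
    """
    missing = [
        entry for entry in entries
        if entry.endswith(".jar")
        and not os.path.isfile(os.path.join(base_dir, entry))
    ]
    if missing:
        raise FileNotFoundError("classpath jars not found: " + ", ".join(missing))
    # os.pathsep is ':' on Unix-like systems and ';' on Windows; sortBench.py
    # hard-codes ':'.
    return os.pathsep.join(entries)
```

Suppose the command's classpath entries were passed to such a helper as a list, with lib/xercesImpl-2.10.0.jar in place of lib/xerces-2.10.0.jar. Then a misnamed jar would be reported before the benchmark is launched, rather than silently dropped from the classpath.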
RE: svn commit: r1021234 - in /lucene/dev/trunk/lucene/contrib/benchmark: CHANGES.txt README.enwiki lib/xerces-2.9.1-patched-XERCESJ-1257.jar lib/xercesImpl-2.10.0.jar lib/xml-apis-2.10.0.jar lib/xml-apis-2.9.0.jar sortBench.py
Posted by Uwe Schindler <uw...@thetaphi.de>.
Hah, thanks. Wanted to do this, too! :-)
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de