Posted to commits@lucene.apache.org by sa...@apache.org on 2010/10/11 04:32:00 UTC

svn commit: r1021234 - in /lucene/dev/trunk/lucene/contrib/benchmark: CHANGES.txt README.enwiki lib/xerces-2.9.1-patched-XERCESJ-1257.jar lib/xercesImpl-2.10.0.jar lib/xml-apis-2.10.0.jar lib/xml-apis-2.9.0.jar sortBench.py

Author: sarowe
Date: Mon Oct 11 02:31:59 2010
New Revision: 1021234

URL: http://svn.apache.org/viewvc?rev=1021234&view=rev
Log:
Upgraded xerces-2.9.1-patched-XERCESJ-1257.jar (committed as part of LUCENE-1591) to xercesImpl-2.10.0.jar (which contains the fix for XERCESJ-1257) and also upgraded xml-apis-2.9.0.jar to xml-apis-2.10.0.jar.

Added:
    lucene/dev/trunk/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar   (with props)
    lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar   (with props)
Removed:
    lucene/dev/trunk/lucene/contrib/benchmark/lib/xerces-2.9.1-patched-XERCESJ-1257.jar
    lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.9.0.jar
Modified:
    lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt
    lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki
    lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py

Modified: lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt?rev=1021234&r1=1021233&r2=1021234&view=diff
==============================================================================
--- lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt (original)
+++ lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt Mon Oct 11 02:31:59 2010
@@ -2,6 +2,15 @@ Lucene Benchmark Contrib Change Log
 
 The Benchmark contrib package contains code for benchmarking Lucene in a variety of ways.
 
+10/10/2010
+  The locally built patched version of the Xerces-J jar introduced
+  as part of LUCENE-1591 is no longer required, because Xerces
+  2.10.0, which contains a fix for XERCESJ-1257 (see
+  http://svn.apache.org/viewvc?view=revision&revision=554069),
+  was released earlier this year.  Upgraded
+  xerces-2.9.1-patched-XERCESJ-1257.jar and xml-apis-2.9.0.jar
+  to xercesImpl-2.10.0.jar and xml-apis-2.10.0.jar. (Steven Rowe)
+
 8/2/2010
   LUCENE-2582: You can now specify the default codec to use for
   writing new segments by adding default.codec = Pulsing (for

Modified: lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki?rev=1021234&r1=1021233&r2=1021234&view=diff
==============================================================================
--- lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki (original)
+++ lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki Mon Oct 11 02:31:59 2010
@@ -20,50 +20,3 @@ After that, ant enwiki should process th
 test. Ant targets get-enwiki, expand-enwiki, and extract-enwiki can
 also be used to download, decompress, and extract (to individual files
 in work/enwiki) the dataset, respectively.
-
-NOTE: This bug in Xerces:
-
-  https://issues.apache.org/jira/browse/XERCESJ-1257
-
-which is still present as of 2.9.1, causes an exception like this when
-processing Wikipedia's XML:
-
-Caused by: org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
-	at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
-	at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
-	at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
-	at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
-	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
-	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
-	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
-	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
-	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
-	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
-	at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:77)
-	... 1 more
-
-The original poster in the Xerces bug provided this patch:
-
---- UTF8Reader.java	2006-11-23 00:36:53.000000000 +0100
-+++ /home/rainman/lucene/xerces-2_9_0/src/org/apache/xerces/impl/io/UTF8Reader.java	2008-04-04 00:40:58.000000000 +0200
-@@ -534,6 +534,16 @@
-                     invalidByte(4, 4, b2);
-                 }
- 
-+                // check if output buffer is large enough to hold 2 surrogate chars
-+                if( out + 1 >= offset + length ){
-+                    fBuffer[0] = (byte)b0;
-+                    fBuffer[1] = (byte)b1;
-+                    fBuffer[2] = (byte)b2;
-+                    fBuffer[3] = (byte)b3;
-+                    fOffset = 4;
-+                    return out - offset;
-+		}
-+
-                 // decode bytes into surrogate characters
-                 int uuuuu = ((b0 << 2) & 0x001C) | ((b1 >> 4) & 0x0003);
-                 if (uuuuu > 0x10) {
-
-which I've applied to Xerces 2.9.1 sources, and committed under
-lib/xerces-2.9.1-patched-XERCESJ-1257.jar.  Once XERCESJ-1257 is fixed
-we can upgrade to a standard Xerces release.
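
The check added by the patch above can be illustrated with a minimal Java sketch. It assumes a simplified standalone decoder, and the class SurrogatePairSketch and its decode/savedCount methods are hypothetical: they are not Xerces's UTF8Reader API. A 4-byte UTF-8 sequence encodes a supplementary code point, which becomes two UTF-16 chars (a surrogate pair). The decoder therefore has to confirm that both output slots are free before writing either one. If they are not, it holds back the four bytes, much as the patch does with fBuffer and fOffset. Resuming from the saved bytes on the next read is left out for brevity.

```java
/**
 * Hypothetical sketch (not the Xerces UTF8Reader API) of the buffer check
 * added by the XERCESJ-1257 patch: a 4-byte UTF-8 sequence decodes to a
 * surrogate pair, i.e. TWO UTF-16 chars, so both slots must be free before
 * either char is written.
 */
final class SurrogatePairSketch {
    // Bytes held back when the caller's buffer lacked room for both
    // surrogates (plays the role of fBuffer/fOffset in the patch).
    private final byte[] saved = new byte[4];
    private int savedCount = 0;

    int savedCount() {
        return savedCount;
    }

    /**
     * Decodes the 4-byte sequence b0..b3 (each an unsigned value 0..255) into
     * buf starting at index out; end is one past the last writable index.
     * Returns the next write index, or out unchanged if fewer than two slots
     * were free and the bytes were saved instead.
     */
    int decode(int b0, int b1, int b2, int b3, char[] buf, int out, int end) {
        // Same condition as the patch: out + 1 >= offset + length.
        if (out + 1 >= end) {
            saved[0] = (byte) b0;
            saved[1] = (byte) b1;
            saved[2] = (byte) b2;
            saved[3] = (byte) b3;
            savedCount = 4;
            return out;
        }
        // Lead byte must be 11110xxx, followed by three 10xxxxxx bytes.
        if ((b0 & 0xF8) != 0xF0 || (b1 & 0xC0) != 0x80
                || (b2 & 0xC0) != 0x80 || (b3 & 0xC0) != 0x80) {
            throw new IllegalArgumentException("not a 4-byte UTF-8 sequence");
        }
        int codePoint = ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12)
                | ((b2 & 0x3F) << 6) | (b3 & 0x3F);
        // Reject overlong encodings and values beyond U+10FFFF.
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("code point out of range");
        }
        // Writes the high surrogate at buf[out] and the low one at buf[out + 1];
        // toChars returns 2 for a supplementary code point.
        return out + Character.toChars(codePoint, buf, out);
    }

    public static void main(String[] args) {
        SurrogatePairSketch d = new SurrogatePairSketch();
        char[] buf = new char[2];
        // U+1F600 is F0 9F 98 80 in UTF-8; with one free slot the bytes are saved.
        System.out.println(d.decode(0xF0, 0x9F, 0x98, 0x80, buf, 1, 2)); // 1
        System.out.println(d.savedCount());                              // 4
        // With two free slots the surrogate pair D83D DE00 is written.
        System.out.println(d.decode(0xF0, 0x9F, 0x98, 0x80, buf, 0, 2)); // 2
    }
}
```

Without the room check, the second surrogate would be written past the caller's region (or the sequence misreported as malformed). That is consistent with the "Invalid byte 2 of 4-byte UTF-8 sequence" failure quoted above when a read boundary falls inside a 4-byte character.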

Added: lucene/dev/trunk/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar?rev=1021234&view=auto
==============================================================================
Binary file - no diff available.

Propchange: lucene/dev/trunk/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar?rev=1021234&view=auto
==============================================================================
Binary file - no diff available.

Propchange: lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Modified: lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py?rev=1021234&r1=1021233&r2=1021234&view=diff
==============================================================================
--- lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py (original)
+++ lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py Mon Oct 11 02:31:59 2010
@@ -227,7 +227,7 @@ content.source=org.apache.lucene.benchma
       print '  mkdir %s' % LOG_DIR
       os.makedirs(LOG_DIR)
 
-    command = '%s -classpath ../../build/classes/java:../../build/classes/demo:../../build/contrib/highlighter/classes/java:lib/commons-digester-1.7.jar:lib/commons-collections-3.1.jar:lib/commons-compress-1.0.jar:lib/commons-logging-1.0.4.jar:lib/commons-beanutils-1.7.0.jar:lib/xerces-2.9.0.jar:lib/xml-apis-2.9.0.jar:../../build/contrib/benchmark/classes/java org.apache.lucene.benchmark.byTask.Benchmark %s > "%s" 2>&1' % (JAVA_COMMAND, algFile, fullLogFileName)
+    command = '%s -classpath ../../build/classes/java:../../build/classes/demo:../../build/contrib/highlighter/classes/java:lib/commons-digester-1.7.jar:lib/commons-collections-3.1.jar:lib/commons-compress-1.0.jar:lib/commons-logging-1.0.4.jar:lib/commons-beanutils-1.7.0.jar:lib/xerces-2.10.0.jar:lib/xml-apis-2.10.0.jar:../../build/contrib/benchmark/classes/java org.apache.lucene.benchmark.byTask.Benchmark %s > "%s" 2>&1' % (JAVA_COMMAND, algFile, fullLogFileName)
 
     if DEBUG:
       print 'command=%s' % command



RE: svn commit: r1021234 - in /lucene/dev/trunk/lucene/contrib/benchmark: CHANGES.txt README.enwiki lib/xerces-2.9.1-patched-XERCESJ-1257.jar lib/xercesImpl-2.10.0.jar lib/xml-apis-2.10.0.jar lib/xml-apis-2.9.0.jar sortBench.py

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hah, thanks. Wanted to do this, too! :-)

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de




