Posted to pylucene-commits@lucene.apache.org by va...@apache.org on 2010/04/19 08:46:29 UTC
svn commit: r935465 -
/lucene/pylucene/site/src/documentation/content/xdocs/jcc/documentation/readme.xml
Author: vajda
Date: Mon Apr 19 06:46:28 2010
New Revision: 935465
URL: http://svn.apache.org/viewvc?rev=935465&view=rev
Log:
- added section about embedding a Python VM in a Java VM
Modified:
lucene/pylucene/site/src/documentation/content/xdocs/jcc/documentation/readme.xml
Modified: lucene/pylucene/site/src/documentation/content/xdocs/jcc/documentation/readme.xml
URL: http://svn.apache.org/viewvc/lucene/pylucene/site/src/documentation/content/xdocs/jcc/documentation/readme.xml?rev=935465&r1=935464&r2=935465&view=diff
==============================================================================
--- lucene/pylucene/site/src/documentation/content/xdocs/jcc/documentation/readme.xml (original)
+++ lucene/pylucene/site/src/documentation/content/xdocs/jcc/documentation/readme.xml Mon Apr 19 06:46:28 2010
@@ -292,9 +292,11 @@
Java VM to search for classes. Every Python extension produced by
JCC exports a <code>CLASSPATH</code> variable that is hardcoded to
the jar files that it was produced from. A copy of each jar file
- is installed as a resources files along with the extension when
- JCC is invoked with the <code>--install</code> command line
- argument. For example:
+ is installed as a resource file with the extension when JCC is
+ invoked with the <code>--install</code> command line argument.
+ This parameter is optional and defaults to the
+ <code>CLASSPATH</code> string exported by the module
+ <code>initVM</code> is imported from.
<source>
>>> import lucene
>>> lucene.initVM(classpath=lucene.CLASSPATH)
@@ -304,10 +306,10 @@
<code>initialheap</code><br/>
The initial amount of Java heap to start the Java VM with. This
argument is a string that follows the same syntax as the
- similar <code>-Xms</code> java command line argument. For example:
+ similar <code>-Xms</code> java command line argument.
<source>
>>> import lucene
- >>> lucene.initVM(lucene.CLASSPATH, initialheap='32m')
+ >>> lucene.initVM(initialheap='32m')
>>> lucene.Runtime.getRuntime().totalMemory()
33357824L
</source>
@@ -330,8 +332,7 @@
startup routine. These are passed through as-is. For example:
<source>
>>> import lucene
- >>> lucene.initVM(lucene.CLASSPATH,
- vmargs='-Xcheck:jni,-verbose:jni,-verbose:gc')
+ >>> lucene.initVM(vmargs='-Xcheck:jni,-verbose:jni,-verbose:gc')
</source>
</li>
</ul>
@@ -421,10 +422,10 @@
Java 1.5 added support for parameterized types. JCC generates code
to heed type parameters unless the <code>--no-generics</code>
command line parameter is used. Java type parameterization is a
- runtime feature. There is only one class used for all its
+ runtime feature. The same class is used for all its
parameterizations. Similarly, JCC wrapper objects all use the same
class but store type parameterizations on instances and make them
- accessible as a tuple via the <code>parameters_</code> variable.
+ accessible as a tuple via the <code>parameters_</code> property.
</p>
<p>
For example, an <code>ArrayList<Document></code> instance,
@@ -435,9 +436,9 @@
<p>
To allocate an instance of a generic Java class with specific type
parameters use the <code>of_()</code> method. This method accepts
- one or more Python classes to use as type parameters. For
+ one or more Python wrapper classes to use as type parameters. For
example, <code>java.util.ArrayList<E></code> is declared to
- accept one type parameter. Its wrapper's <code>of_()</code>
+ accept one type parameter. Its wrapper's <code>of_()</code> method
hence accepts one parameter, a Python class, to use as type
parameter for the return type of its <code>get()</code> method, among
others:
@@ -641,7 +642,7 @@
implement first such an extension class in Java:
</p>
<source>
- package org.osafoundation.lucene.analysis;
+ package org.apache.pylucene.analysis;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
@@ -699,23 +700,29 @@
</p>
<source>
class _analyzer(PythonAnalyzer):
- def tokenStream(self, fieldName, reader):
+ def tokenStream(_self, fieldName, reader):
class _tokenStream(PythonTokenStream):
- def __init__(self):
- super(_tokenStream, self).__init__()
- self.TOKENS = ["1", "2", "3", "4", "5"]
- self.INCREMENTS = [1, 2, 1, 0, 1]
- self.i = 0
- def next(self):
- if self.i == len(self.TOKENS):
- return None
- t = Token(self.TOKENS[self.i], self.i, self.i)
- t.setPositionIncrement(self.INCREMENTS[self.i])
- self.i += 1
- return t
- def reset(self):
+ def __init__(self_):
+ super(_tokenStream, self_).__init__()
+ self_.TOKENS = ["1", "2", "3", "4", "5"]
+ self_.INCREMENTS = [1, 2, 1, 0, 1]
+ self_.i = 0
+ self_.posIncrAtt = self_.addAttribute(PositionIncrementAttribute.class_)
+ self_.termAtt = self_.addAttribute(TermAttribute.class_)
+ self_.offsetAtt = self_.addAttribute(OffsetAttribute.class_)
+ def incrementToken(self_):
+ if self_.i == len(self_.TOKENS):
+ return False
+ self_.termAtt.setTermBuffer(self_.TOKENS[self_.i])
+ self_.offsetAtt.setOffset(self_.i, self_.i)
+ self_.posIncrAtt.setPositionIncrement(self_.INCREMENTS[self_.i])
+ self_.i += 1
+ return True
+ def end(self_):
pass
- def close(self):
+ def reset(self_):
+ pass
+ def close(self_):
pass
return _tokenStream()
</source>
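A note on the <code>INCREMENTS</code> list in the hunk above: the sketch below shows how position increments translate into absolute token positions. It is plain Python and does not use PyLucene. The helper name <code>token_positions</code> is hypothetical, and the starting position of -1 is an assumption based on Lucene's usual convention that the first token lands at its increment minus one.

```python
# Illustrative only: mirrors the TOKENS / INCREMENTS lists from the
# _tokenStream example, without importing lucene.
TOKENS = ["1", "2", "3", "4", "5"]
INCREMENTS = [1, 2, 1, 0, 1]


def token_positions(tokens, increments):
    """Return (term, absolute position) pairs for the given increments.

    Assumption: positions start at -1, so a first increment of 1 puts
    the first token at position 0.
    """
    position = -1
    result = []
    for term, increment in zip(tokens, increments):
        position += increment  # 0 stacks a token on the previous position
        result.append((term, position))
    return result


if __name__ == "__main__":
    for term, position in token_positions(TOKENS, INCREMENTS):
        print(term, position)
```

Under these assumptions, the increment of 2 leaves a one-position gap before token "2", and the increment of 0 places token "4" at the same position as token "3".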
@@ -736,6 +743,78 @@
and <a href="site:documentation/readme">samples</a>.
</p>
</section>
+ <section id="embedding">
+ <title>Embedding a Python VM in a Java VM</title>
+ <p>
+ Using the same techniques used when writing a Python extension of a
+ Java class, JCC may also be used to embed a Python VM in a Java VM.
+ The steps and constraints involved are as follows:
+ </p>
+ <ul>
+ <li>
+ JCC must be built in shared mode.
+ See <a href="site:jcc/documentation/install">installation
+ instructions</a> for more information about shared mode.
+ </li>
+ <li>
+ As described in the previous section, define one or more Java
+ classes to be "extended" from Python to provide the
+ implementations of the native methods declared on them. Instances
+ of these classes implement the bridges into the Python VM from
+ Java.
+ </li>
+ <li>
+ The <code>org.apache.jcc.PythonVM</code> Java class is used from
+ the Java VM's main thread to initialize the embedded
+ Python VM. This class is installed inside the JCC egg under the
+ <code>jcc/classes</code> directory and the full path to this
+ directory must be on the Java <code>CLASSPATH</code>.
+ </li>
+ <li>
+ The JCC egg directory contains the JCC shared runtime library. This
+ is not the JCC Python extension shared library but a library
+ called <code>libjcc.dylib</code> on Mac OS X,
+ <code>libjcc.so</code> on Linux or <code>jcc.dll</code> on Windows.
+ This directory must be added to the Java VM's shared library path
+ via the <code>-Djava.library.path</code> command line parameter.
+ </li>
+ <li>
+ In the Java VM's main thread, initialize the Python VM by calling
+ the static <code>start()</code> method of the
+ <code>org.apache.jcc.PythonVM</code> class, passing it a Python
+ program name string and, optionally, a string array of start-up
+ arguments that is made accessible in Python via <code>sys.argv</code>.
+ This method returns the singleton PythonVM instance to be used in
+ this Java VM. This instance may be retrieved at any later time via
+ the static <code>get()</code> method defined on the
+ <code>org.apache.jcc.PythonVM</code> class.
+ </li>
+ <li>
+ Any Java VM thread that is going to be calling into the Python VM
+ should start by acquiring a reference to the Python thread state
+ object by calling the <code>acquireThreadState()</code> method on the
+ Python VM instance. It should then release the Python thread state
+ before terminating by calling <code>releaseThreadState()</code>.
+ Calling these methods is optional but strongly recommended as it
+ ensures that Python is not creating and throwing away a thread
+ state every time the Python VM is entered and exited from a given
+ Java VM thread.
+ </li>
+ <li>
+ Any Java VM thread may instantiate a Python object for which an
+ extension class was defined in Java as described in the previous
+ section by calling the <code>instantiate()</code> method on the
+ PythonVM instance. This method takes two string parameters, the
+ name of the Python module and the name of the Python class to
+ import and instantiate from it. The <code>__init__()</code>
+ constructor on this class must be callable without any parameters
+ and, if defined, must call <code>super()</code> in order to
+ initialize the Java side. The <code>instantiate()</code> method is
+ declared to return <code>java.lang.Object</code> but the return
+ value is actually an instance of the Java extension class used and
+ must be downcast to it.
+ </li>
+ </ul>
+ </section>
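The constraints that the new section places on the Python class passed to <code>instantiate()</code> can be sketched on the Python side as follows. This is a minimal illustration, not code from JCC: <code>BridgeBase</code> is a hypothetical stand-in for a JCC-generated wrapper of a Java extension class, and <code>Bridge</code> and <code>handle()</code> are made-up names. Only the shape of <code>__init__()</code> reflects the text above.

```python
# Hypothetical stand-in for a JCC-generated wrapper class. In a real
# setup this base class would come from the extension module built by
# JCC for the Java extension class.
class BridgeBase(object):
    def __init__(self):
        # Stands in for whatever initialization the Java side needs.
        self.java_side_initialized = True


class Bridge(BridgeBase):
    # The constructor must be callable without parameters, because the
    # caller only supplies a module name and a class name.
    def __init__(self):
        # If __init__() is defined, it must call super() so that the
        # (here simulated) Java side gets initialized.
        super(Bridge, self).__init__()
        self.calls = 0

    # Made-up method standing in for the implementation of a native
    # method declared on the Java extension class.
    def handle(self, message):
        self.calls += 1
        return message.upper()


if __name__ == "__main__":
    bridge = Bridge()  # no arguments, as the section requires
    print(bridge.handle("ping"))
```

A constructor that required arguments, or that skipped the <code>super()</code> call, would not satisfy the constraints described in the last list item.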
<section id="python">
<title>Pythonic protocols</title>
<p>