Posted to commits@lucene.apache.org by va...@apache.org on 2009/01/31 06:47:53 UTC
svn commit: r739511 - in /lucene/pylucene/trunk: CHANGES INSTALL README jcc/INSTALL jcc/README
Author: vajda
Date: Sat Jan 31 05:47:53 2009
New Revision: 739511
URL: http://svn.apache.org/viewvc?rev=739511&view=rev
Log:
documentation moved to http://lucene.apache.org/pylucene
Modified:
lucene/pylucene/trunk/CHANGES
lucene/pylucene/trunk/INSTALL
lucene/pylucene/trunk/README
lucene/pylucene/trunk/jcc/INSTALL
lucene/pylucene/trunk/jcc/README
Modified: lucene/pylucene/trunk/CHANGES
URL: http://svn.apache.org/viewvc/lucene/pylucene/trunk/CHANGES?rev=739511&r1=739510&r2=739511&view=diff
==============================================================================
--- lucene/pylucene/trunk/CHANGES (original)
+++ lucene/pylucene/trunk/CHANGES Sat Jan 31 05:47:53 2009
@@ -3,7 +3,8 @@
Version 2.4.0 ->
----------------------
- - PyLucene now a subproject of the Apache Lucene project
+ - PyLucene with JCC now a subproject of the Apache Lucene project
+ - documentation moved to http://lucene.apache.org/pylucene
-
Version 2.3.2 -> 2.4.0
Modified: lucene/pylucene/trunk/INSTALL
URL: http://svn.apache.org/viewvc/lucene/pylucene/trunk/INSTALL?rev=739511&r1=739510&r2=739511&view=diff
==============================================================================
--- lucene/pylucene/trunk/INSTALL (original)
+++ lucene/pylucene/trunk/INSTALL Sat Jan 31 05:47:53 2009
@@ -1,69 +1,2 @@
- INSTALL file for PyLucene with JCC build
- ----------------------------------------
-
- Contents
- --------
-
- - Building PyLucene
- - Requirements
- - Notes for Solaris
-
-
- Building PyLucene
- -----------------
-
- PyLucene is now completely code-generated by JCC whose sources are
- included in the jcc sub-directory.
-
- Before building PyLucene, JCC must be built first. See JCC's INSTALL file
- in the jcc subdirectory for building and installing it.
-
- Once JCC is built and installed, PyLucene is built via a Makefile which
- invokes JCC. See PyLucene's Makefile for configuration instructions.
-
- There are limits to both how many files can fit on the command line and
- how large a C++ file the C++ compiler can handle.
- By default, JCC generates one large C++ file containing the source code
- for all wrapper classes.
-
- Using the --files command line argument, this behaviour can be tuned to
- work around various limits.
- For example:
- - to break up the large wrapper class file into about 2 files:
- --files 2
- - to break up the large wrapper class file into about 10 files:
- --files 10
- - to generate one C++ file per Java class wrapped:
- --files separate
-
-
- Requirements
- ------------
-
- To build PyLucene with JCC a Java Development Kit (JDK) and Ant [1] are
- required; use of the resulting PyLucene requires only a Java Runtime
- Environment (JRE).
-
- The setuptools package [2] is required to build and run PyLucene with JCC
- on Python 2.3.5. With later versions of Python, setuptools is only
- required for shared mode (see JCC's INSTALL for more information).
-
- [1] http://ant.apache.org
- [2] http://pypi.python.org/pypi/setuptools
-
-
- Notes for Solaris
- -----------------
-
- PyLucene's Makefile is a GNU Makefile. Be sure to use 'gmake' instead of
- plain 'make'.
-
- Just as when building JCC, Python's distutils must be nudged a bit to
- invoke the correct compiler. Sun Studio's C compiler is called 'cc' while
- its C++ compiler is called 'CC'.
-
- To build PyLucene, use the following shell command to ensure that the C++
- compiler is used:
-
- $ CC=CC gmake
+Please visit http://lucene.apache.org/pylucene/documentation/install.html
Modified: lucene/pylucene/trunk/README
URL: http://svn.apache.org/viewvc/lucene/pylucene/trunk/README?rev=739511&r1=739510&r2=739511&view=diff
==============================================================================
--- lucene/pylucene/trunk/README (original)
+++ lucene/pylucene/trunk/README Sat Jan 31 05:47:53 2009
@@ -1,251 +1,2 @@
- ********************************************************
- * =============== *
- * IMPORTANT NOTE: *
- * =============== *
- * *
- * Before calling any PyLucene API that requires the *
- * Java VM, start it by calling initVM(classpath, ...) *
- * *
- * More about this function in jcc/README. *
- * *
- ********************************************************
-
- README file for PyLucene with JCC
- ---------------------------------
-
- Contents
- --------
-
- - Installing PyLucene
- - API documentation for PyLucene
-
-
- Installing PyLucene
- -------------------
-
- PyLucene is a Python extension built with JCC.
-
- To build PyLucene, JCC needs to be built first. Sources for JCC are
- available in the jcc sub-directory of this source tree.
- Instructions for building and installing JCC are in jcc/INSTALL.
-
- See INSTALL file for instructions for building PyLucene.
-
-
- API documentation for PyLucene
- ------------------------------
-
- PyLucene is currently built against the Java Lucene trunk. It intends to
- support the entire Lucene API.
-
- PyLucene also includes a number of Lucene contrib packages: the Snowball
- analyzer and stemmers, the highlighter package, analyzers for languages
- other than English, regular expression queries and specialized queries
- such as 'more like this'.
-
- This document only covers the pythonic extensions to Lucene offered
- by PyLucene as well as some differences between the Java and Python
- APIs. For the API documentation on the Java Lucene APIs, please visit:
- http://lucene.apache.org/java/docs/api/index.html
-
- To help with debugging and to support some Lucene APIs, PyLucene also
- exposes some Java runtime APIs.
-
- - Contents
-
- . Samples
- . Threading support with attachCurrentThread()
- . Exception handling with lucene.JavaError
- . Differences between the Java Lucene and PyLucene APIs
- . Pythonic extensions to the Java Lucene APIs
- . Extending Lucene classes from Python
-
- - Samples
-
- The best way to learn PyLucene is to look at the many samples included
- with the PyLucene source release or on the web at
-
- http://svn.osafoundation.org/pylucene/trunk/samples/
- http://svn.osafoundation.org/pylucene/trunk/samples/LuceneInAction/
-
- A large number of samples are shipped with PyLucene. Most notably, all
- the samples published in the "Lucene in Action" book that did not
- depend on a third party Java library for which there was no obvious
- Python equivalent were ported to Python and PyLucene.
-
- "Lucene in Action" is a great companion to learning Lucene. Having all
- the samples available in Python should make it even easier for Python
- developers.
-
- "Lucene in Action" was written by Erik Hatcher and Otis Gospodnetic,
- both part of the Java Lucene development team, and is available from
- Manning Publications at http://www.manning.com/hatcher2.
-
- - Threading support with attachCurrentThread()
-
- Before PyLucene APIs can be used from a thread other than the main
- thread that was not created by the Java Runtime, the
- attachCurrentThread() method must be called on the JCCEnv object
- returned by the initVM() or getVMEnv() functions.
-
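A minimal sketch of the rule above, not part of the original README: a worker
thread attaches itself before running any code that touches the Java VM. The
helper name run_attached is hypothetical, and env is assumed to be the JCCEnv
object returned by initVM() or getVMEnv().

```python
import threading

def run_attached(env, work):
    """Run work() in a new Python thread that first attaches itself to the Java VM.

    env is assumed to be the JCCEnv object returned by initVM() or getVMEnv();
    run_attached is a hypothetical helper name, not a PyLucene API.
    """
    def target():
        env.attachCurrentThread()   # must precede any PyLucene call in this thread
        work()

    thread = threading.Thread(target=target)
    thread.start()
    return thread
```

With PyLucene installed this might be called as
run_attached(lucene.getVMEnv(), search_job), where search_job stands for your
own function.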
- - Exception handling with lucene.JavaError
-
- Java exceptions are caught at the language barrier and reported to
- Python by raising a JavaError instance whose args tuple contains the
- actual Java Exception instance.
-
- - Handling Java arrays
-
- Java arrays are returned to Python in a JArray wrapper instance that
- implements the Python sequence protocol. It is possible to change array
- elements but not to change the array size.
-
- A few Lucene APIs take array arguments and expect values to be returned
- in them. To call such an API and be able to retrieve the array values
- after the call, a Java array needs to be instantiated first.
-
- For example, accessing termDocs:
-
- termDocs = reader.termDocs(Term("isbn", isbn))
- docs = JArray('int')(1) # allocate an int[1] array
- freq = JArray('int')(1) # allocate an int[1] array
- if termDocs.read(docs, freq) == 1:
-     bits.set(docs[0]) # access the array's first element
-
- In addition to 'int', the JArray function accepts 'object', 'string',
- 'bool', 'byte', 'char', 'double', 'float', 'long' and 'short' to create
- an array of the corresponding type. The JArray('object') constructor
- takes a second argument denoting the class of the object
- elements. This argument is optional and defaults to Object.
-
- To convert a char or byte array to a Python string use a ''.join(array)
- construct.
-
- Instead of an integer denoting the size of the desired Java array, a
- sequence of objects of the expected element type may be passed in to the
- array constructor.
-
- For example, creating a Java array of double from the [1.5, 2.5] list:
-
- JArray('double')([1.5, 2.5])
-
- All methods that expect an array also accept a sequence of Python
- objects of the expected element type. If no values are expected from
- the array arguments after the call, it is hence not necessary to
- instantiate a Java array to make such calls.
-
- See jcc/README for more information about handling arrays.
-
- - Differences between the Java Lucene and PyLucene APIs
-
- . The PyLucene API exposes all Java Lucene classes in a flat namespace
- in the PyLucene module.
- For example, the Java import statement:
- import org.apache.lucene.index.IndexReader;
- corresponds to the Python import statement:
- from lucene import IndexReader
-
- . Downcasting is a common operation in Java but not a concept in
- Python. Because the wrapper objects implement exactly the APIs of
- the declared type of the wrapped object, all classes implement two
- class methods called instance_ and cast_ that verify and cast an
- instance respectively.
-
- - Pythonic extensions to the Java Lucene APIs
-
- Java is a very verbose language. Python, on the other hand, offers
- many syntactically attractive constructs for iteration, property
- access, etc... As the Java Lucene samples from the 'Lucene in Action'
- book were ported to Python, PyLucene received a number of pythonic
- extensions listed here:
-
- . Iterating search hits is a very common operation. Hits instances are
- iterable in Python. Two values are returned for each iteration, the
- zero-based number of the document in the Hits instance and the
- document instance itself.
-
- The Java loop:
-
- for (int i = 0; i < hits.length(); i++) {
- Document doc = hits.doc(i);
- System.out.println(hits.score(i) + " : " + doc.get("title"));
- }
-
- can be written in Python:
-
- for hit in hits:
-     hit = Hit.cast_(hit)
-     print hit.getScore(), ':', hit.getDocument()['title']
-
- If hits.iterator()'s next() method were declared to return Hit
- instead of Object, the above cast_() call would be unnecessary.
-
- The same Java loop can also be written:
-
- for i in xrange(len(hits)):
-     print hits.score(i), ':', hits[i]['title']
-
- . Hits instances partially implement the Python 'sequence' protocol.
-
- The Java expressions:
-
- hits.length()
- doc = hits.doc(i)
-
- are better written in Python:
-
- len(hits)
- doc = hits[i]
-
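As an illustration that is not part of the original README, the sketch below
shows what "partially implementing the sequence protocol" amounts to: len() and
indexing delegate to length() and doc(i). FakeHits and SequenceView are
hypothetical stand-ins written in pure Python; the real wrappers are generated
by JCC in C++.

```python
class FakeHits(object):
    """Hypothetical stand-in for a Java Hits object exposing length() and doc(i)."""
    def __init__(self, docs):
        self._docs = list(docs)

    def length(self):
        return len(self._docs)

    def doc(self, i):
        return self._docs[i]


class SequenceView(object):
    """Maps len(x) to length() and x[i] to doc(i), as described for Hits."""
    def __init__(self, hits):
        self._hits = hits

    def __len__(self):
        return self._hits.length()

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)     # also lets a plain for-loop terminate
        return self._hits.doc(i)


hits = SequenceView(FakeHits(['doc-a', 'doc-b']))
print(len(hits))
print(hits[1])
```

Only length and indexed read access are covered here, which is why the
protocol support is called partial.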
- . Document instances have fields whose values can be accessed through
- the mapping protocol.
-
- The Java expression:
-
- doc.get("title")
-
- is better written in Python:
-
- doc['title']
-
- . Document instances can be iterated over for their fields
-
- The Java loop:
-
- Enumeration fields = doc.fields();
- while (fields.hasMoreElements()) {
- Field field = (Field) fields.nextElement();
- ...
- }
-
- is better written in Python:
-
- for field in doc.getFields():
-     field = Field.cast_(field)
-     ...
-
- Once JCC supports heeding Java 1.5 annotations and once Java Lucene
- makes use of them, such casting should become unnecessary.
-
- - Extending Java Lucene classes from Python
-
- Many areas of the Lucene API expect the programmer to provide their own
- implementation or specialization of a feature where the default is
- inappropriate. For example, text analyzers and tokenizers are an area
- where many parameters and environmental or cultural factors are calling
- for customization.
-
- PyLucene enables this by providing Java extension points listed below
- that serve as proxies for Java to call back into the Python
- implementations of these customizations.
-
- These extension points are simple Java classes that JCC generates the
- native C++ implementations for. It is easy to add more such extension
- classes to the 'java' directory of the PyLucene source tree.
-
- To learn more about this topic, please refer to the jcc/README file.
-
- Please refer to the classes in the 'java' tree for currently available
- extension points. Examples of uses of these extension points are to be
- found in PyLucene's unit tests and "Lucene in Action" samples.
+Please visit http://lucene.apache.org/pylucene/documentation/readme.html
Modified: lucene/pylucene/trunk/jcc/INSTALL
URL: http://svn.apache.org/viewvc/lucene/pylucene/trunk/jcc/INSTALL?rev=739511&r1=739510&r2=739511&view=diff
==============================================================================
--- lucene/pylucene/trunk/jcc/INSTALL (original)
+++ lucene/pylucene/trunk/jcc/INSTALL Sat Jan 31 05:47:53 2009
@@ -1,155 +1,2 @@
- INSTALL file for JCC build
- --------------------------
-
- Contents
- --------
-
- - Building JCC
- - Requirements
- - Shared Mode (--shared)
- - Notes for Mac OS X
- - Notes for Linux
- - Notes for Solaris
- - Notes for Windows
- - Notes for Python 2.3
-
-
- Building JCC
- ------------
-
- JCC is a Python extension written in Python and C++. It requires a Java
- Runtime Environment to operate as it uses Java's reflection APIs to do
- its work. It is built and installed via distutils or setuptools.
-
- 1. Edit setup.py and verify that the values of INCLUDE, CFLAGS,
- DEBUG_CFLAGS, LFLAGS and JAVAC are correct for your system. These
- values are also going to be compiled into JCC's config.py file and are
- going to be used by JCC when invoking distutils or setuptools to compile
- the extensions it is generating code for.
-
- 2. At the command line, enter:
-
- python setup.py build
- sudo python setup.py install
-
-
- Requirements
- ------------
-
- JCC requires a Java Development Kit to be present. It uses the Java Native
- Invocation Interface and expects <jni.h> and the Java libraries to be
- present at build and runtime.
-
-
- Shared Mode (--shared)
- ----------------------
-
- JCC includes a small runtime that keeps track of the Java VM and of Java
- objects escaping it. Because there can be only one Java VM embedded in a
- given process at a time, the JCC runtime must be compiled as a shared
- library when more than one JCC-built Python extension is going to be
- imported into a given Python process.
-
- Shared mode depends on setuptools' capability of building plain shared
- libraries (as opposed to shared libraries for Python extensions).
- This shared library capability is a feature currently under development.
-
- Currently, shared mode is supported with setuptools 0.6c7 and above out of
- the box on Mac OS X and Windows. On Linux, a patch to setuptools needs to
- be applied first. This patch is included in the JCC source distribution in
- the jcc/patches directory, patch.43. This patch was submitted to the
- setuptools project via issue 43: http://bugs.python.org/setuptools/issue43
-
- The 'shared mode disabled' error reported during the build of JCC on
- Linux contains the exact instructions on how to patch setuptools with
- patch.43 on your system.
-
- Shared mode is also required when embedding Python in a Java VM as JCC's
- runtime shared library is used by the JVM to load JCC and bootstrap the
- Python VM via JNI.
-
- When shared mode is not enabled, not supported or distutils is used
- instead of setuptools, static mode is used instead. The JCC runtime code
- is statically linked with each JCC-built Python extension and only one
- such extension can be used in a given Python process.
-
- As setuptools grows its shared library building capability it is expected
- that more operating systems should be supported with shared mode in the
- future.
-
- Shared mode can be forced off by building JCC with the NO_SHARED
- environment variable set.
-
-
- Notes for Mac OS X
- ------------------
-
- On Mac OS X, Java is installed by Apple's setup as a framework.
- The values for INCLUDE and LFLAGS for 'darwin' should be correct and
- ready to use.
-
-
- Notes for Linux
- ---------------
-
- JCC has been built and tested on a variety of Linux distributions, 32- and
- 64-bit. Getting the Java configuration correct is important and is done
- differently for every distribution.
-
- For example:
-
- - on Ubuntu, to install Java 5, these commands may be used:
- sudo apt-get install sun-java5-jdk
- sudo update-java-alternatives -s java-1.5.0-sun
- The sample flags for Linux in JCC's setup.py should be close to
- correct.
-
- - on Gentoo, the java-config utility should be used to locate, and
- possibly change, the default java installation.
- The sample flags for Linux in JCC's setup.py should be changed to
- reflect the root of the Java installation which may be obtained via:
- java-config -O
-
- See the 'Shared Mode' section above for Linux support.
-
-
- Notes for Solaris
- -----------------
-
- At this time, JCC has been built and tested only on Solaris 11 with Sun
- Studio C++ 12, Java 1.6 and Python 2.4.
-
- Because JCC is written in C++, Python's distutils must be nudged a bit to
- invoke the correct compiler. Sun Studio's C compiler is called 'cc' while
- its C++ compiler is called 'CC'. To build JCC, use the following shell
- command to ensure that the C++ compiler is used:
-
- CC=CC python setup.py build
-
- Shared mode is not currently implemented for Solaris; setuptools needs to
- be taught how to build plain shared libraries on Solaris first.
-
-
- Notes for Windows
- -----------------
-
- At this time, JCC has been built and tested on Win2k and WinXP with a
- variety of Python and Java versions.
-
- - Adding the Python directory to PATH is recommended.
- - Adding the Java directories containing the necessary DLLs to PATH is
- a must.
- - Adding the directory containing javac.exe to PATH is required for shared
- mode (enabled by default if setuptools >= 0.6c7 is found to be installed).
-
-
- Notes for Python 2.3
- --------------------
-
- To use JCC with Python 2.3, setuptools is required:
-
- - download setuptools from http://python.org/pypi
- - edit the downloaded setuptools egg file to use python2.3 instead of
- python2.4
- - sudo sh setuptools-0.6c7-py2.4.egg
+Please visit http://lucene.apache.org/pylucene/jcc/documentation/install.html
Modified: lucene/pylucene/trunk/jcc/README
URL: http://svn.apache.org/viewvc/lucene/pylucene/trunk/jcc/README?rev=739511&r1=739510&r2=739511&view=diff
==============================================================================
--- lucene/pylucene/trunk/jcc/README (original)
+++ lucene/pylucene/trunk/jcc/README Sat Jan 31 05:47:53 2009
@@ -1,611 +1,2 @@
- ********************************************************
- * =============== *
- * IMPORTANT NOTE: *
- * =============== *
- * *
- * Before calling any API into the Java VM, start it by *
- * calling initVM(classpath, ...). *
- * *
- * More about this function below. *
- * *
- ********************************************************
-
- README file for JCC
- -------------------
-
- Contents
- --------
-
- - Welcome
- - Installing JCC
- - Generating C++ and Python wrappers with JCC
- - Classpath considerations
- - Using distutils vs setuptools
- - Distributing an egg
- - JCC's runtime API functions
- - Type casting and instance checks
- - Handling arrays
- - Exception reporting
- - Writing Java class extensions in Python
- - Pythonic protocols
-
-
- Welcome
- -------
-
- Welcome to JCC, a code generator for producing Python extensions that
- provide access to Java classes.
-
- For every target Java class, JCC generates a C++ wrapper class that hides
- the gory details necessary for accessing methods and fields on instances
- of the Java class from C++ via Java's Native Invocation Interface.
-
- JCC can also generate C++ wrappers that make it possible to access these
- classes from Python.
-
- When generating Python wrappers, JCC produces a complete Python extension
- via the distutils or setuptools packages that make it readily available to
- the Python interpreter.
-
- JCC is a project maintained by the Open Source Applications Foundation.
-
-
- Installing JCC
- --------------
-
- JCC is a Python extension written in Python and C++. It requires a Java
- Runtime Environment (JRE) to operate as it uses Java's reflection APIs to
- do its work. It is built and installed via distutils or setuptools.
-
- See INSTALL file for more information and operating system specific
- notes.
-
-
- Generating C++ and Python wrappers with JCC
- -------------------------------------------
-
- JCC started as a C++ code generator for hiding the gory details of
- accessing methods and fields on Java classes via Java's Native Invocation
- Interface [1]. These C++ wrappers make it possible to access a Java object
- as if it was a regular C++ object very much like GCJ's CNI interface [2].
-
- It then became apparent that JCC could also generate the C++ wrappers
- for making these classes available to Python. Every class that gets thus
- wrapped becomes a CPython type [3].
-
- JCC generates wrappers for all public classes that are requested by name
- on the command line or via the --jar command line argument. It generates
- wrapper methods for all public methods and fields on these classes whose
- types are found in one of the following ways:
-
- - the type is one of the requested classes
- - the type is one of the requested classes' superclass or implemented
- interfaces
- - the type is available from one of the packages listed via the
- --package command line argument
-
- JCC does not generate wrappers for methods or fields which don't satisfy
- these requirements. Thus, JCC can avoid generating code for runaway
- transitive closures of type dependencies.
-
- JCC generates property accessors for a property called 'field' when it
- finds Java methods named set'Field'(value), get'Field'() or is'Field'().
-
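The naming rule above can be sketched as follows. This is an illustration that
is not part of the original README and not JCC's actual implementation; the
function name property_name is hypothetical, and it assumes that only the first
letter after the get/set/is prefix is lowercased.

```python
import re

# A Java accessor is get/set/is followed by a capitalized field name.
_ACCESSOR = re.compile(r'^(get|set|is)([A-Z]\w*)$')

def property_name(method_name):
    """Return the property name implied by a Java accessor name, or None.

    Naming rule only: getTitle, setTitle and isTitle all map to 'title'.
    """
    match = _ACCESSOR.match(method_name)
    if match is None:
        return None
    field = match.group(2)
    return field[0].lower() + field[1:]
```

Under this rule a method such as length() yields no property, since it carries
none of the three prefixes.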
- The C++ wrappers are declared in a C++ namespace structure that mirrors
- the Java classes' Java packages. The Python types are declared in a flat
- namespace at the top level of the resulting Python extension module.
-
- JCC's command-line arguments are best illustrated via the PyLucene
- example:
-
- > python -m jcc # run JCC to wrap
- --jar lucene.jar # all public classes in the lucene jar file
- --jar analyzers.jar # and the lucene analyzers contrib package
- --jar snowball.jar # and the snowball contrib package
- --jar highlighter.jar # and the highlighter contrib package
- --jar regex.jar # and the regex search contrib package
- --jar queries.jar # and the queries contrib package
- --jar extensions.jar # and the Python extensions package
- --package java.lang # including all dependencies found in the
- # java.lang package
- --package java.util # and the java.util package
- --package java.io # and the java.io package
- java.lang.System # and to explicitly wrap java.lang.System
- java.lang.Runtime # as well as java.lang.Runtime
- java.lang.Boolean # and java.lang.Boolean
- java.lang.Byte # and java.lang.Byte
- java.lang.Character # and java.lang.Character
- java.lang.Integer # and java.lang.Integer
- java.lang.Short # and java.lang.Short
- java.lang.Long # and java.lang.Long
- java.lang.Double # and java.lang.Double
- java.lang.Float # and java.lang.Float
- java.text.SimpleDateFormat
- # and java.text.SimpleDateFormat
- java.io.StringReader
- # and java.io.StringReader
- java.io.InputStreamReader
- # and java.io.InputStreamReader
- java.io.FileInputStream
- # and java.io.FileInputStream
- --exclude org.apache.lucene.queryParser.Token
- # while explicitly not wrapping
- # org.apache.lucene.queryParser.Token
- --exclude org.apache.lucene.queryParser.TokenMgrError
- # nor org.apache.lucene.queryParser.TokenMgrError
- --exclude org.apache.lucene.queryParser.ParseException
- # nor org.apache.lucene.queryParser.ParseException
- --python lucene # generating Python wrappers into a module
- # called lucene
- --version 2.4.0 # giving the Python extension egg version 2.4.0
- --mapping org.apache.lucene.document.Document
- 'get:(Ljava/lang/String;)Ljava/lang/String;'
- # asking for a Python mapping protocol wrapper
- # for get access on the Document class by
- # calling its get method
- --mapping java.util.Properties
- 'getProperty:(Ljava/lang/String;)Ljava/lang/String;'
- # asking for a Python mapping protocol wrapper
- # for get access on the Properties class by
- # calling its getProperty method
- --sequence org.apache.lucene.search.Hits
- 'length:()I'
- 'doc:(I)Lorg/apache/lucene/document/Document;'
- # asking for a Python sequence protocol wrapper
- # for length and get access on the Hits class by
- # calling its length and doc methods
- --files 2 # generating all C++ classes into about 2 .cpp
- # files
- --build # and finally compiling the generated C++ code
- # into a Python egg via setuptools - when
- # installed - or a regular Python extension via
- # distutils or setuptools otherwise
- --install # installing it into Python's site-packages
- # directory.
-
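The --mapping and --sequence arguments above name Java methods as
'name:signature', where the signature uses JNI type descriptors. As a reading
aid that is not part of the original README, the sketch below decodes such a
string into Java type names. The function names are hypothetical and the
decoder is not taken from JCC's sources.

```python
# JNI descriptor letters for primitive types and void.
_PRIMITIVES = {
    'Z': 'boolean', 'B': 'byte', 'C': 'char', 'S': 'short',
    'I': 'int', 'J': 'long', 'F': 'float', 'D': 'double', 'V': 'void',
}

def _read_type(desc, pos):
    """Decode one JNI type starting at desc[pos]; return (java_name, next_pos)."""
    dims = 0
    while desc[pos] == '[':          # each '[' adds one array dimension
        dims += 1
        pos += 1
    if desc[pos] == 'L':             # object type: L<class/name>;
        end = desc.index(';', pos)
        name = desc[pos + 1:end].replace('/', '.')
        pos = end + 1
    else:
        name = _PRIMITIVES[desc[pos]]
        pos += 1
    return name + '[]' * dims, pos

def parse_method_spec(spec):
    """Split 'name:(params)return' into (name, [param types], return type)."""
    name, desc = spec.split(':', 1)
    if not desc.startswith('('):
        raise ValueError('expected "(" after method name: %r' % spec)
    pos, params = 1, []
    while desc[pos] != ')':
        java_type, pos = _read_type(desc, pos)
        params.append(java_type)
    return_type, _ = _read_type(desc, pos + 1)
    return name, params, return_type

print(parse_method_spec('get:(Ljava/lang/String;)Ljava/lang/String;'))
```

Read this way, 'length:()I' is a method taking no arguments and returning int,
and 'doc:(I)Lorg/apache/lucene/document/Document;' takes an int and returns a
Document.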
- There are limits to both how many files can fit on the command line and
- how large a C++ file the C++ compiler can handle.
- By default, JCC generates one large C++ file containing the source code
- for all wrapper classes.
-
- Using the --files command line argument, this behaviour can be tuned to
- work around various limits.
- For example:
- - to break up the large wrapper class file into about 2 files:
- --files 2
- - to break up the large wrapper class file into about 10 files:
- --files 10
- - to generate one C++ file per Java class wrapped:
- --files separate
-
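For build scripts that drive JCC from Python, the --files choices above could
be wired in roughly as follows. This sketch is not part of the original README:
jcc_command is a hypothetical helper, and it only assembles flags that appear
in the example command line above.

```python
def jcc_command(jars, module, version, files=None, build=True):
    """Assemble an argument list for 'python -m jcc' from flags shown above.

    files may be an int (about that many .cpp files), the string 'separate'
    (one file per wrapped class) or None to keep JCC's single-file default.
    """
    argv = ['python', '-m', 'jcc']
    for jar in jars:
        argv += ['--jar', jar]
    argv += ['--python', module, '--version', version]
    if files is not None:
        argv += ['--files', str(files)]
    if build:
        argv.append('--build')
    return argv

print(' '.join(jcc_command(['lucene.jar'], 'lucene', '2.4.0', files=10)))
```

The resulting list can be handed to subprocess.call() as is; passing
files='separate' selects one C++ file per wrapped Java class.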
- The --prefix and --root arguments are passed through to distutils' setup().
-
- [1] http://java.sun.com/j2se/1.5.0/docs/guide/jni/spec/invocation.html
- [2] http://gcc.gnu.org/onlinedocs/gcj/About-CNI.html
- [3] http://docs.python.org/ext/defining-new-types.html
-
-
- Classpath considerations
- ------------------------
-
- When generating wrappers for Python, the JAR files passed to JCC via
- --jar are copied into the resulting Python extension as resources and
- added to the extension's CLASSPATH variable.
- Classes or JAR files that are required by the classes contained in the
- argument JAR files need to be made findable via JCC's --classpath command
- line argument. At runtime, these need to be appended to the extension's
- CLASSPATH variable before starting the VM with initVM(CLASSPATH).
-
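Appending to the extension's CLASSPATH before starting the VM might look like
the sketch below, which is not part of the original README. The helper name
extend_classpath is hypothetical; it assumes the classpath is a single string
whose entries are separated by the platform's path separator.

```python
import os

def extend_classpath(base, *extra):
    """Append extra directories or jar files to a classpath string.

    os.pathsep is ':' on Unix and ';' on Windows, matching the separator
    the Java VM expects on each platform.
    """
    return os.pathsep.join([base] + list(extra))
```

With a JCC-built extension such as lucene, this could be used as
initVM(extend_classpath(lucene.CLASSPATH, '/path/to/dependency.jar')), where
the jar path is a placeholder.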
- To have more jar files automatically copied into the resulting Python
- extension and added to the classpath at build and runtime, use the
- --include option. This option works like the --jar option except that
- no wrappers are generated for the public classes contained in them unless
- they're explicitly named on the command line.
-
-
- Using distutils vs setuptools
- -----------------------------
-
- By default, when building a Python extension, if setuptools is found to be
- installed, it is used over distutils. If you want to force the use of
- distutils over setuptools, use the --use-distutils command line argument.
-
-
- Distributing an egg
- -------------------
-
- The --bdist option can be used to ask JCC to invoke distutils with 'bdist'
- or setuptools with 'bdist_egg'. If setuptools is used, the resulting egg
- has to be installed with the easy_install installer [2] which is normally
- part of a Python installation that includes setuptools.
-
-
- JCC's runtime API functions
- ---------------------------
-
- JCC includes a small runtime component that is compiled into any Python
- extension it produces.
-
- This runtime component makes it possible to manage the Java VM from
- Python. Because a Java VM can be configured with a myriad of options, it
- is not automatically started when the resulting Python extension module is
- loaded into the Python interpreter.
-
- Instead, the initVM() function must be called from the main thread before
- using any of the wrapped classes. It takes the following keyword
- arguments:
-
- - classpath
- A string containing one or more directories or jar files for the
- Java VM to search for classes. Every Python extension produced by
- JCC exports a CLASSPATH variable that is hardcoded to the jar files
- that it was produced from. A copy of each jar file is installed as a
- resource file along with the extension when JCC is invoked with the
- --install command line argument.
-
- example:
- >>> import lucene
- >>> lucene.initVM(classpath=lucene.CLASSPATH)
-
- - initialheap
- The initial amount of Java heap to start the Java VM with. This
- argument is a string that follows the same syntax as the similar
- -Xms java command line argument.
-
- example:
- >>> import lucene
- >>> lucene.initVM(lucene.CLASSPATH, initialheap='32m')
- >>> lucene.Runtime.getRuntime().totalMemory()
- 33357824L
-
- - maxheap
- The maximum amount of Java heap that could become available to the
- Java VM. This argument is a string that follows the same syntax as
- the similar -Xmx java command line argument.
-
- - maxstack
- The maximum amount of stack space that is available to the Java
- VM. This argument is a string that follows the same syntax as
- the similar -Xss java command line argument.
-
- - vmargs
- A string of comma separated additional options to pass to the VM
- startup routine. These are passed through as-is.
-
- example:
- >>> import lucene
- >>> lucene.initVM(lucene.CLASSPATH,
- vmargs='-Xcheck:jni,-verbose:jni,-verbose:gc')
-
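The keyword arguments above can be collected in one place before calling
initVM(). The sketch below is not part of the original README; vm_options is a
hypothetical helper that only uses the argument names documented above and
joins vmargs with commas, the format described for that argument.

```python
def vm_options(initialheap=None, maxheap=None, maxstack=None, vmargs=()):
    """Collect initVM() keyword arguments, dropping the ones left unset.

    vmargs is given as a sequence of option strings and joined with commas.
    """
    options = {'initialheap': initialheap, 'maxheap': maxheap,
               'maxstack': maxstack}
    options = dict((k, v) for k, v in options.items() if v is not None)
    if vmargs:
        options['vmargs'] = ','.join(vmargs)
    return options
```

A call could then read
lucene.initVM(lucene.CLASSPATH, **vm_options(maxheap='256m', vmargs=['-Xcheck:jni', '-verbose:gc'])).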
- The initVM() and getVMEnv() functions return a JCCEnv object that has a few
- utility methods on it:
-
- - attachCurrentThread(name, asDaemon)
- Before a thread created in Python or elsewhere but not in the Java VM
- can be used with the Java VM, this method needs to be invoked.
- The two arguments it takes are optional and self-explanatory.
-
- - detachCurrentThread()
- The opposite of attachCurrentThread(). This method should be used with
- extreme caution as Python's and the Java VM's garbage collectors may
- use a thread detached too early, causing a system crash. The utility of
- this method seems dubious at the moment.
-
-
- findClass(className)
-
- There are several differences between JNI's findClass() and Java's
- Class.forName():
- - className is a '/' separated string of names
- - the class loaders are different, findClass() may find classes
- that Class.forName() won't.
-
- example:
- >>> from lucene import *
- >>> initVM(CLASSPATH)
- >>> findClass('org/apache/lucene/document/Document')
- <Class: class org.apache.lucene.document.Document>
- >>> Class.forName('org.apache.lucene.document.Document')
- Traceback (most recent call last):
- File "<stdin>", line 1, in <module>
- lucene.JavaError: java.lang.ClassNotFoundException:
- org/apache/lucene/document/Document
- >>> Class.forName('java.lang.Object')
- <Class: class java.lang.Object>
-
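Since findClass() expects '/' separated names while Class.forName() expects
dotted ones, a one-line conversion avoids mixing the two up. This helper is a
hypothetical convenience and not part of the original README or of JCC.

```python
def jni_class_name(dotted_name):
    """Convert 'java.lang.Object' into the '/' separated form findClass() expects."""
    return dotted_name.replace('.', '/')
```

For example, findClass(jni_class_name('org.apache.lucene.document.Document'))
passes the same string as the findClass() call in the session above.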
-
- Type casting and instance checks
- --------------------------------
-
- Many Java APIs are declared to return types that are less specific than
- the types actually returned. In Java 1.5, this is worked around with
- annotations. JCC does not heed annotations at the moment. A Java API
- declared to return Object will wrap objects as such.
-
- In C++, casting the object into its actual type is supported via the
- regular C casting operator.
-
- In Python each wrapped class has a class method called 'cast_' that
- implements the same functionality.
-
- Similarly, each wrapped class has a class method called 'instance_' that
- tests whether the wrapped Java instance is of the given type.
-
- For example:
-
- if BooleanQuery.instance_(query):
-     booleanQuery = BooleanQuery.cast_(query)
-
- print booleanQuery.getClauses()
-
-
- Handling arrays
- ---------------
-
- Java arrays are wrapped with a C++ JArray template. The [] operator is
- available for read access. This template, JArray<T>, accommodates all Java
- primitive types, jstring, jobject and wrapper class arrays.
-
- Java arrays are returned to Python in a JArray wrapper instance that
- implements the Python sequence protocol. It is possible to change array
- elements but not to change an array's size.
-
- To convert a char or byte array to a Python string use a ''.join(array)
- construct.
-
- Any Java method expecting an array can be called with the corresponding
- sequence object from Python.
-
- To instantiate a Java array from Python, use one of the following forms:
-
- >>> array = JArray('int')(size)
- the resulting Java int array is initialized with zeroes
-
- >>> array = JArray('int')(sequence)
- the sequence must only contain ints
- the resulting Java int array contains the ints in the sequence
-
- Instead of 'int', you may also use one of 'object', 'string', 'bool',
- 'byte', 'char', 'double', 'float', 'long' and 'short' to create an array
- of the corresponding type.
-
- Because there is only one wrapper class for object arrays, the
- JArray('object') type's constructor takes a second argument denoting the
- class of the object elements. This argument is optional and defaults to
- Object.
-
- As with the Object types, the JArray types also include a cast_
- method. This method becomes useful when the array returned to Python is
- wrapped as a plain Object. This is the case, for example, with nested
- arrays since there is no distinct Python type for every different Java
- object array class - all Java object arrays are wrapped by
- JArray('object').
-
- For example:
- - cast obj to an array of ints
- >>> JArray('int').cast_(obj)
- - cast obj to an array of Document
- >>> JArray('object').cast_(obj, Document)
-
- In both cases, the Java type of obj must be compatible with the array type
- it is being cast to.
-
- - using nested array:
-
- >>> d = JArray('object')(1, Document)
- >>> d[0] = Document()
- >>> d
- JArray<object>[<Document: Document<>>]
- >>> d[0]
- <Document: Document<>>
- >>> a = JArray('object')(2)
- >>> a[0] = d
- >>> a[1] = JArray('int')([0, 1, 2])
- >>> a
- JArray<object>[<Object: [Lorg.apache.lucene.document.Document;@694f12>, <Object: [I@234265>]
- >>> a[0]
- <Object: [Lorg.apache.lucene.document.Document;@694f12>
- >>> a[1]
- <Object: [I@234265>
- >>> JArray('object').cast_(a[0])[0]
- <Object: Document<>>
- >>> JArray('object').cast_(a[0], Document)[0]
- <Document: Document<>>
- >>> JArray('int').cast_(a[1])
- JArray<int>[0, 1, 2]
- >>> JArray('int').cast_(a[1])[0]
- 0
-
- To verify that a Java object is of a given array type, use the instance_()
- method available on the array type. This is not the same as verifying that
- it is assignable with elements of a given type. For example, using the
- arrays created above:
-
- - is d array of Object ? are d's elements of type Object ?
- >>> JArray('object').instance_(d)
- True
-
- - can it receive Object instances ?
- >>> JArray('object').assignable_(d)
- False
-
- - is it array of Document ? are d's elements of type Document ?
- >>> JArray('object').instance_(d, Document)
- True
-
- - is it array of Class ? are d's elements of type Class ?
- >>> JArray('object').instance_(d, Class)
- False
-
- - can it receive Document instances ?
- >>> JArray('object').assignable_(d, Document)
- True
-
-
- Exception reporting
- -------------------
-
- Exceptions that occur in the Java VM and that escape to C++ are reported
- as a javaError C++ exception. Failure to handle the exception causes the
- process to crash.
-
- Exceptions that occur in the Java VM and that escape to the Python VM are
- reported with a JavaError Python exception object. The getJavaException()
- method can be called on JavaError objects to obtain the original Java
- exception object wrapped as any other Java object. This Java object can be
- used to obtain a Java stack trace for the error, for example.
-
- Exceptions that occur in the Python VM and that escape to the Java VM, as
- can happen, for example, in Python extensions (see topic below), are
- reported to the Java VM as a RuntimeException or as a PythonException when
- using shared mode. See INSTALL for more information about shared mode.
-
-
- Writing Java class extensions in Python
- ---------------------------------------
-
- JCC makes it relatively easy to extend a Java class from Python. This is
- done via an intermediary class written in Java, that implements a special
- method called 'pythonExtension()' and that declares a number of native
- methods that are to be implemented by the actual Python extension.
-
- When JCC sees these special extension Java classes it generates the C++
- code implementing the native methods they declare. These native methods
- call the corresponding Python method implementations passing in parameters
- and returning the result to the Java VM caller.
-
- For example, to implement a Lucene analyzer in Python, one would first
- implement such an extension class in Java:
-
-     package org.osafoundation.lucene.analysis;
-
-     import org.apache.lucene.analysis.Analyzer;
-     import org.apache.lucene.analysis.TokenStream;
-     import java.io.Reader;
-
-     public class PythonAnalyzer extends Analyzer {
-         private long pythonObject;
-
-         public PythonAnalyzer()
-         {
-         }
-
-         public void pythonExtension(long pythonObject)
-         {
-             this.pythonObject = pythonObject;
-         }
-         public long pythonExtension()
-         {
-             return this.pythonObject;
-         }
-
-         public void finalize()
-             throws Throwable
-         {
-             pythonDecRef();
-         }
-
-         public native void pythonDecRef();
-         public native TokenStream tokenStream(String fieldName, Reader reader);
-     }
-
- The pythonExtension() methods are what make this class recognized as an
- extension class by JCC. They should be included verbatim as above along
- with the declaration of the pythonObject instance variable.
-
- The implementation of the native pythonDecRef() method is generated by JCC
- and is necessary because it seems that finalize() cannot itself be native.
- Since an extension class wraps the Python instance object it's going to be
- calling methods on, its ref count needs to be decremented when this Java
- wrapper class disappears. A declaration for pythonDecRef() and a finalize()
- implementation should always be included verbatim as above.
-
- Really, the only non-boilerplate user input is the constructor of the
- class and the other native methods, tokenStream() in the example above.
-
- The corresponding Python class(es) are implemented as follows:
-
-     class _analyzer(PythonAnalyzer):
-         def tokenStream(self, fieldName, reader):
-             class _tokenStream(PythonTokenStream):
-                 def __init__(self):
-                     super(_tokenStream, self).__init__()
-                     self.TOKENS = ["1", "2", "3", "4", "5"]
-                     self.INCREMENTS = [1, 2, 1, 0, 1]
-                     self.i = 0
-                 def next(self):
-                     if self.i == len(self.TOKENS):
-                         return None
-                     t = Token(self.TOKENS[self.i], self.i, self.i)
-                     t.setPositionIncrement(self.INCREMENTS[self.i])
-                     self.i += 1
-                     return t
-                 def reset(self):
-                     pass
-                 def close(self):
-                     pass
-             return _tokenStream()
-
- When an __init__() is declared, super() must be called or else the Java
- wrapper class will not know about the Python instance it needs to invoke.
-
- When a Java extension class declares native methods for which there are
- public or protected equivalents available on the parent class, JCC
- generates code that makes it possible to call super() on these methods
- from Python as well.
-
- There are a number of extension examples available in PyLucene's test
- suite and samples.
-
-
- Pythonic protocols
- ------------------
-
- When generating wrappers for Python, JCC attempts to detect which classes
- can be made iterable:
-
- - When a class declares that it implements java.util.Iterator or something
- compatible with it, JCC makes it iterable from Python.
-
- - When a Java class declares a method called iterator() with no
- arguments returning a type compatible with java.util.Iterator, this
- class is made iterable from Python.
-
- - When a Java class declares a method called next() with no arguments
- returning an object type, this class is made iterable. Its next()
- method is assumed to terminate iteration by returning null.
-
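The third rule above, where next() ends iteration by returning null, can be illustrated in pure Python. This is only a sketch of the behaviour, not the C++ code JCC generates; NullTerminatedIterator and FakeTokenStream are hypothetical names standing in for a wrapped Java object.

```python
class NullTerminatedIterator(object):
    """Adapt an object whose next() returns None at the end of its
    data to the Python iterator protocol."""

    def __init__(self, source):
        self._source = source

    def __iter__(self):
        return self

    def __next__(self):
        item = self._source.next()
        if item is None:
            # a null return from next() ends the iteration
            raise StopIteration
        return item

    next = __next__  # Python 2 spelling of the same method


class FakeTokenStream(object):
    """Stand-in for a wrapped Java object with a null-terminated next()."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._i = 0

    def next(self):
        if self._i == len(self._tokens):
            return None
        token = self._tokens[self._i]
        self._i += 1
        return token


if __name__ == '__main__':
    for token in NullTerminatedIterator(FakeTokenStream(["1", "2", "3"])):
        print(token)
```

One consequence of this convention is that such a class cannot yield null as a regular element, since null is taken to mean the end of the iteration.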
- JCC generates a Python mapping get method for a class when requested to do
- so via the --mapping command line option which takes two arguments, the
- class to generate the mapping get for and the Java method to use. The
- method is specified with its name followed by ':' and its Java
- signature [1].
-
- For example, System.getProperties()['java.class.path'] is made possible by:
-
- --mapping java.util.Properties
- 'getProperty:(Ljava/lang/String;)Ljava/lang/String;'
- # asking for a Python mapping protocol wrapper
- # for get access on the Properties class by
- # calling its getProperty method
-
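The name:signature strings passed to these options follow the JNI type signature rules referenced in [1]. A minimal sketch of assembling such a string from dotted Java type names follows; type_descriptor() and method_spec() are hypothetical helpers, not part of JCC, and generics or varargs are not considered.

```python
# JNI signature codes for the Java primitive types and void
PRIMITIVES = {
    'void': 'V', 'boolean': 'Z', 'byte': 'B', 'char': 'C', 'short': 'S',
    'int': 'I', 'long': 'J', 'float': 'F', 'double': 'D',
}


def type_descriptor(java_type):
    """Return the JNI signature of a Java type such as 'int',
    'java.lang.String' or 'byte[]'."""
    dims = 0
    while java_type.endswith('[]'):
        java_type = java_type[:-2]
        dims += 1
    if java_type in PRIMITIVES:
        base = PRIMITIVES[java_type]
    else:
        # object types are written as L<'/' separated class name>;
        base = 'L' + java_type.replace('.', '/') + ';'
    return '[' * dims + base


def method_spec(name, param_types, return_type):
    """Build a 'name:(params)return' string."""
    params = ''.join(type_descriptor(t) for t in param_types)
    return '%s:(%s)%s' % (name, params, type_descriptor(return_type))


if __name__ == '__main__':
    print(method_spec('getProperty', ['java.lang.String'], 'java.lang.String'))
```

The string printed here is the one used with --mapping in the example above.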
- JCC generates Python sequence length and get methods for a class when
- requested to do so via the --sequence command line option which takes
- three arguments, the class to generate the sequence length and get for and
- the two Java methods to use. The methods are specified with their name
- followed by ':' and their Java signature [1].
-
- For example:
-     for i in xrange(len(hits)):
-         doc = hits[i]
-         ...
-
- is made possible by:
-
- --sequence org.apache.lucene.search.Hits
- 'length:()I'
- 'doc:(I)Lorg/apache/lucene/document/Document;'
-
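What the --sequence option provides can be pictured in pure Python as a wrapper whose __len__ and __getitem__ delegate to the two named methods. This sketch only illustrates the idea and is not what JCC emits; SequenceView and FakeHits are hypothetical names, and the support for negative indexes is an assumption of this sketch.

```python
class SequenceView(object):
    """Expose a length method and a get method of a target object
    through the Python sequence protocol."""

    def __init__(self, target, length_method, get_method):
        self._length = getattr(target, length_method)
        self._get = getattr(target, get_method)

    def __len__(self):
        return self._length()

    def __getitem__(self, index):
        size = self._length()
        if index < 0:
            index += size
        if not 0 <= index < size:
            raise IndexError(index)
        return self._get(index)


class FakeHits(object):
    """Stand-in for a wrapped Java class with length() and doc(int)."""

    def __init__(self, docs):
        self._docs = list(docs)

    def length(self):
        return len(self._docs)

    def doc(self, n):
        return self._docs[n]


if __name__ == '__main__':
    hits = SequenceView(FakeHits(['doc0', 'doc1']), 'length', 'doc')
    for i in range(len(hits)):
        print(hits[i])
```

Raising IndexError for an out-of-range index also lets a plain for loop over the wrapper terminate.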
- [1] http://java.sun.com/j2se/1.5.0/docs/guide/jni/spec/types.html#wp16432
- [2] http://peak.telecommunity.com/DevCenter/EasyInstall
+Please visit http://lucene.apache.org/pylucene/jcc/documentation/readme.html