Posted to pylucene-dev@lucene.apache.org by Caleb Burns <ca...@ridersdiscount.com> on 2012/04/17 21:16:22 UTC

PyLucene use JCC shared object by default

Hi,

I've finished the process at my organization of re-implementing SOLR's
faceting algorithm (in C++).

We would like the public at large to have access to the work we've done and
plan to do. For this to be a real possibility, the code needs to be
built against, and use, the same JVM as the PyLucene installation. The
most logical way to accomplish this, we feel, is to have PyLucene's
default installation use JCC as a shared object.

We have yet more plans to extend and provide utilities that work with
PyLucene, but this all hinges on having the shared object. The only
alternative methodology would require the bundling of our source with the
PyLucene project itself as a fork.

We are eager to start open sourcing our work, so please let us know what
would be the best way to integrate our work.

-- 
Caleb Burns
Developer | Riders Discount
866.931.6644 x851 | www.RidersDiscount.com <http://www.ridersdiscount.com/>

Re: PyLucene use JCC shared object by default

Posted by Andi Vajda <va...@apache.org>.
 Hi Caleb,

On Apr 18, 2012, at 10:18, Caleb Burns <ca...@ridersdiscount.com> wrote:

> My question is: would it be possible for JCC to be compiled as a shared
> library in PyLucene (by default) instead of being compiled in as a static
> object?

It cannot be the default, as shared mode (the mode where JCC has a shared library component) is not supported on all operating systems where PyLucene/JCC is. It currently works on Mac OS X, Windows and Linux (on Linux, with a patch to setuptools).
Other operating systems first need to solve the problem of building a regular shared library (as opposed to a Python extension shared library) via setuptools before shared mode can be supported there.

Where the local environment (operating system and presence of setuptools) permits, JCC already defaults to this shared mode. This is why, for example, when first building JCC on Linux, you are asked to patch setuptools so that building a regular shared library works.

> If JCC was compiled as a shared object and PyLucene linked to it,
> my organization (and possibly others) would be able to maintain and release
> custom extensions for PyLucene written in C++. This would simplify the use
> and need for maintaining a custom installation of PyLucene linked against
> JCC. The reason for our approach is because we primarily use/write Python
> with C and C++ extension.

Out of curiosity, why did you rewrite this Solr module in C++ instead of isolating its Java classes into a jar file and generating C++/Python wrappers for it using JCC ?

Andi..

> 
> On Tue, Apr 17, 2012 at 7:39 PM, Andi Vajda <va...@apache.org> wrote:
> 
>> 
>> Hi Caleb,
>> 
>> 
>> On Tue, 17 Apr 2012, Caleb Burns wrote:
>> 
>> I've finished the process at my organization of re-implementing SOLR's
>>> faceting algorithm (in C++).
>>> 
>>> We would like the public at large to have access to the work we've done
>>> and
>>> plan to do. In order for this to be a real possibility the code needs to
>>> be
>>> built against and use the same JVM as the PyLucene installation does. The
>>> most logical way we feel to have this accomplished is by having PyLucenes'
>>> default installation use JCC as a Shared Object.
>>> 
>>> We have yet more plans to extend and provide utilities that work with
>>> PyLucene, but this all hinges on having the shared object. The only
>>> alternative methodology would require the bundling of our source with the
>>> PyLucene project itself as a fork.
>>> 
>>> We are eager to start open sourcing our work, so please let us know what
>>> would be the best way to integrate our work.
>>> 
>> 
>> Ok, so what is your question ?
>> PyLucene's shared mode also depends on JCC's shared library.
>> Is your question about what the default should be ?
>> 
>> Andi..
>> 
> 
> 
> 
> -- 
> Caleb Burns
> Developer | Riders Discount

Re: PyLucene use JCC shared object by default

Posted by Caleb Burns <ca...@ridersdiscount.com>.
Hi Andi,

My question is: would it be possible for JCC to be compiled as a shared
library in PyLucene (by default) instead of being compiled in as a static
object? If JCC was compiled as a shared object and PyLucene linked to it,
my organization (and possibly others) would be able to maintain and release
custom extensions for PyLucene written in C++. This would simplify the use
and need for maintaining a custom installation of PyLucene linked against
JCC. The reason for our approach is that we primarily use and write Python
with C and C++ extensions.

On Tue, Apr 17, 2012 at 7:39 PM, Andi Vajda <va...@apache.org> wrote:

>
>  Hi Caleb,
>
>
> On Tue, 17 Apr 2012, Caleb Burns wrote:
>
>  I've finished the process at my organization of re-implementing SOLR's
>> faceting algorithm (in C++).
>>
>> We would like the public at large to have access to the work we've done
>> and
>> plan to do. In order for this to be a real possibility the code needs to
>> be
>> built against and use the same JVM as the PyLucene installation does. The
>> most logical way we feel to have this accomplished is by having PyLucenes'
>> default installation use JCC as a Shared Object.
>>
>> We have yet more plans to extend and provide utilities that work with
>> PyLucene, but this all hinges on having the shared object. The only
>> alternative methodology would require the bundling of our source with the
>> PyLucene project itself as a fork.
>>
>> We are eager to start open sourcing our work, so please let us know what
>> would be the best way to integrate our work.
>>
>
> Ok, so what is your question ?
> PyLucene's shared mode also depends on JCC's shared library.
> Is your question about what the default should be ?
>
> Andi..
>



-- 
Caleb Burns
Developer | Riders Discount

Re: PyLucene use JCC shared object by default

Posted by Andi Vajda <va...@apache.org>.
  Hi Caleb,

On Tue, 17 Apr 2012, Caleb Burns wrote:

> I've finished the process at my organization of re-implementing SOLR's
> faceting algorithm (in C++).
>
> We would like the public at large to have access to the work we've done and
> plan to do. In order for this to be a real possibility the code needs to be
> built against and use the same JVM as the PyLucene installation does. The
> most logical way we feel to have this accomplished is by having PyLucenes'
> default installation use JCC as a Shared Object.
>
> We have yet more plans to extend and provide utilities that work with
> PyLucene, but this all hinges on having the shared object. The only
> alternative methodology would require the bundling of our source with the
> PyLucene project itself as a fork.
>
> We are eager to start open sourcing our work, so please let us know what
> would be the best way to integrate our work.

Ok, so what is your question ?
PyLucene's shared mode also depends on JCC's shared library.
Is your question about what the default should be ?

Andi..

>
> -- 
> Caleb Burns
> Developer | Riders Discount
> 866.931.6644 x851 | www.RidersDiscount.com <http://www.ridersdiscount.com/>
> [image: image.png] <http://www.facebook.com/ridersdiscount> [image:
> image.png] <https://twitter.com/#!/ridersdiscount>
> Deal of the Day <http://www.twitter.com/#!/rd_dealoftheday>
>

AW: PyLucene use JCC shared object by default

Posted by Thomas Koch <ko...@orbiteam.de>.
Dear Caleb,

Thanks for the sample - as usual, code says more than a thousand words .-)

The API really looks MUCH simpler than the current Lucene facet API (which, as far as I can tell from my first steps, is quite complex).

> With initial tests, the algorithm is about 100 times faster in C++ than when implemented in Python.

Wow that’s a nice factor you gain! Did you also compare it to the "standard" lucene facet approach?
 
The main difference I can observe so far is that Lucene facet search allows you to

a) define a kind of "category hierarchy" and search/count within this tree, e.g. for your example, have
  'Helmets/Type/Full Face'
  'Helmets/Type/Open Face'
  ...
  'Helmets/User/Adult'
  'Helmets/User/Youth'
  etc.

This is done mainly via CategoryPath - see the example code I just posted on the list (though I assume you're familiar with that approach).
  
b) 'drilldown' - i.e. re-run a search with the same query but restrict it to some facet/category of interest

Or is this also provided by your API?
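
To make points a) and b) concrete, here is a minimal pure-Python sketch of hierarchical counting and drill-down. It is only an illustration under stated assumptions: it does not use the lucene.facet API, and the names count_categories, drill_down and DOCS are hypothetical, made up for this example.

```python
from collections import Counter

def count_categories(docs):
    """Count documents under every category path and every ancestor of it.

    docs maps a document id to a list of '/'-separated category paths.
    Each document is counted at most once per category.
    """
    counts = Counter()
    for paths in docs.values():
        seen = set()
        for path in paths:
            parts = path.split("/")
            # Register the path itself and all of its ancestors.
            for depth in range(1, len(parts) + 1):
                seen.add("/".join(parts[:depth]))
        counts.update(seen)
    return counts

def drill_down(docs, hits, category):
    """Keep only those hits that carry a path at or below the given category."""
    prefix = category + "/"
    return [doc_id for doc_id in hits
            if any(p == category or p.startswith(prefix) for p in docs[doc_id])]

# Hypothetical sample data, modelled on the helmet example above.
DOCS = {
    1: ["Helmets/Type/Full Face", "Helmets/User/Adult"],
    2: ["Helmets/Type/Open Face", "Helmets/User/Adult"],
    3: ["Helmets/Type/Full Face", "Helmets/User/Youth"],
}

counts = count_categories(DOCS)
adult_hits = drill_down(DOCS, [1, 2, 3], "Helmets/User/Adult")
```

With this sample data, counts holds one entry per node of the tree, and adult_hits is the original result list restricted to the 'Helmets/User/Adult' category.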


best regards

Thomas 
--
OrbiTeam Software GmbH & Co. KG, Germany
http://www.orbiteam.de



> -----Original Message-----
> From: Caleb Burns [mailto:caleb@ridersdiscount.com]
> Sent: Wednesday, 18 April 2012 22:17
> To: pylucene-dev@lucene.apache.org
> Subject: Re: PyLucene use JCC shared object by default
> 
> Hi Thomas,
> 
> Our primary motivation was performance and secondary was a "pythonic"
> api.
> Our needs were simpler than the complexity of the whole lucene.facet
> package. On the Lucene side of things, it looks like we have something similar
> to CategoryPath (statically 2 deep: "/Field/Value") and FacetRequest (only
> allow searching at root level, optionally only on filtered docs set and fields).
> Specifically, we implemented an index/cache of all documents and their
> terms. As far as I know SOLR uses caching of the Lucene index to perform
> faceting.
> 
> Our implementation is based on
> http://lucene.apache.org/solr/api/org/apache/solr/request/UnInvertedField.html
> and the interface in Python is almost identical. You pass our object an
> IndexReader and by default all Terms with TermVectors are indexed. You can
> then selectively retrieve fields. Here's an example of use:
> http://pastebin.com/Lq3LZKMp. The whole module is ~2000 lines (python
> interface, c++ implementation, comments). With initial tests, the algorithm is
> about 100 times faster in C++ than when implemented in Python.
> 
> On Wed, Apr 18, 2012 at 9:31 AM, Thomas Koch <ko...@orbiteam.de> wrote:
> 
> > Hi,
> > sounds like an interesting project – may I ask what you actually
> > implemented and what’s the motivation (e.g. performance?)?
> >
> > I’ve started to experiment with the Facet support in Lucene (actually
> > in PyLucene – ported an example to Python) and found that facetted
> > search support in Lucene looks powerful (though API is still said to
> > be ‘experimental’ and I can’t say anything about performance yet).
> > I’m talking about the org.apache.lucene.facet.* packages – part of the
> > contrib part of Lucene and available as JARs that are accessible in PyLucene
> > as well.
> > I’m not that familiar with Solr but AFAIK it’s based on Lucene (Java)
> > and should (hopefully) use the same Java code for its facet search
> > support. Of course Solr adds some nice configuration support and web
> > GUI to Lucene, but the ‘core’ search is built on Lucene (to my
> > knowledge). So did you re-implement the Lucene facet search/index code
> > (like TaxonomyReader/Writer, FacetRequest stuff etc.) in C++ or what
> > part of Solr??
> >
> > Regarding Facet support in PyLucene I can share the samples I’ve ‘ported’
> > to Python so far. There’s still a patch pending for JavaList (required
> > by facet features) which I come back to later on this list (still some
> > open issues). Hopefully this can be included in the PyLucene 3.6
> > version …
> >
> > Regards
> > Thomas
> > --
> > OrbiTeam Software GmbH & Co. KG
> > Germany  http://www.orbiteam.de
> >
> >
> > From: Caleb Burns [mailto:caleb@ridersdiscount.com]
> > Sent: Tuesday, 17 April 2012 21:16
> > To: pylucene-dev@lucene.apache.org
> > Subject: PyLucene use JCC shared object by default
> >
> > Hi,
> >
> > I've finished the process at my organization of re-implementing SOLR's
> > faceting algorithm (in C++).
> >
> > We would like the public at large to have access to the work we've
> > done and plan to do. In order for this to be a real possibility the
> > code needs to be built against and use the same JVM as the PyLucene
> > installation does.
> > The most logical way we feel to have this accomplished is by having
> > PyLucenes' default installation use JCC as a Shared Object.
> >
> > We have yet more plans to extend and provide utilities that work with
> > PyLucene, but this all hinges on having the shared object. The only
> > alternative methodology would require the bundling of our source with
> > the PyLucene project itself as a fork.
> >
> > We are eager to start open sourcing our work, so please let us know
> > what would be the best way to integrate our work.
> >
> 
> 
> 
> --
> Caleb Burns
> Developer | Riders Discount



Re: PyLucene use JCC shared object by default

Posted by Caleb Burns <ca...@ridersdiscount.com>.
Hi Thomas,

Our primary motivation was performance and secondary was a "pythonic" api.
Our needs were simpler than the complexity of the whole lucene.facet
package. On the Lucene side of things, it looks like we have something
similar to CategoryPath (statically 2 deep: "/Field/Value") and
FacetRequest (only allow searching at root level, optionally only on
filtered docs set and fields). Specifically, we implemented an index/cache
of all documents and their terms. As far as I know SOLR uses caching of the
Lucene index to perform faceting.

Our implementation is based on
http://lucene.apache.org/solr/api/org/apache/solr/request/UnInvertedField.html
and
the interface in Python is almost identical. You pass our object an
IndexReader and by default all Terms with TermVectors are indexed. You can
then selectively retrieve fields. Here's an example of use:
http://pastebin.com/Lq3LZKMp. The whole module is ~2000 lines (python
interface, c++ implementation, comments). With initial tests, the algorithm
is about 100 times faster in C++ than when implemented in Python.
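
As a rough illustration of the "index/cache of all documents and their terms" idea, here is a minimal pure-Python sketch. It is not our C++ code and not Solr's UnInvertedField; the class name UninvertedIndex, its facets() method and the POSTINGS sample data are all hypothetical. The sketch assumes an inverted index given as (field, term) -> document ids, "uninverts" it into document -> field -> terms once, and then counts "/Field/Value" facets, optionally restricted to a filtered document set and to selected fields.

```python
from collections import Counter, defaultdict

class UninvertedIndex:
    """Hypothetical doc -> field -> terms cache built from an inverted index."""

    def __init__(self, postings):
        # postings: {(field, term): [doc_id, ...]}, i.e. the inverted view.
        self._doc_terms = defaultdict(lambda: defaultdict(set))
        for (field, term), doc_ids in postings.items():
            for doc_id in doc_ids:
                self._doc_terms[doc_id][field].add(term)

    def facets(self, fields=None, doc_ids=None):
        """Return {field: {term: count}}.

        fields  : optional collection of field names to include
        doc_ids : optional iterable of document ids (e.g. the hits of a query)
        """
        docs = list(self._doc_terms) if doc_ids is None else doc_ids
        result = defaultdict(Counter)
        for doc_id in docs:
            # .get() avoids creating cache entries for unknown documents.
            for field, terms in self._doc_terms.get(doc_id, {}).items():
                if fields is None or field in fields:
                    result[field].update(terms)
        return {field: dict(counter) for field, counter in result.items()}

# Hypothetical sample data.
POSTINGS = {
    ("type", "full face"): [1, 3],
    ("type", "open face"): [2],
    ("user", "adult"): [1, 2],
    ("user", "youth"): [3],
}

index = UninvertedIndex(POSTINGS)
all_facets = index.facets()
adult_only = index.facets(fields={"user"}, doc_ids=[1, 2])
```

The point of the cache is that counting facets for a result set only walks the terms of the matching documents instead of intersecting every posting list with the result set.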

On Wed, Apr 18, 2012 at 9:31 AM, Thomas Koch <ko...@orbiteam.de> wrote:

> Hi,
> sounds like an interesting project – may I ask what you actually
> implemented and what’s the motivation (e.g. performance?)?
>
> I’ve started to experiment with the Facet support in Lucene (actually in
> PyLucene – ported an example to Python) and found that facetted search
> support in Lucene looks powerful (though API is still said to be
> ‘experimental’ and I can’t say anything about performance yet).  I’m
> talking about the org.apache.lucene.facet.* packages – part of the contrib
> part of Lucene and available as JARs that are accessible in PyLucene as well.
> I’m not that familiar with Solr but AFAIK it’s based on Lucene (Java) and
> should (hopefully) use the same Java code for its facet search support. Of
> course Solr adds some nice configuration support and web GUI to Lucene, but
> the ‘core’ search is built on Lucene (to my knowledge). So did you
> re-implement the Lucene facet search/index code (like
> TaxonomyReader/Writer, FacetRequest stuff etc.) in C++ or what part of
> Solr??
>
> Regarding Facet support in PyLucene I can share the samples I’ve ‘ported’
> to Python so far. There’s still a patch pending for JavaList (required by
> facet features) which I come back to later on this list (still some open
> issues). Hopefully this can be included in the PyLucene 3.6 version …
>
> Regards
> Thomas
> --
> OrbiTeam Software GmbH & Co. KG
> Germany  http://www.orbiteam.de
>
>
> From: Caleb Burns [mailto:caleb@ridersdiscount.com]
> Sent: Tuesday, 17 April 2012 21:16
> To: pylucene-dev@lucene.apache.org
> Subject: PyLucene use JCC shared object by default
>
> Hi,
>
> I've finished the process at my organization of re-implementing SOLR's
> faceting algorithm (in C++).
>
> We would like the public at large to have access to the work we've done
> and plan to do. In order for this to be a real possibility the code needs
> to be built against and use the same JVM as the PyLucene installation does.
> The most logical way we feel to have this accomplished is by having
> PyLucenes' default installation use JCC as a Shared Object.
>
> We have yet more plans to extend and provide utilities that work with
> PyLucene, but this all hinges on having the shared object. The only
> alternative methodology would require the bundling of our source with the
> PyLucene project itself as a fork.
>
> We are eager to start open sourcing our work, so please let us know what
> would be the best way to integrate our work.
>



-- 
Caleb Burns
Developer | Riders Discount

Re: AW: AW: PyLucene use JCC shared object by default

Posted by Thomas Koch <ko...@orbiteam.de>.
Sorry about that - you can find it here
http://db.tt/nhiCrgGU

Regards
Thomas
--
On 30.04.2012 at 19:31, Andi Vajda <va...@apache.org> wrote:

> 
> On Mon, 30 Apr 2012, Thomas Koch wrote:
> 
>> Dear Andi, I again had a look at the patch I submitted recently and would like to get back to it.  An updated version of the patch is attached to this email - the patch is against the branch_3x repo http://svn.apache.org/repos/asf/lucene/pylucene/branches/branch_3x
> 
> Oh, and there is no attachment in your email. Maybe it got eaten up by some mail server. Please, make sure it's of a text mimetype or mail it to me directly.
> 
> Thanks !
> 
> Andi..

Re: AW: AW: AW: AW: AW: PyLucene use JCC shared object by default

Posted by Andi Vajda <va...@apache.org>.
On Fri, 4 May 2012, Thomas Koch wrote:

> Thanks, Andi - test runs fine now.
>
> I've another small contribution to offer: samples/java/FacetExample.py -
> It's a python port of the facet example in java in package
> org.apache.lucene.facet.example.simple (actually it's a bit simplified as
> the four java files are merged into one python file: SimpleIndexer.java,
> SimpleMain.java,  SimpleSearcher.java,  SimpleUtils.java).
>
> A patch to branch3x is attached - you may want to include it (up to you of
> course).
>
> Regards and have a nice weekend,

Oh cool. Just before I was going to start the release process.
Thanks !

Andi..

>
> Thomas
>
> -----Original Message-----
> From: Andi Vajda [mailto:vajda@apache.org]
> Sent: Thursday, 3 May 2012 20:14
> To: pylucene-dev@lucene.apache.org
> Subject: Re: AW: AW: AW: AW: PyLucene use JCC shared object by default
> ...
>
> I fixed your patch to use cast() so as to unbox the boxed primitive types
> (and strings) to resolve the failures.
>
> Thank you for the patch, it's now checked in.
>
> Andi..
>

AW: AW: AW: AW: AW: PyLucene use JCC shared object by default

Posted by Thomas Koch <ko...@orbiteam.de>.
Thanks, Andi - test runs fine now.

I've another small contribution to offer: samples/java/FacetExample.py -
it's a Python port of the Java facet example in package
org.apache.lucene.facet.example.simple (actually it's a bit simplified, as
the four Java files are merged into one Python file: SimpleIndexer.java,
SimpleMain.java,  SimpleSearcher.java,  SimpleUtils.java).

A patch against branch_3x is attached - you may want to include it (up to you,
of course).

Regards and have a nice weekend,

Thomas

-----Original Message-----
From: Andi Vajda [mailto:vajda@apache.org]
Sent: Thursday, 3 May 2012 20:14
To: pylucene-dev@lucene.apache.org
Subject: Re: AW: AW: AW: AW: PyLucene use JCC shared object by default
...

I fixed your patch to use cast() so as to unbox the boxed primitive types
(and strings) to resolve the failures.

Thank you for the patch, it's now checked in.

Andi..

Re: AW: AW: AW: AW: PyLucene use JCC shared object by default

Posted by Andi Vajda <va...@apache.org>.
On Thu, 3 May 2012, Thomas Koch wrote:

> thanks for the fixes and cleanup! I've updated from SVN and wrote the unit 
> test. Runs without errors (Tracebacks etc.) now, however there may be 
> some slight type mismatch issue still: the comparison of objects retrieved 
> from JArray and ArrayList respectively with the objects from the initial 
> JavaList does not match in case of the ArrayList (test yields 4 failures - 
> see below). That's with my newly built JCC2.12/PyLucene3.5 from branch_3x.
>
> As far as I understand
> elem0 = jArray[0] -> yields python object
>
> elem0 = arrayList.get(0) -> yields wrapped Java object
>
> Not sure if that's intended. In that case the test should be fixed ,-)

I fixed your patch to use cast() so as to unbox the boxed primitive types 
(and strings) to resolve the failures.

Thank you for the patch, it's now checked in.

Andi..

AW: AW: AW: AW: AW: PyLucene use JCC shared object by default

Posted by Thomas Koch <ko...@orbiteam.de>.
> > As far as I understand
> > elem0 = jArray[0] -> yields python object
> >
> > elem0 = arrayList.get(0) -> yields wrapped Java object
> >
> > Not sure if that's intended. In that case the test should be fixed ,-)
> 
> If the array is an array of object, then objects you get, including
> instances of java.lang.Integer. If the array is array of int, for example,
> then ints you get.
> 
Ah I see - it's that the toArray() method runs in "Python land" still while
ArrayList is in JVM already, right!? (In fact the ArrayList constructor
itself calls toArray() too but then inside JVM and JCC probably does the
conversion to Java Objects while the Python2Java frontier is passed I
guess...)

This mix of Python and Java is sometimes confusing, but that's the price you
have to pay ,-)

Regards,
Thomas 




Re: AW: AW: AW: AW: PyLucene use JCC shared object by default

Posted by Andi Vajda <va...@apache.org>.
On Thu, 3 May 2012, Thomas Koch wrote:

> thanks for the fixes and cleanup! I've updated from SVN and wrote the unit 
> test. Runs without errors (Tracebacks etc.) now, however there may be 
> some slight type mismatch issue still: the comparison of objects retrieved 
> from JArray and ArrayList respectively with the objects from the initial 
> JavaList does not match in case of the ArrayList (test yields 4 failures - 
> see below). That's with my newly built JCC2.12/PyLucene3.5 from branch_3x.
>
> As far as I understand
> elem0 = jArray[0] -> yields python object
>
> elem0 = arrayList.get(0) -> yields wrapped Java object
>
> Not sure if that's intended. In that case the test should be fixed ,-)

If the array is an array of object, then objects you get, including 
instances of java.lang.Integer. If the array is array of int, for example, 
then ints you get.

> Attached is a patch with the test - this time with .txt extension - hope 
> it gets through...

Yes, the patch got through, thank you.

Andi..

>
> regards
> Thomas
>
> ======================================================================
> FAIL: test_ArrayList (__main__.Test_CollectionsBoolList)
> create ArrayList in JVM (from the JavaSet)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>  File "test_Collections.py", line 208, in test_ArrayList
>    elem0,type(elem0), listElem0, type(listElem0)))
> AssertionError: should be equal: true (<type 'Object'>) <-> True (<type 'bool'>
>
>
> ======================================================================
> FAIL: test_ArrayList (__main__.Test_CollectionsFloatList)
> create ArrayList in JVM (from the JavaSet)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>  File "test_Collections.py", line 208, in test_ArrayList
>    elem0,type(elem0), listElem0, type(listElem0)))
> AssertionError: should be equal: 1.5 (<type 'Object'>) <-> 1.5 (<type 'float'>)
>
> ======================================================================
> FAIL: test_ArrayList (__main__.Test_CollectionsListBase)
> create ArrayList in JVM (from the JavaSet)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>  File "test_Collections.py", line 208, in test_ArrayList
>    elem0,type(elem0), listElem0, type(listElem0)))
> AssertionError: should be equal: 0 (<type 'Object'>) <-> 0 (<type 'int'>)
>
> ======================================================================
> FAIL: test_ArrayList (__main__.Test_CollectionsStringList)
> create ArrayList in JVM (from the JavaSet)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>  File "test_Collections.py", line 208, in test_ArrayList
>    elem0,type(elem0), listElem0, type(listElem0)))
> AssertionError: should be equal: a (<type 'Object'>) <-> a (<type 'str'>)
>
> ----------------------------------------------------------------------
> Ran 42 tests in 0.031s
>
> FAILED (failures=4)
>
>> -----Original Message-----
>> From: Andi Vajda [mailto:vajda@apache.org]
>> Sent: Thursday, 3 May 2012 01:00
>> To: pylucene-dev@lucene.apache.org
>> Subject: Re: AW: AW: AW: PyLucene use JCC shared object by default
>>
>>
>>   Hi Thomas,
>>
>> On Wed, 2 May 2012, Thomas Koch wrote:
>>
>>> could you download the patch from the link?
>>
>> Yes, I got your patch just fine.
>>
>> I fixed a few bugs today having to do with converting sequences to JArray
>> and added support for auto-boxing primitive types when converting a
>> sequence to an object JArray. Now your collections-demo.py all works fine !
>>
>> With these fixes the Python toArray() methods can return a Python
>> sequence object directly, there is no need to do the JArray conversion in
>> Python anymore.
>>
>> I simplified the collections.py file a bit to reflect the fixes and all changes,
>> including the PythonList/PythonListIterator code is now checked in.
>>
>> Could you please convert collections-demo.py into a proper unit test module
>> like the unit tests in pylucene/test so that it gets integrated into the test
>> suite ?
>>
>> Thanks !
>>
>> Andi..
>>
>>>
>>> Just one more thing ... in the initial implementation of PythonList I did the
>>> toArray() method in Python and the toArray(Object[]) method in Java - just
>>> as was done for the PythonSet:
>>>
>>> +    public native List subList(int fromIndex, int toIndex);
>>> +    public native Object[] toArray();
>>> +
>>> +    public Object[] toArray(Object[] a)
>>> +    {
>>> +        Object[] array = toArray();
>>> +
>>> +        if (a.length < array.length)
>>> +            a = (Object[]) Array.newInstance(a.getClass().getComponentType(),
>>> +                                             array.length);
>>> +
>>> +        System.arraycopy(array, 0, a, 0, array.length);
>>> +
>>> +        return a;
>>> +    }
>>>
>>> (from patch of Feb 22nd I sent to you)
>>>
>>> From the current patch you can see that the latter part is missing now -
>>> the toArray(Object[]) method is now done in Python as well, i.e. it simply
>>> calls toArray():
>>>
>>>
>>> ===================================================================
>>> --- java/org/apache/pylucene/util/PythonSet.java	(revision 1332162)
>>> +++ java/org/apache/pylucene/util/PythonSet.java	(working copy)
>>> @@ -62,14 +62,6 @@
>>>
>>>     public Object[] toArray(Object[] a)
>>>     {
>>> -        Object[] array = toArray();
>>> -
>>> -        if (a.length < array.length)
>>> -            a = (Object[]) Array.newInstance(a.getClass().getComponentType(),
>>> -                                             array.length);
>>> -
>>> -        System.arraycopy(array, 0, a, 0, array.length);
>>> -
>>> -        return a;
>>> +        return toArray();
>>>     }
>>> }
>>>
>>> As far as I remember that was part of your changes in between (I probably
>>> never touched PythonSet). Anyway I could imagine that this is related to the
>>> current problem.
>>>
>>> However, the ArrayList never calls the 2nd toArray method. The ArrayList
>>> constructor actually triggers the "simple" toArray method:
>>>
>>>    /**
>>>     * Constructs a list containing the elements of the specified
>>>     * collection, in the order they are returned by the collection's
>>>     * iterator.
>>>     *
>>>     * @param c the collection whose elements are to be placed into this list
>>>     * @throws NullPointerException if the specified collection is null
>>>     */
>>>    public ArrayList(Collection<? extends E> c) {
>>>        elementData = c.toArray();
>>>        size = elementData.length;
>>>        // c.toArray might (incorrectly) not return Object[] (see 6260652)
>>>        if (elementData.getClass() != Object[].class)
>>>            elementData = Arrays.copyOf(elementData, size, Object[].class);
>>>    }
>>>
>>> The ArrayList source code is attached (from OpenJDK6 sources).
>>>
>>> So maybe that's the wrong path ...  Anyhow I feel the mapping of
>>> toArray(Object[]) to toArray() does not fully comply with the Java API
>>> description:
>>> http://docs.oracle.com/javase/6/docs/api/java/util/List.html#toArray(T[])
>>>
>>>
>>> Hope that helps...
>>>
>>>
>>> Regards,
>>> Thomas
>>>
>>>> -----Original Message-----
>>>> From: Andi Vajda [mailto:vajda@apache.org]
>>>> Sent: Monday, 30 April 2012 19:32
>>>> To: pylucene-dev@lucene.apache.org
>>>> Subject: Re: AW: AW: PyLucene use JCC shared object by default
>>>>
>>>>
>>>> On Mon, 30 Apr 2012, Thomas Koch wrote:
>>>>
>>>>> Dear Andi, I again had a look at the patch I submitted recently and
>>>>> would like to get back to it.  An updated version of the patch is
>>>>> attached to this email - the patch is against the branch_3x repo
>>>>> http://svn.apache.org/repos/asf/lucene/pylucene/branches/branch_3x
>>>>
>>>> Oh, and there is no attachment in your email. Maybe it got eaten up
>>>> by some mail server. Please, make sure it's of a text mimetype or
>>>> mail it to me directly.
>>>>
>>>> Thanks !
>>>>
>>>> Andi..
>>>>
>>>>>
>>>>> The patch mainly
>>>>> - adds two Java classes:  PythonList,  PythonListIterator
>>>>> - adds the corresponding Python classes (JavaListIterator and JavaList in
>>>>> collections.py)
>>>>>
>>>>> Purpose:
>>>>> - provide a Java-based List implementation in JCC/PyLucene (similar
>>>>> to existing PythonSet/JavaSet)
>>>>> - allow to pass python lists via Java Collections into PyLucene
>>>>>
>>>>> To summarize briefly: PythonSet/JavaSet already existed, but there was
>>>>> nothing similar for lists. I made an implementation of PythonList/JavaList
>>>>> and with your help this is now basically working, except for an open issue
>>>>> that affects both JavaSet and JavaList: initialization of an ArrayList
>>>>> with a JavaSet (or JavaList) may cause trouble.
>>>>>
>>>>> As you said: "There is a bug somewhere with constructing an ArrayList from
>>>>> a python collection like JavaSet or JavaList."
>>>>>
>>>>> I tried to change the toArray() method as you suggested, but that didn't
>>>>> help. As far as I understood, there are two options to box python values
>>>>> into a typed JArray:
>>>>>
>>>>> 1)  use the object based JArray class and box python values by wrapping
>>>>> them with the corresponding Java object (e.g. type<int> -> lucene.Integer):
>>>>>
>>>>> >>> x = lucene.JArray('object')([lucene.Boolean(True),lucene.Boolean(False)])
>>>>> JArray<object>[<Object: true>, <Object: false>]
>>>>> >>> type(x[0])
>>>>> <type 'Object'>
>>>>>
>>>>> 2)  use the correct array type (int, float, etc.) and pass the list of
>>>>> Python elements (or literals) to the JArray constructor, e.g.
>>>>>
>>>>> >>> y = lucene.JArray('bool')([True,False])
>>>>> JArray<bool>[True, False]
>>>>> >>> type(y[0])
>>>>> <type 'bool'>
>>>>>
>>>>> I tried both of them (see _pyList2JArray methods in collections.py)
>>>>> but
>>>> none of them did the trick. Actually the 'empty objects in ArrayList'
>>>> problem remains when handling with strings (the ArrayList object that
>>>> is initialized with a JavaSet or JavaList of string items will have a
>>>> number of objects as the original JavaSet/JavaList, but all objects
>>>> are the same - ooks like an array of empty objects). Furthermore another
>> issue with integer lists comes into play:
>>>> here the initialization of  ArrayList with the Collection fails with
>>>> a Java stacktrace (lucene.JavaError: org.apache.jcc.PythonException).
>>>>>
>>>>> The most simple test case is as follows:
>>>>>
>>>>> --%< --
>>>>> import lucene
>>>>> lucene.initVM()
>>>>> from lucene.collections import JavaList
>>>>>
>>>>> # using strings: the ArrayList is created, but initialized with empty objects
>>>>> jl = JavaList(['a','b'])
>>>>> al = lucene.ArrayList(jl)
>>>>> assert (not al.get(0).equals(al.get(1))), "unique values"
>>>>>
>>>>> # using ints: the ArrayList is not created,  but an error occurs instead:
>>>>> # Java stacktrace: org.apache.jcc.PythonException: ('while calling toArray')
>>>>> jl = JavaList(range(3))
>>>>> al = lucene.ArrayList(jl)
>>>>> --%< --
>>>>>
>>>>> I currently feel like having to stab around in the dark to find out
>>>>> what's going on here and would welcome any suggestions. Needs some
>>>> JCC
>>>>> expert I guess ,-)
>>>>>
>>>>> Of course we can leave the patch out - but still there's the same
>>>>> issue with
>>>> JavaSet.
>>>>>
>>>>>
>>>>> kind regards
>>>>>
>>>>> Thomas
>>>>> --
>>>>> OrbiTeam Software GmbH & Co. KG, Germany http://www.orbiteam.de
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Andi Vajda [mailto:vajda@apache.org]
>>>>>> Sent: Wednesday, 18 April 2012 20:37
>>>>>> To: pylucene-dev@lucene.apache.org
>>>>>> Subject: Re: AW: PyLucene use JCC shared object by default
>>>>>>
>>>>>>
>>>>>> Hi Thomas,
>>>>>> ...
>>>>>> Lucene 3.6 just got released a few days ago. Apart from your patch,
>>>>>> the PyLucene 3.6 release is ready. I'm about to go offline (email
>>>>>> only) for a
>>>> week.
>>>>>> Let's revisit this patch then (first week of May). It's not
>>>>>> blocking the
>>>> release
>>>>>> right now as, even if I sent out a release candidate for a vote,
>>>>>> the three business days required for this would take this into the time
>> I'm away.
>>>>>> ...
>>>>>> Andi..
>>>>>
>>>>>
>>>
>

AW: AW: AW: AW: PyLucene use JCC shared object by default

Posted by Thomas Koch <ko...@orbiteam.de>.
Andi,
thanks for the fixes and cleanup! I've updated from SVN and written the unit test. It now runs without errors (tracebacks etc.); however, there may still be a slight type mismatch: when compared with the objects from the initial JavaList, the objects retrieved from the JArray match, but those retrieved from the ArrayList do not (the test yields 4 failures - see below). That's with my newly built JCC2.12/PyLucene3.5 from branch_3x.

As far as I understand:

 elem0 = jArray[0] -> yields a Python object

 elem0 = arrayList.get(0) -> yields a wrapped Java object

I'm not sure whether that's intended. If it is, the test should be fixed ,-)
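
If the difference is intended, one way the test could be adapted is to compare normalized string forms instead of the raw objects. The following is only a minimal sketch that runs without a JVM: FakeJavaObject is a hypothetical stand-in for a wrapped java.lang.Object, and it assumes that str() on a wrapped object gives the Java toString() text (as the "true" vs. "True" in the failures below suggests). The names normalize and same_value are made up for illustration.

```python
class FakeJavaObject(object):
    """Hypothetical stand-in for a wrapped java.lang.Object (no JVM needed)."""

    def __init__(self, text):
        self._text = text

    def __str__(self):
        # Mimics Java's toString(), e.g. Boolean.TRUE prints as "true".
        return self._text


def normalize(value):
    """Reduce a Python value or a wrapped object to a comparable string."""
    if isinstance(value, bool):
        # Java prints booleans in lower case, Python capitalizes them.
        return "true" if value else "false"
    return str(value)


def same_value(a, b):
    """Compare two elements by normalized text instead of by type."""
    return normalize(a) == normalize(b)
```

With this, same_value(FakeJavaObject("true"), True) holds although the types differ. Comparing text is crude, in particular for floats, whose Java and Python string forms can differ for very large or very small values, so unwrapping to the proper Java type would be the cleaner fix if that is available.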

Attached is a patch with the test - this time with .txt extension - hope it gets through...

regards
Thomas

======================================================================
FAIL: test_ArrayList (__main__.Test_CollectionsBoolList)
create ArrayList in JVM (from the JavaSet)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Collections.py", line 208, in test_ArrayList
    elem0,type(elem0), listElem0, type(listElem0)))
AssertionError: should be equal: true (<type 'Object'>) <-> True (<type 'bool'>)


======================================================================
FAIL: test_ArrayList (__main__.Test_CollectionsFloatList)
create ArrayList in JVM (from the JavaSet)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Collections.py", line 208, in test_ArrayList
    elem0,type(elem0), listElem0, type(listElem0)))
AssertionError: should be equal: 1.5 (<type 'Object'>) <-> 1.5 (<type 'float'>)

======================================================================
FAIL: test_ArrayList (__main__.Test_CollectionsListBase)
create ArrayList in JVM (from the JavaSet)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Collections.py", line 208, in test_ArrayList
    elem0,type(elem0), listElem0, type(listElem0)))
AssertionError: should be equal: 0 (<type 'Object'>) <-> 0 (<type 'int'>)

======================================================================
FAIL: test_ArrayList (__main__.Test_CollectionsStringList)
create ArrayList in JVM (from the JavaSet)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Collections.py", line 208, in test_ArrayList
    elem0,type(elem0), listElem0, type(listElem0)))
AssertionError: should be equal: a (<type 'Object'>) <-> a (<type 'str'>)

----------------------------------------------------------------------
Ran 42 tests in 0.031s

FAILED (failures=4)

> -----Original Message-----
> From: Andi Vajda [mailto:vajda@apache.org]
> Sent: Thursday, 3 May 2012 01:00
> To: pylucene-dev@lucene.apache.org
> Subject: Re: AW: AW: AW: PyLucene use JCC shared object by default
> 
> 
>   Hi Thomas,
> 
> On Wed, 2 May 2012, Thomas Koch wrote:
> 
> > could you download the patch from the link?
> 
> Yes, I got your patch just fine.
> 
> I fixed a few bugs today having to do with converting sequences to JArray
> and added support for auto-boxing primitive types when converting a
> sequence to an object JArray. Now your collections-demo.py all works fine !
> 
> With these fixes the Python toArray() methods can return a Python
> sequence object directly, there is no need to do the JArray conversion in
> Python anymore.
> 
> I simplified the collections.py file a bit to reflect the fixes, and all changes,
> including the PythonList/PythonListIterator code, are now checked in.
> 
> Could you please convert collections-demo.py into a proper unit test module
> like the unit tests in pylucene/test so that it gets integrated into the test
> suite ?
> 
> Thanks !
> 
> Andi..
> 
> >
> > Just one more thing ... in the initial implementation of PythonList I did the
> > toArray() method in Python and the toArray(Object[]) method in Java - just
> > as was done for the PythonSet:
> >
> > +    public native List subList(int fromIndex, int toIndex);
> > +    public native Object[] toArray();
> > +
> > +    public Object[] toArray(Object[] a)
> > +    {
> > +        Object[] array = toArray();
> > +
> > +        if (a.length < array.length)
> > +            a = (Object[]) Array.newInstance(a.getClass().getComponentType(),
> > +                                             array.length);
> > +
> > +        System.arraycopy(array, 0, a, 0, array.length);
> > +
> > +        return a;
> > +    }
> >
> > (from patch of Feb 22nd I sent to you)
> >
> > From the current patch you can see that the latter part is missing now -
> > the toArray(Object[]) method is now done in Python as well, i.e. it simply
> > calls toArray():
> >
> >
> > ===================================================================
> > --- java/org/apache/pylucene/util/PythonSet.java	(revision 1332162)
> > +++ java/org/apache/pylucene/util/PythonSet.java	(working copy)
> > @@ -62,14 +62,6 @@
> >
> >     public Object[] toArray(Object[] a)
> >     {
> > -        Object[] array = toArray();
> > -
> > -        if (a.length < array.length)
> > -            a = (Object[]) Array.newInstance(a.getClass().getComponentType(),
> > -                                             array.length);
> > -
> > -        System.arraycopy(array, 0, a, 0, array.length);
> > -
> > -        return a;
> > +        return toArray();
> >     }
> > }
> >
> > As far as I remember that was part of your changes in between (I probably
> > never touched PythonSet). Anyway I could imagine that this is related to the
> > current problem.
> >
> > However, the ArrayList never calls the 2nd toArray method. The ArrayList
> > constructor actually triggers the "simple" toArray method:
> >
> >    /**
> >     * Constructs a list containing the elements of the specified
> >     * collection, in the order they are returned by the collection's
> >     * iterator.
> >     *
> >     * @param c the collection whose elements are to be placed into this list
> >     * @throws NullPointerException if the specified collection is null
> >     */
> >    public ArrayList(Collection<? extends E> c) {
> >        elementData = c.toArray();
> >        size = elementData.length;
> >        // c.toArray might (incorrectly) not return Object[] (see 6260652)
> >        if (elementData.getClass() != Object[].class)
> >            elementData = Arrays.copyOf(elementData, size, Object[].class);
> >    }
> >
> > The ArrayList source code is attached (from OpenJDK6 sources).
> >
> > So maybe that's the wrong path ...  Anyhow I feel the mapping of
> > toArray(Object[]) to toArray() does not fully comply with the Java API
> > description:
> > http://docs.oracle.com/javase/6/docs/api/java/util/List.html#toArray(T[])
> >
> >
> > Hope that helps...
> >
> >
> > Regards,
> > Thomas
> >
> >> -----Original Message-----
> >> From: Andi Vajda [mailto:vajda@apache.org]
> >> Sent: Monday, 30 April 2012 19:32
> >> To: pylucene-dev@lucene.apache.org
> >> Subject: Re: AW: AW: PyLucene use JCC shared object by default
> >>
> >>
> >> On Mon, 30 Apr 2012, Thomas Koch wrote:
> >>
> >>> Dear Andi, I again had a look at the patch I submitted recently and
> >>> would like to get back to it.  An updated version of the patch is
> >>> attached to this email - the patch is against the branch_3x repo
> >>> http://svn.apache.org/repos/asf/lucene/pylucene/branches/branch_3x
> >>
> >> Oh, and there is no attachment in your email. Maybe it got eaten up
> >> by some mail server. Please, make sure it's of a text mimetype or
> >> mail it to me directly.
> >>
> >> Thanks !
> >>
> >> Andi..
> >>
> >>>
> >>> The patch mainly
> >>> - adds two java classes:  PythonList,  PythonListIterator
> >>> - adds according Python classes   (JavaListIterator and JavaList in
> >> collections.py)
> >>>
> >>> Purpose:
> >>> - provide a Java-based List implementation in JCC/PyLucene (similar
> >>> to existing PythonSet/JavaSet)
> >>> - allow to pass python lists via Java Collections into PyLucene
> >>>
> >>> Let's try summarize shortly: PythonSet /JavaSet was already
> >>> existing, but
> >> nothing similar for Lists. I made an implementation of PythonList
> >> /JavaList and with your help this is now basically working. Except of
> >> an open issue that affects both JavaSet and JavaList: initialization
> >> of an ArrayList with a JavaSet (or JavaList) may cause trouble.
> >>>
> >>> As you said: "There is a bug somewhere with constructing an
> >>> ArrayList from
> >> a python collection like JavaSet or JavaList."
> >>>
> >>> I tried to change the toArray() method as you suggested, but that
> >>> didn't
> >> help. As far as I understood, there are two options to box python
> >> values into a typed JArray:
> >>>
> >>> 1)  use the object based JArray class and box python values by
> >>> wrapping
> >> them with the corresponding Java object (e.g. type<int> ->
> lucene.Integer):
> >>>
> >>>>>> x = lucene.JArray('object')([lucene.Boolean(True),lucene.Boolean(False)])
> >>> JArray<object>[<Object: true>, <Object: false>]
> >>>>>> type(x[0])
> >>> <type 'Object'>
> >>>
> >>> 2)  use the correct array type (int, float, etc.) and pass the list
> >>> of Python
> >> (elements or literals) to the JArray constructor, e.g.
> >>>
> >>>>>> y = lucene.JArray('bool')([True,False])
> >>> JArray<bool>[True, False]
> >>>>>> type(y[0])
> >>> <type 'bool'>
> >>>
> >>> I tried both of them (see _pyList2JArray methods in collections.py)
> >>> but
> >> none of them did the trick. Actually the 'empty objects in ArrayList'
> >> problem remains when handling with strings (the ArrayList object that
> >> is initialized with a JavaSet or JavaList of string items will have a
> >> number of objects as the original JavaSet/JavaList, but all objects
> >> are the same - looks like an array of empty objects). Furthermore another
> issue with integer lists comes into play:
> >> here the initialization of  ArrayList with the Collection fails with
> >> a Java stacktrace (lucene.JavaError: org.apache.jcc.PythonException).
> >>>
> >>> The most simple test case is as follows:
> >>>
> >>> --%< --
> >>> import lucene
> >>> lucene.initVM()
> >>> from lucene.collections import JavaList
> >>>
> >>> # using strings: the ArrayList is created, but initialized with empty objects
> >>> jl = JavaList(['a','b'])
> >>> al = lucene.ArrayList(jl)
> >>> assert (not al.get(0).equals(al.get(1))), "unique values"
> >>>
> >>> # using ints: the ArrayList is not created,  but an error occurs instead:
> >>> # Java stacktrace: org.apache.jcc.PythonException: ('while calling toArray')
> >>> jl = JavaList(range(3))
> >>> al = lucene.ArrayList(jl)
> >>> --%< --
> >>>
> >>> I currently feel like having to stab around in the dark to find out
> >>> what's going on here and would welcome any suggestions. Needs some
> >> JCC
> >>> expert I guess ,-)
> >>>
> >>> Of course we can leave the patch out - but still there's the same
> >>> issue with
> >> JavaSet.
> >>>
> >>>
> >>> kind regards
> >>>
> >>> Thomas
> >>> --
> >>> OrbiTeam Software GmbH & Co. KG, Germany http://www.orbiteam.de
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Andi Vajda [mailto:vajda@apache.org]
> >>>> Sent: Wednesday, 18 April 2012 20:37
> >>>> To: pylucene-dev@lucene.apache.org
> >>>> Subject: Re: AW: PyLucene use JCC shared object by default
> >>>>
> >>>>
> >>>> Hi Thomas,
> >>>> ...
> >>>> Lucene 3.6 just got released a few days ago. Apart from your patch,
> >>>> the PyLucene 3.6 release is ready. I'm about to go offline (email
> >>>> only) for a
> >> week.
> >>>> Let's revisit this patch then (first week of May). It's not
> >>>> blocking the
> >> release
> >>>> right now as, even if I sent out a release candidate for a vote,
> >>>> the three business days required for this would take this into the time
> I'm away.
> >>>> ...
> >>>> Andi..
> >>>
> >>>
> >

Re: AW: AW: AW: PyLucene use JCC shared object by default

Posted by Andi Vajda <va...@apache.org>.
  Hi Thomas,

On Wed, 2 May 2012, Thomas Koch wrote:

> could you download the patch from the link?

Yes, I got your patch just fine.

I fixed a few bugs today having to do with converting sequences to JArray
and added support for auto-boxing primitive types when converting a sequence
to an object JArray. Now your collections-demo.py all works fine !

With these fixes the Python toArray() methods can return a Python sequence 
object directly, there is no need to do the JArray conversion in Python 
anymore.
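
The auto-boxing idea can be pictured with a small pure-Python model. This is only a sketch of the concept, not JCC's actual code (the real conversion happens in JCC's native layer): the names boxed_java_type and box_sequence, and the particular type mapping, are assumptions made for illustration.

```python
def boxed_java_type(value):
    """Name the Java wrapper class a Python primitive might be boxed into.

    Assumed mapping, for illustration only; the real conversion may
    choose differently.
    """
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "java.lang.Boolean"
    if isinstance(value, int):
        return "java.lang.Integer"
    if isinstance(value, float):
        return "java.lang.Double"
    if isinstance(value, str):
        return "java.lang.String"
    raise TypeError("no boxing rule for %r" % type(value))


def box_sequence(seq):
    """Pair each element with its assumed wrapper class, keeping order."""
    return [(boxed_java_type(item), item) for item in seq]
```

The ordering of the checks is the part worth noting: testing for int first would box True and False as integers, which is one way a boolean list could end up with the wrong element type.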

I simplified the collections.py file a bit to reflect the fixes, and all
changes, including the PythonList/PythonListIterator code, are now checked in.

Could you please convert collections-demo.py into a proper unit test module 
like the unit tests in pylucene/test so that it gets integrated into the 
test suite ?

Thanks !

Andi..

>
> Just one more thing ... in the initial implementation of PythonList I did the toArray() method in Python and the toArray(Object[]) method in Java - just as was done for the PythonSet:
>
> +    public native List subList(int fromIndex, int toIndex);
> +    public native Object[] toArray();
> +
> +    public Object[] toArray(Object[] a)
> +    {
> +        Object[] array = toArray();
> +
> +        if (a.length < array.length)
> +            a = (Object[]) Array.newInstance(a.getClass().getComponentType(),
> +                                             array.length);
> +
> +        System.arraycopy(array, 0, a, 0, array.length);
> +
> +        return a;
> +    }
>
> (from patch of Feb 22nd I sent to you)
>
> From the current patch you can see that the latter part is missing now - the toArray(Object[]) method is now done in Python as well, i.e. it simply calls toArray():
>
> ===================================================================
> --- java/org/apache/pylucene/util/PythonSet.java	(revision 1332162)
> +++ java/org/apache/pylucene/util/PythonSet.java	(working copy)
> @@ -62,14 +62,6 @@
>
>     public Object[] toArray(Object[] a)
>     {
> -        Object[] array = toArray();
> -
> -        if (a.length < array.length)
> -            a = (Object[]) Array.newInstance(a.getClass().getComponentType(),
> -                                             array.length);
> -
> -        System.arraycopy(array, 0, a, 0, array.length);
> -
> -        return a;
> +        return toArray();
>     }
> }
>
> As far as I remember that was part of your changes in between (I probably never touched PythonSet). Anyway I could imagine that this is related to the current problem.
>
> However, the ArrayList never calls the 2nd toArray method. The ArrayList constructor actually triggers the "simple" toArray method:
>
>    /**
>     * Constructs a list containing the elements of the specified
>     * collection, in the order they are returned by the collection's
>     * iterator.
>     *
>     * @param c the collection whose elements are to be placed into this list
>     * @throws NullPointerException if the specified collection is null
>     */
>    public ArrayList(Collection<? extends E> c) {
>        elementData = c.toArray();
>        size = elementData.length;
>        // c.toArray might (incorrectly) not return Object[] (see 6260652)
>        if (elementData.getClass() != Object[].class)
>            elementData = Arrays.copyOf(elementData, size, Object[].class);
>    }
>
> The ArrayList source code is attached (from OpenJDK6 sources).
>
> So maybe that's the wrong path ...  Anyhow I feel the mapping of toArray(Object[]) to toArray() does not fully comply with the Java API description:
> http://docs.oracle.com/javase/6/docs/api/java/util/List.html#toArray(T[])
>
>
> Hope that helps...
>
>
> Regards,
> Thomas
>
>> -----Original Message-----
>> From: Andi Vajda [mailto:vajda@apache.org]
>> Sent: Monday, 30 April 2012 19:32
>> To: pylucene-dev@lucene.apache.org
>> Subject: Re: AW: AW: PyLucene use JCC shared object by default
>>
>>
>> On Mon, 30 Apr 2012, Thomas Koch wrote:
>>
>>> Dear Andi, I again had a look at the patch I submitted recently and
>>> would like to get back to it.  An updated version of the patch is
>>> attached to this email - the patch is against the branch_3x repo
>>> http://svn.apache.org/repos/asf/lucene/pylucene/branches/branch_3x
>>
>> Oh, and there is no attachment in your email. Maybe it got eaten up by some
>> mail server. Please, make sure it's of a text mimetype or mail it to me
>> directly.
>>
>> Thanks !
>>
>> Andi..
>>
>>>
>>> The patch mainly
>>> - adds two java classes:  PythonList,  PythonListIterator
>>> - adds according Python classes   (JavaListIterator and JavaList in
>> collections.py)
>>>
>>> Purpose:
>>> - provide a Java-based List implementation in JCC/PyLucene (similar to
>>> existing PythonSet/JavaSet)
>>> - allow to pass python lists via Java Collections into PyLucene
>>>
>>> Let's try summarize shortly: PythonSet /JavaSet was already existing, but
>> nothing similar for Lists. I made an implementation of PythonList /JavaList
>> and with your help this is now basically working. Except of an open issue that
>> affects both JavaSet and JavaList: initialization of an ArrayList with a JavaSet
>> (or JavaList) may cause trouble.
>>>
>>> As you said: "There is a bug somewhere with constructing an ArrayList from
>> a python collection like JavaSet or JavaList."
>>>
>>> I tried to change the toArray() method as you suggested, but that didn't
>> help. As far as I understood, there are two options to box python values into
>> a typed JArray:
>>>
>>> 1)  use the object based JArray class and box python values by wrapping
>> them with the corresponding Java object (e.g. type<int> -> lucene.Integer):
>>>
>>>>>> x = lucene.JArray('object')([lucene.Boolean(True),lucene.Boolean(False)])
>>> JArray<object>[<Object: true>, <Object: false>]
>>>>>> type(x[0])
>>> <type 'Object'>
>>>
>>> 2)  use the correct array type (int, float, etc.) and pass the list of Python
>> (elements or literals) to the JArray constructor, e.g.
>>>
>>>>>> y = lucene.JArray('bool')([True,False])
>>> JArray<bool>[True, False]
>>>>>> type(y[0])
>>> <type 'bool'>
>>>
>>> I tried both of them (see _pyList2JArray methods in collections.py) but
>> none of them did the trick. Actually the 'empty objects in ArrayList' problem
>> remains when handling with strings (the ArrayList object that is initialized
>> with a JavaSet or JavaList of string items will have a number of objects as the
>> original JavaSet/JavaList, but all objects are the same - looks like an array of
>> empty objects). Furthermore another issue with integer lists comes into play:
>> here the initialization of  ArrayList with the Collection fails with a Java
>> stacktrace (lucene.JavaError: org.apache.jcc.PythonException).
>>>
>>> The most simple test case is as follows:
>>>
>>> --%< --
>>> import lucene
>>> lucene.initVM()
>>> from lucene.collections import JavaList
>>>
>>> # using strings: the ArrayList is created, but initialized with empty objects
>>> jl = JavaList(['a','b'])
>>> al = lucene.ArrayList(jl)
>>> assert (not al.get(0).equals(al.get(1))), "unique values"
>>>
>>> # using ints: the ArrayList is not created,  but an error occurs instead:
>>> # Java stacktrace: org.apache.jcc.PythonException: ('while calling toArray')
>>> jl = JavaList(range(3))
>>> al = lucene.ArrayList(jl)
>>> --%< --
>>>
>>> I currently feel like having to stab around in the dark to find out
>>> what's going on here and would welcome any suggestions. Needs some
>> JCC
>>> expert I guess ,-)
>>>
>>> Of course we can leave the patch out - but still there's the same issue with
>> JavaSet.
>>>
>>>
>>> kind regards
>>>
>>> Thomas
>>> --
>>> OrbiTeam Software GmbH & Co. KG, Germany http://www.orbiteam.de
>>>
>>>
>>>> -----Original Message-----
>>>> From: Andi Vajda [mailto:vajda@apache.org]
>>>> Sent: Wednesday, 18 April 2012 20:37
>>>> To: pylucene-dev@lucene.apache.org
>>>> Subject: Re: AW: PyLucene use JCC shared object by default
>>>>
>>>>
>>>> Hi Thomas,
>>>> ...
>>>> Lucene 3.6 just got released a few days ago. Apart from your patch, the
>>>> PyLucene 3.6 release is ready. I'm about to go offline (email only) for a
>> week.
>>>> Let's revisit this patch then (first week of May). It's not blocking the
>> release
>>>> right now as, even if I sent out a release candidate for a vote, the three
>>>> business days required for this would take this into the time I'm away.
>>>> ...
>>>> Andi..
>>>
>>>
>

Re: AW: AW: PyLucene use JCC shared object by default

Posted by Andi Vajda <va...@apache.org>.
On Mon, 30 Apr 2012, Thomas Koch wrote:

> Dear Andi, I again had a look at the patch I submitted recently and would 
> like to get back to it.  An updated version of the patch is attached to 
> this email - the patch is against the branch_3x repo 
> http://svn.apache.org/repos/asf/lucene/pylucene/branches/branch_3x

Oh, and there is no attachment in your email. Maybe it got eaten up by some 
mail server. Please, make sure it's of a text mimetype or mail it to me 
directly.

Thanks !

Andi..

>
> The patch mainly
> - adds two java classes:  PythonList,  PythonListIterator
> - adds according Python classes   (JavaListIterator and JavaList in collections.py)
>
> Purpose:
> - provide a Java-based List implementation in JCC/PyLucene (similar to existing PythonSet/JavaSet)
> - allow to pass python lists via Java Collections into PyLucene
>
> Let's try summarize shortly: PythonSet /JavaSet was already existing, but nothing similar for Lists. I made an implementation of PythonList /JavaList and with your help this is now basically working. Except of an open issue that affects both JavaSet and JavaList: initialization of an ArrayList with a JavaSet (or JavaList) may cause trouble.
>
> As you said: "There is a bug somewhere with constructing an ArrayList from a python collection like JavaSet or JavaList."
>
> I tried to change the toArray() method as you suggested, but that didn't help. As far as I understood, there are two options to box python values into a typed JArray:
>
> 1)  use the object based JArray class and box python values by wrapping them with the corresponding Java object (e.g. type<int> -> lucene.Integer):
>
>>>> x = lucene.JArray('object')([lucene.Boolean(True),lucene.Boolean(False)])
> JArray<object>[<Object: true>, <Object: false>]
>>>> type(x[0])
> <type 'Object'>
>
> 2)  use the correct array type (int, float, etc.) and pass the list of Python (elements or literals) to the JArray constructor, e.g.
>
>>>> y = lucene.JArray('bool')([True,False])
> JArray<bool>[True, False]
>>>> type(y[0])
> <type 'bool'>
>
> I tried both of them (see _pyList2JArray methods in collections.py) but none of them did the trick. Actually the 'empty objects in ArrayList' problem remains when handling with strings (the ArrayList object that is initialized with a JavaSet or JavaList of string items will have a number of objects as the original JavaSet/JavaList, but all objects are the same - looks like an array of empty objects). Furthermore another issue with integer lists comes into play: here the initialization of  ArrayList with the Collection fails with a Java stacktrace (lucene.JavaError: org.apache.jcc.PythonException).
>
> The most simple test case is as follows:
>
> --%< --
> import lucene
> lucene.initVM()
> from lucene.collections import JavaList
>
> # using strings: the ArrayList is created, but initialized with empty objects
> jl = JavaList(['a','b'])
> al = lucene.ArrayList(jl)
> assert (not al.get(0).equals(al.get(1))), "unique values"
>
> # using ints: the ArrayList is not created,  but an error occurs instead:
> # Java stacktrace: org.apache.jcc.PythonException: ('while calling toArray')
> jl = JavaList(range(3))
> al = lucene.ArrayList(jl)
> --%< --
>
> I currently feel like having to stab around in the dark to find out what's going on here and would welcome any suggestions. Needs some JCC expert I guess ,-)
>
> Of course we can leave the patch out - but still there's the same issue with JavaSet.
>
>
> kind regards
>
> Thomas
> --
> OrbiTeam Software GmbH & Co. KG, Germany
> http://www.orbiteam.de
>
>
>> -----Original Message-----
>> From: Andi Vajda [mailto:vajda@apache.org]
>> Sent: Wednesday, 18 April 2012 20:37
>> To: pylucene-dev@lucene.apache.org
>> Subject: Re: AW: PyLucene use JCC shared object by default
>>
>>
>> Hi Thomas,
>> ...
>> Lucene 3.6 just got released a few days ago. Apart from your patch, the
>> PyLucene 3.6 release is ready. I'm about to go offline (email only) for a week.
>> Let's revisit this patch then (first week of May). It's not blocking the release
>> right now as, even if I sent out a release candidate for a vote, the three
>> business days required for this would take this into the time I'm away.
>> ...
>> Andi..
>
>

Re: AW: AW: PyLucene use JCC shared object by default

Posted by Andi Vajda <va...@apache.org>.
  Hi Thomas,

On Mon, 30 Apr 2012, Thomas Koch wrote:

> I again had a look at the patch I submitted recently and would like to get 
> back to it.  An updated version of the patch is attached to this email - 
> the patch is against the branch_3x repo 
> http://svn.apache.org/repos/asf/lucene/pylucene/branches/branch_3x

Thank you for sending this, I just got back from vacation and was going to 
ask you about this as I'd like to get the PyLucene 3.6 release out soon - if 
possible with your patch.

> The patch mainly
> - adds two java classes:  PythonList,  PythonListIterator
> - adds according Python classes   (JavaListIterator and JavaList in collections.py)
>
> Purpose:
> - provide a Java-based List implementation in JCC/PyLucene (similar to existing PythonSet/JavaSet)
> - allow to pass python lists via Java Collections into PyLucene
>
> Let's try summarize shortly: PythonSet /JavaSet was already existing, but 
> nothing similar for Lists. I made an implementation of PythonList 
> /JavaList and with your help this is now basically working. Except of an 
> open issue that affects both JavaSet and JavaList: initialization of an 
> ArrayList with a JavaSet (or JavaList) may cause trouble.
>
> As you said: "There is a bug somewhere with constructing an ArrayList from 
> a python collection like JavaSet or JavaList."
>
> I tried to change the toArray() method as you suggested, but that didn't 
> help. As far as I understood, there are two options to box python values 
> into a typed JArray:
>
> 1)  use the object based JArray class and box python values by wrapping 
> them with the corresponding Java object (e.g. type<int> -> 
> lucene.Integer):
>
>>>> x = lucene.JArray('object')([lucene.Boolean(True),lucene.Boolean(False)])
> JArray<object>[<Object: true>, <Object: false>]
>>>> type(x[0])
> <type 'Object'>
>
> 2)  use the correct array type (int, float, etc.) and pass the list of 
> Python (elements or literals) to the JArray constructor, e.g.
>
>>>> y = lucene.JArray('bool')([True,False])
> JArray<bool>[True, False]
>>>> type(y[0])
> <type 'bool'>
>
> I tried both of them (see _pyList2JArray methods in collections.py) but 
> none of them did the trick. Actually the 'empty objects in ArrayList' 
> problem remains when handling with strings (the ArrayList object that is 
> initialized with a JavaSet or JavaList of string items will have a number 
> of objects as the original JavaSet/JavaList, but all objects are the same 
> - looks like an array of empty objects). Furthermore another issue with 
> integer lists comes into play: here the initialization of ArrayList with 
> the Collection fails with a Java stacktrace (lucene.JavaError: 
> org.apache.jcc.PythonException).
>
> The most simple test case is as follows:
>
> --%< --
> import lucene
> lucene.initVM()
> from lucene.collections import JavaList
>
> # using strings: the ArrayList is created, but initialized with empty objects
> jl = JavaList(['a','b'])
> al = lucene.ArrayList(jl)
> assert (not al.get(0).equals(al.get(1))), "unique values"
>
> # using ints: the ArrayList is not created,  but an error occurs instead:
> # Java stacktrace: org.apache.jcc.PythonException: ('while calling toArray')
> jl = JavaList(range(3))
> al = lucene.ArrayList(jl)
> --%< --
>
> I currently feel like having to stab around in the dark to find out what's 
> going on here and would welcome any suggestions. Needs some JCC expert I 
> guess ,-)
>
> Of course we can leave the patch out - but still there's the same issue 
> with JavaSet.

I'd like to get to the bottom of this before the 3.6 release. It's a matter 
of my finding the time this week.

Thanks for reviving this !

Andi..

>
>
> kind regards
>
> Thomas
> --
> OrbiTeam Software GmbH & Co. KG, Germany
> http://www.orbiteam.de
>
>
>> -----Original Message-----
>> From: Andi Vajda [mailto:vajda@apache.org]
>> Sent: Wednesday, 18 April 2012 20:37
>> To: pylucene-dev@lucene.apache.org
>> Subject: Re: AW: PyLucene use JCC shared object by default
>>
>>
>> Hi Thomas,
>> ...
>> Lucene 3.6 just got released a few days ago. Apart from your patch, the
>> PyLucene 3.6 release is ready. I'm about to go offline (email only) for a week.
>> Let's revisit this patch then (first week of May). It's not blocking the release
>> right now as, even if I sent out a release candidate for a vote, the three
>> business days required for this would take this into the time I'm away.
>> ...
>> Andi..
>
>

AW: AW: PyLucene use JCC shared object by default

Posted by Thomas Koch <ko...@orbiteam.de>.
Dear Andi,
I again had a look at the patch I submitted recently and would like to get back to it.  An updated version of the patch is attached to this email - the patch is against the branch_3x repo http://svn.apache.org/repos/asf/lucene/pylucene/branches/branch_3x

The patch mainly
- adds two Java classes: PythonList, PythonListIterator
- adds the corresponding Python classes (JavaListIterator and JavaList in collections.py)

Purpose:
- provide a Java-based List implementation in JCC/PyLucene (similar to the existing PythonSet/JavaSet)
- allow passing Python lists into PyLucene via Java Collections

To summarize briefly: PythonSet/JavaSet already existed, but there was nothing similar for lists. I implemented PythonList/JavaList, and with your help this now basically works, except for one open issue that affects both JavaSet and JavaList: initializing an ArrayList with a JavaSet (or JavaList) may cause trouble.

As you said: "There is a bug somewhere with constructing an ArrayList from a python collection like JavaSet or JavaList."

I tried to change the toArray() method as you suggested, but that didn't help. As far as I understand, there are two options for boxing Python values into a typed JArray:

1)  use the object-based JArray class and box Python values by wrapping them in the corresponding Java object (e.g. type<int> -> lucene.Integer):

>>> x = lucene.JArray('object')([lucene.Boolean(True),lucene.Boolean(False)])
JArray<object>[<Object: true>, <Object: false>]
>>> type(x[0])
<type 'Object'>

2)  use the matching array type (int, float, etc.) and pass the list of Python elements (or literals) to the JArray constructor, e.g.

>>> y = lucene.JArray('bool')([True,False])
JArray<bool>[True, False]
>>> type(y[0])
<type 'bool'>

I tried both of them (see the _pyList2JArray methods in collections.py), but neither did the trick. The 'empty objects in ArrayList' problem remains when dealing with strings: an ArrayList initialized with a JavaSet or JavaList of string items has the same number of objects as the original JavaSet/JavaList, but all objects are the same (it looks like an array of empty objects). In addition, a second issue shows up with integer lists: here the initialization of the ArrayList with the Collection fails with a Java stacktrace (lucene.JavaError: org.apache.jcc.PythonException).
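
To make the two options concrete, here is a minimal sketch of how a helper could choose between them. It is not the _pyList2JArray code from the patch: the names jarray_type_name and py_list_to_jarray are made up for illustration, the mapping from Python types to JArray type names is my assumption, and the lucene module is passed in as a parameter so that the type selection can be tried without a running JVM.

```python
# Hypothetical mapping from Python element types to JArray type names.
# 'bool' and 'object' appear in the examples above; the other names are
# assumptions about what JArray accepts.
_TYPE_NAMES = ((bool, 'bool'), (int, 'int'), (float, 'double'), (str, 'string'))


def jarray_type_name(values):
    """Return a primitive/string JArray type name for a homogeneous list.

    Falls back to 'object' (option 1) for empty or mixed lists.
    """
    for py_type, name in _TYPE_NAMES:
        # type(v) is py_type keeps True/False from being counted as ints.
        if values and all(type(v) is py_type for v in values):
            return name
    return 'object'


def py_list_to_jarray(lucene, values):
    """Build a JArray from a Python list using the selected type name.

    For the 'object' case the elements must already be Java objects,
    e.g. lucene.Boolean(True), as in option 1.
    """
    return lucene.JArray(jarray_type_name(values))(values)
```

With an initialized PyLucene, py_list_to_jarray(lucene, [True, False]) would then amount to the lucene.JArray('bool')([True, False]) call shown under option 2.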

The simplest test case is as follows:

--%< --
import lucene
lucene.initVM()
from lucene.collections import JavaList

# using strings: the ArrayList is created, but initialized with empty objects
jl = JavaList(['a','b'])
al = lucene.ArrayList(jl)
assert (not al.get(0).equals(al.get(1))), "unique values"

# using ints: the ArrayList is not created,  but an error occurs instead:
# Java stacktrace: org.apache.jcc.PythonException: ('while calling toArray')
jl = JavaList(range(3))
al = lucene.ArrayList(jl)
--%< --

I currently feel like I'm stabbing around in the dark trying to find out what's going on here and would welcome any suggestions. It needs a JCC expert, I guess ,-)

Of course we can leave the patch out, but the same issue still exists with JavaSet.


kind regards

Thomas 
--
OrbiTeam Software GmbH & Co. KG, Germany
http://www.orbiteam.de


> -----Original Message-----
> From: Andi Vajda [mailto:vajda@apache.org]
> Sent: Wednesday, April 18, 2012 20:37
> To: pylucene-dev@lucene.apache.org
> Subject: Re: AW: PyLucene use JCC shared object by default
> 
> 
> Hi Thomas,
> ...
> Lucene 3.6 just got released a few days ago. Apart from your patch, the
> PyLucene 3.6 release is ready. I'm about to go offline (email only) for a week.
> Let's revisit this patch then (first week of May). It's not blocking the release
> right now as, even if I sent out a release candidate for a vote, the three
> business days required for this would take this into the time I'm away.
> ...
> Andi..


AW: AW: PyLucene use JCC shared object by default

Posted by Thomas Koch <ko...@orbiteam.de>.
Hi Andi,
thanks for getting back on this issue.

> Lucene 3.6 just got released a few days ago. Apart from your patch, the
> PyLucene 3.6 release is ready. I'm about to go offline (email only) for a week.
> Let's revisit this patch then (first week of May). It's not blocking the release
> right now as, even if I sent out a release candidate for a vote, the three
> business days required for this would take this into the time I'm away.
> 
OK. I'll try to summarize the current status and send an update to the list later this week.

> Out of curiosity, why is this patch tied to the facetting module ? Can't you use
> the regular Java List implementations with it instead of a wrapped Python list
> ? If there are no wrappers for the classes you want, it's certainly easier to add
> them and they would provide a more efficient operation as Java code (the
> facet module) working with them wouldn't have to cross the VM barriers for
> each and every access into these lists.
>
I've uploaded the mentioned sample code here: 
http://pythonfiddle.com/python-facet-search/
(it could go into the PyLucene distribution under the samples directory)

You can see the #TODO comment with JavaList and my initial motivation: the call to 'setCategoryPaths' on CategoryDocumentBuilder at
 categoryDocBuilder =  CategoryDocumentBuilder(taxo).setCategoryPaths(facetList)
threw a lucene.InvalidArgsError: (<type 'CategoryDocumentBuilder'>, 'setCategoryPaths',[<CategoryPath: root/a/f1>, <CategoryPath: root/a/f2>])
 when passed a pure Python list of CategoryPath objects. That's why I thought I'd need a JavaList here (and later used it with success!).

Actually the method signature expects an argument of type java.lang.Iterable here:
 http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/contrib-facet/org/apache/lucene/facet/index/CategoryDocumentBuilder.html#setCategoryPaths(java.lang.Iterable)

I remember (from my code comments) having tried Arrays.asList() first, but noticed some type-casting problems ... However, I tried it again today and it seems to work now! (Not sure what I did in the first place or what actually went wrong ...) Just
 facetList = lucene.Arrays.asList(facetList) 
does it currently. So the above-mentioned facet example actually runs without the JavaList implementation!
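
Put together, the working variant can be sketched roughly as follows. The helper name set_category_paths is only for illustration, and the lucene module (with lucene.initVM() already called) and the taxonomy object are passed in as parameters; the two PyLucene calls are the ones quoted above.

```python
def set_category_paths(lucene, taxo, category_paths):
    """Pass a Python list of CategoryPath objects to setCategoryPaths().

    lucene:         the initialized PyLucene module (after lucene.initVM())
    taxo:           the taxonomy object handed to CategoryDocumentBuilder
    category_paths: a plain Python list of CategoryPath objects
    """
    # A plain Python list is rejected with InvalidArgsError, so convert it
    # to a java.util.List first; a List is also a java.lang.Iterable,
    # which is the type setCategoryPaths() expects.
    java_list = lucene.Arrays.asList(category_paths)
    return lucene.CategoryDocumentBuilder(taxo).setCategoryPaths(java_list)
```

In the sample this would replace the direct setCategoryPaths(facetList) call, e.g. categoryDocBuilder = set_category_paths(lucene, taxo, facetList).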

Sorry for the confusion ,-)


best regards
Thomas 
--
OrbiTeam Software GmbH & Co. KG, Germany
http://www.orbiteam.de


> -----Original Message-----
> From: Andi Vajda [mailto:vajda@apache.org]
> Sent: Wednesday, April 18, 2012 20:37
> To: pylucene-dev@lucene.apache.org
> Subject: Re: AW: PyLucene use JCC shared object by default
> 
> 
> Hi Thomas,
> 
> On Apr 18, 2012, at 6:31, "Thomas Koch" <ko...@orbiteam.de> wrote:
> 
> > Hi,
> > sounds like an interesting project – may I ask what you actually
> > implemented and what’s the motivation (e.g. performance?)?
> >
> > I’ve started to experiment with the Facet support in Lucene (actually in
> > PyLucene – ported an example to Python) and found that facetted search
> > support in Lucene looks powerful (though API is still said to be ‘experimental’
> > and I can’t say anything about performance yet).  I’m talking about the
> > org.apache.lucene.facet.* packages – part of the contrib part of Lucene and
> > available as JARs that’s accessible in PyLucene as well. I’m not that familiar
> > with Solr but AFAIK it’s based on Lucene (Java) and should (hopefully) use
> > the same Java code for its facet search support. Of course Solr adds some
> > nice configuration support and web GUI to Lucene, but the ‘core’ search is
> > built on Lucene (to my knowledge). So did you re-implement the Lucene
> > facet search/index code (like TaxonomyReader/Writer, FacetRequest stuff
> > etc.) in C++ or what part of Solr??
> >
> > Regarding Facet support in PyLucene I can share the samples I’ve ‘ported’
> > to Python so far. There’s still a patch pending for JavaList (required by facet
> > features) which I come back to later on this list (still some open issues).
> > Hopefully this can be included in the PyLucene 3.6 version …
> 
> Lucene 3.6 just got released a few days ago. Apart from your patch, the
> PyLucene 3.6 release is ready. I'm about to go offline (email only) for a week.
> Let's revisit this patch then (first week of May). It's not blocking the release
> right now as, even if I sent out a release candidate for a vote, the three
> business days required for this would take this into the time I'm away.
> 
> Out of curiosity, why is this patch tied to the facetting module ? Can't you use
> the regular Java List implementations with it instead of a wrapped Python list
> ? If there are no wrappers for the classes you want, it's certainly easier to add
> them and they would provide a more efficient operation as Java code (the
> facet module) working with them wouldn't have to cross the VM barriers for
> each and every access into these lists.
> 
> Andi..
> 
> >
> > Regards
> > Thomas
> > --
> > OrbiTeam Software GmbH & Co. KG
> > Germany  http://www.orbiteam.de
> >
> >
> > From: Caleb Burns [mailto:caleb@ridersdiscount.com]
> > Sent: Tuesday, April 17, 2012 21:16
> > To: pylucene-dev@lucene.apache.org
> > Subject: PyLucene use JCC shared object by default
> >
> > Hi,
> >
> > I've finished the process at my organization of re-implementing SOLR's
> > faceting algorithm (in C++).
> >
> > We would like the public at large to have access to the work we've done
> > and plan to do. In order for this to be a real possibility the code needs to be
> > built against and use the same JVM as the PyLucene installation does. The
> > most logical way we feel to have this accomplished is by having PyLucenes'
> > default installation use JCC as a Shared Object.
> >
> > We have yet more plans to extend and provide utilities that work with
> > PyLucene, but this all hinges on having the shared object. The only
> > alternative methodology would require the bundling of our source with the
> > PyLucene project itself as a fork.
> >
> > We are eager to start open sourcing our work, so please let us know what
> > would be the best way to integrate our work.
> >
> > --
> > Caleb Burns
> > Developer | Riders Discount
> > 866.931.6644 x851 | www.RidersDiscount.com
> >
> > Deal of the Day
> >
> >
> >



Re: AW: PyLucene use JCC shared object by default

Posted by Andi Vajda <va...@apache.org>.
Hi Thomas, 

On Apr 18, 2012, at 6:31, "Thomas Koch" <ko...@orbiteam.de> wrote:

> Hi,
> sounds like an interesting project – may I ask what you actually implemented and what’s the motivation (e.g. performance?)?
> 
> I’ve started to experiment with the Facet support in Lucene (actually in PyLucene – ported an example to Python) and found that facetted search support in Lucene looks powerful (though API is still said to be ‘experimental’ and I can’t say anything about performance yet).  I’m talking about the org.apache.lucene.facet.* packages – part of the contrib part of Lucene and available as JARs that’s accessible in PyLucene as well. I’m not that familiar with Solr but AFAIK it’s based on Lucene (Java) and should (hopefully) use the same Java code for its facet search support. Of course Solr adds some nice configuration support and web GUI to Lucene, but the ‘core’ search is built on Lucene (to my knowledge). So did you re-implement the Lucene facet search/index code (like TaxonomyReader/Writer, FacetRequest stuff etc.) in C++ or what part of Solr??
> 
> Regarding Facet support in PyLucene I can share the samples I’ve ‘ported’ to Python so far. There’s still a patch pending for JavaList (required by facet features) which I come back to later on this list (still some open issues). Hopefully this can be included in the PyLucene 3.6 version …

Lucene 3.6 just got released a few days ago. Apart from your patch, the PyLucene 3.6 release is ready. I'm about to go offline (email only) for a week. Let's revisit this patch then (first week of May). It's not blocking the release right now as, even if I sent out a release candidate for a vote, the three business days required for this would take this into the time I'm away.

Out of curiosity, why is this patch tied to the faceting module? Can't you use the regular Java List implementations with it instead of a wrapped Python list? If there are no wrappers for the classes you want, it's certainly easier to add them, and they would be more efficient, since Java code (the facet module) working with them wouldn't have to cross the VM barrier for each and every access into these lists.
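
As a rough sketch of what I mean, assuming java.util.ArrayList is among the wrapped classes: copy the Python items into a regular Java list once, up front, and hand that to the Java code. The helper name to_array_list is made up for illustration, the lucene module is passed in as a parameter, and the items are assumed to already be Java objects (such as CategoryPath instances).

```python
def to_array_list(lucene, items):
    """Copy Java objects from a Python iterable into a java.util.ArrayList.

    The copy crosses the Python/Java boundary once per item here; after
    that, Java code iterating over the list stays entirely inside the JVM.
    """
    java_list = lucene.ArrayList()
    for item in items:
        java_list.add(item)
    return java_list
```

A wrapped Python list, by contrast, has to call back into Python for every get() or iterator step made from the Java side.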

Andi..

> 
> Regards
> Thomas
> --
> OrbiTeam Software GmbH & Co. KG
> Germany  http://www.orbiteam.de
> 
> 
> From: Caleb Burns [mailto:caleb@ridersdiscount.com]
> Sent: Tuesday, April 17, 2012 21:16
> To: pylucene-dev@lucene.apache.org
> Subject: PyLucene use JCC shared object by default
> 
> Hi,
> 
> I've finished the process at my organization of re-implementing SOLR's faceting algorithm (in C++).
> 
> We would like the public at large to have access to the work we've done and plan to do. In order for this to be a real possibility the code needs to be built against and use the same JVM as the PyLucene installation does. The most logical way we feel to have this accomplished is by having PyLucenes' default installation use JCC as a Shared Object.
> 
> We have yet more plans to extend and provide utilities that work with PyLucene, but this all hinges on having the shared object. The only alternative methodology would require the bundling of our source with the PyLucene project itself as a fork.
> 
> We are eager to start open sourcing our work, so please let us know what would be the best way to integrate our work.
> 
> -- 
> Caleb Burns
> Developer | Riders Discount
> 866.931.6644 x851 | www.RidersDiscount.com 
> 
> Deal of the Day
> 
> 
> 

AW: PyLucene use JCC shared object by default

Posted by Thomas Koch <ko...@orbiteam.de>.
Hi,
sounds like an interesting project – may I ask what you actually implemented and what the motivation is (e.g. performance)?

I’ve started to experiment with the facet support in Lucene (actually in PyLucene – I ported an example to Python) and found that faceted search support in Lucene looks powerful (though the API is still said to be ‘experimental’ and I can’t say anything about performance yet). I’m talking about the org.apache.lucene.facet.* packages, which are part of Lucene's contrib area and available as JARs that are accessible in PyLucene as well. I’m not that familiar with Solr, but AFAIK it’s based on Lucene (Java) and should (hopefully) use the same Java code for its facet search support. Of course Solr adds some nice configuration support and a web GUI to Lucene, but the ‘core’ search is built on Lucene (to my knowledge). So did you re-implement the Lucene facet search/index code (like TaxonomyReader/Writer, FacetRequest stuff etc.) in C++, or what part of Solr??

Regarding facet support in PyLucene, I can share the samples I’ve ‘ported’ to Python so far. There’s still a patch pending for JavaList (required by the facet features), which I will come back to later on this list (there are still some open issues). Hopefully this can be included in the PyLucene 3.6 version …

Regards
Thomas
--
OrbiTeam Software GmbH & Co. KG
Germany  http://www.orbiteam.de


From: Caleb Burns [mailto:caleb@ridersdiscount.com]
Sent: Tuesday, April 17, 2012 21:16
To: pylucene-dev@lucene.apache.org
Subject: PyLucene use JCC shared object by default

Hi,

I've finished the process at my organization of re-implementing SOLR's faceting algorithm (in C++).

We would like the public at large to have access to the work we've done and plan to do. In order for this to be a real possibility the code needs to be built against and use the same JVM as the PyLucene installation does. The most logical way we feel to have this accomplished is by having PyLucenes' default installation use JCC as a Shared Object.

We have yet more plans to extend and provide utilities that work with PyLucene, but this all hinges on having the shared object. The only alternative methodology would require the bundling of our source with the PyLucene project itself as a fork.

We are eager to start open sourcing our work, so please let us know what would be the best way to integrate our work.

-- 
Caleb Burns
Developer | Riders Discount
866.931.6644 x851 | www.RidersDiscount.com 
 
Deal of the Day