Posted to pylucene-dev@lucene.apache.org by Andi Vajda <va...@apache.org> on 2012/05/03 01:00:14 UTC

Re: AW: AW: AW: PyLucene use JCC shared object by default

  Hi Thomas,

On Wed, 2 May 2012, Thomas Koch wrote:

> could you download the patch from the link?

Yes, I got your patch just fine.

I fixed a few bugs today having to do with converting sequences to JArray
and added support for auto-boxing primitive types when converting a sequence
to an object JArray. Now your collections-demo.py all works fine !

With these fixes the Python toArray() methods can return a Python sequence 
object directly, there is no need to do the JArray conversion in Python 
anymore.

I simplified the collections.py file a bit to reflect the fixes and all 
changes, including the PythonList/PythonListIterator code is now checked in.

Could you please convert collections-demo.py into a proper unit test module 
like the unit tests in pylucene/test so that it gets integrated into the 
test suite ?

Thanks !

Andi..

>
> Just one more thing ... in the initial implementation of PythonList I did the toArray() method in Python and the toArray(Object[]) method in Java - just as was done for the PythonSet:
>
> +    public native List subList(int fromIndex, int toIndex);
> +    public native Object[] toArray();
> +
> +    public Object[] toArray(Object[] a)
> +    {
> +        Object[] array = toArray();
> +
> +        if (a.length < array.length)
> +            a = (Object[]) Array.newInstance(a.getClass().getComponentType(),
> +                                             array.length);
> +
> +        System.arraycopy(array, 0, a, 0, array.length);
> +
> +        return a;
> +    }
>
> (from patch of Feb 22nd I sent to you)
>
> From the current patch you can see that the latter part is missing now - the toArray(Object[]) method is now done in Python as well, i.e. it simply calls toArray():
>
> ===================================================================
> --- java/org/apache/pylucene/util/PythonSet.java	(revision 1332162)
> +++ java/org/apache/pylucene/util/PythonSet.java	(working copy)
> @@ -62,14 +62,6 @@
>
>     public Object[] toArray(Object[] a)
>     {
> -        Object[] array = toArray();
> -
> -        if (a.length < array.length)
> -            a = (Object[]) Array.newInstance(a.getClass().getComponentType(),
> -                                             array.length);
> -
> -        System.arraycopy(array, 0, a, 0, array.length);
> -
> -        return a;
> +        return toArray();
>     }
> }
>
> As far as I remember that was part of your changes in between (I probably never touched PythonSet). Anyway I could imagine that this is related to the current problem.
>
> However, the ArrayList never calls the 2nd toArray method. The ArrayList constructor actually triggers the "simple" toArray method:
>
>    /**
>     * Constructs a list containing the elements of the specified
>     * collection, in the order they are returned by the collection's
>     * iterator.
>     *
>     * @param c the collection whose elements are to be placed into this list
>     * @throws NullPointerException if the specified collection is null
>     */
>    public ArrayList(Collection<? extends E> c) {
>        elementData = c.toArray();
>        size = elementData.length;
>        // c.toArray might (incorrectly) not return Object[] (see 6260652)
>        if (elementData.getClass() != Object[].class)
>            elementData = Arrays.copyOf(elementData, size, Object[].class);
>    }
>
> The ArrayList source code is attached (from OpenJDK6 sources).
>
> So maybe that's the wrong path ...  Anyhow I feel the mapping of toArray(Object[]) to toArray() does not fully comply with the Java API description:
> http://docs.oracle.com/javase/6/docs/api/java/util/List.html#toArray(T[])
>
>
> Hope that helps...
>
>
> Regards,
> Thomas
>
>> -----Original Message-----
>> From: Andi Vajda [mailto:vajda@apache.org]
>> Sent: Monday, 30 April 2012 19:32
>> To: pylucene-dev@lucene.apache.org
>> Subject: Re: AW: AW: PyLucene use JCC shared object by default
>>
>>
>> On Mon, 30 Apr 2012, Thomas Koch wrote:
>>
>>> Dear Andi, I again had a look at the patch I submitted recently and
>>> would like to get back to it.  An updated version of the patch is
>>> attached to this email - the patch is against the branch_3x repo
>>> http://svn.apache.org/repos/asf/lucene/pylucene/branches/branch_3x
>>
>> Oh, and there is no attachment in your email. Maybe it got eaten up by some
>> mail server. Please, make sure it's of a text mimetype or mail it to me
>> directly.
>>
>> Thanks !
>>
>> Andi..
>>
>>>
>>> The patch mainly
>>> - adds two Java classes: PythonList, PythonListIterator
>>> - adds the corresponding Python classes (JavaListIterator and JavaList in
>>>   collections.py)
>>>
>>> Purpose:
>>> - provide a Java-based List implementation in JCC/PyLucene (similar to
>>>   the existing PythonSet/JavaSet)
>>> - allow passing Python lists via Java Collections into PyLucene
>>>
>>> To summarize briefly: PythonSet/JavaSet already existed, but there was
>>> nothing similar for Lists. I made an implementation of PythonList/JavaList
>>> and with your help this is now basically working, except for an open issue
>>> that affects both JavaSet and JavaList: initialization of an ArrayList with
>>> a JavaSet (or JavaList) may cause trouble.
>>>
>>> As you said: "There is a bug somewhere with constructing an ArrayList from
>>> a python collection like JavaSet or JavaList."
>>>
>>> I tried to change the toArray() method as you suggested, but that didn't
>>> help. As far as I understood, there are two options to box Python values
>>> into a typed JArray:
>>>
>>> 1)  use the object-based JArray class and box Python values by wrapping
>>> them with the corresponding Java object (e.g. type<int> -> lucene.Integer):
>>>
>>> >>> x = lucene.JArray('object')([lucene.Boolean(True),lucene.Boolean(False)])
>>> JArray<object>[<Object: true>, <Object: false>]
>>> >>> type(x[0])
>>> <type 'Object'>
>>>
>>> 2)  use the correct array type (int, float, etc.) and pass the list of Python
>>> elements (or literals) to the JArray constructor, e.g.
>>>
>>> >>> y = lucene.JArray('bool')([True,False])
>>> JArray<bool>[True, False]
>>> >>> type(y[0])
>>> <type 'bool'>
>>>
>>> I tried both of them (see the _pyList2JArray methods in collections.py) but
>>> neither did the trick. The 'empty objects in ArrayList' problem remains
>>> when dealing with strings: the ArrayList object that is initialized with a
>>> JavaSet or JavaList of string items has the same number of objects as the
>>> original JavaSet/JavaList, but all objects are the same (it looks like an
>>> array of empty objects). Furthermore, another issue comes into play with
>>> integer lists: here the initialization of the ArrayList with the Collection
>>> fails with a Java stacktrace (lucene.JavaError: org.apache.jcc.PythonException).
>>>
>>> The most simple test case is as follows:
>>>
>>> --%< --
>>> import lucene
>>> lucene.initVM()
>>> from lucene.collections import JavaList
>>>
>>> # using strings: the ArrayList is created, but initialized with empty objects
>>> jl = JavaList(['a','b'])
>>> al = lucene.ArrayList(jl)
>>> assert (not al.get(0).equals(al.get(1))), "unique values"
>>>
>>> # using ints: the ArrayList is not created,  but an error occurs instead:
>>> # Java stacktrace: org.apache.jcc.PythonException: ('while calling toArray')
>>> jl = JavaList(range(3))
>>> al = lucene.ArrayList(jl)
>>> --%< --
>>>
>>> I currently feel like I'm stabbing around in the dark to find out
>>> what's going on here and would welcome any suggestions. Needs some JCC
>>> expert I guess ,-)
>>>
>>> Of course we can leave the patch out - but there's still the same issue
>>> with JavaSet.
>>>
>>>
>>> kind regards
>>>
>>> Thomas
>>> --
>>> OrbiTeam Software GmbH & Co. KG, Germany http://www.orbiteam.de
>>>
>>>
>>>> -----Original Message-----
>>>> From: Andi Vajda [mailto:vajda@apache.org]
>>>> Sent: Wednesday, 18 April 2012 20:37
>>>> To: pylucene-dev@lucene.apache.org
>>>> Subject: Re: AW: PyLucene use JCC shared object by default
>>>>
>>>>
>>>> Hi Thomas,
>>>> ...
>>>> Lucene 3.6 just got released a few days ago. Apart from your patch, the
>>>> PyLucene 3.6 release is ready. I'm about to go offline (email only) for a
>>>> week.
>>>> Let's revisit this patch then (first week of May). It's not blocking the
>>>> release right now as, even if I sent out a release candidate for a vote,
>>>> the three business days required for this would take this into the time
>>>> I'm away.
>>>> ...
>>>> Andi..
>>>
>>>
>
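
For reference, the two points raised in the quoted discussion above (what the toArray(T[]) contract asks for, and which toArray variant the ArrayList copy constructor calls) can be tried out in plain Java without JCC. This is only a sketch: CountingCollection and its counter fields are hypothetical names invented for illustration and are not part of PyLucene, and the typed toArray follows the java.util.Collection Javadoc (grow the array if it is too small, null-terminate it if it is too large) rather than any code from the patch.

```java
import java.lang.reflect.Array;
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper for illustration; not part of PyLucene or JCC.
class CountingCollection extends AbstractCollection<Object> {
    private final Object[] items;
    int plainCalls = 0;   // number of calls to toArray()
    int typedCalls = 0;   // number of calls to toArray(T[])

    CountingCollection(Object... items) {
        this.items = items.clone();
    }

    @Override
    public Iterator<Object> iterator() {
        return Arrays.asList(items).iterator();
    }

    @Override
    public int size() {
        return items.length;
    }

    @Override
    public Object[] toArray() {
        plainCalls++;
        // Always hand back a fresh Object[] copy.
        return Arrays.copyOf(items, items.length, Object[].class);
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> T[] toArray(T[] a) {
        typedCalls++;
        // Too small: allocate a new array of the caller's component type.
        if (a.length < items.length)
            a = (T[]) Array.newInstance(a.getClass().getComponentType(),
                                        items.length);
        System.arraycopy(items, 0, a, 0, items.length);
        // Room to spare: mark the end of the data with null.
        if (a.length > items.length)
            a[items.length] = null;
        return a;
    }

    public static void main(String[] args) {
        CountingCollection c = new CountingCollection("a", "b");

        // The copy constructor is expected to go through toArray() only.
        List<Object> copy = new ArrayList<>(c);
        System.out.println(copy + " plain=" + c.plainCalls
                           + " typed=" + c.typedCalls);

        // The typed variant returns an array of the requested runtime type.
        String[] typed = c.toArray(new String[0]);
        System.out.println(Arrays.toString(typed));
    }
}
```

A toArray(Object[]) that simply returns toArray() would ignore the runtime type and the contents of the array passed in, which is the mismatch with the API description noted above.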

Re: AW: AW: AW: AW: AW: PyLucene use JCC shared object by default

Posted by Andi Vajda <va...@apache.org>.
On Fri, 4 May 2012, Thomas Koch wrote:

> Thanks, Andi - test runs fine now.
>
> I've another small contribution to offer: samples/java/FacetExample.py -
> It's a python port of the facet example in java in package
> org.apache.lucene.facet.example.simple (actually it's a bit simplified as
> the four java files are merged into one python file: SimpleIndexer.java,
> SimpleMain.java,  SimpleSearcher.java,  SimpleUtils.java).
>
> A patch to branch3x is attached - you may want to include it (up to you of
> course).
>
> Regards and have a nice weekend,

Oh cool. Just before I was going to start the release process.
Thanks !

Andi..

>
> Thomas
>
> -----Original Message-----
> From: Andi Vajda [mailto:vajda@apache.org]
> Sent: Thursday, 3 May 2012 20:14
> To: pylucene-dev@lucene.apache.org
> Subject: Re: AW: AW: AW: AW: PyLucene use JCC shared object by default
> ...
>
> I fixed your patch to use cast() so as to unbox the boxed primitive types
> (and strings) to resolve the failures.
>
> Thank you for the patch, it's now checked in.
>
> Andi..
>

AW: AW: AW: AW: AW: PyLucene use JCC shared object by default

Posted by Thomas Koch <ko...@orbiteam.de>.
Thanks, Andi - test runs fine now.

I've another small contribution to offer: samples/java/FacetExample.py -
It's a python port of the facet example in java in package
org.apache.lucene.facet.example.simple (actually it's a bit simplified as
the four java files are merged into one python file: SimpleIndexer.java,
SimpleMain.java,  SimpleSearcher.java,  SimpleUtils.java).

A patch to branch3x is attached - you may want to include it (up to you of
course).

Regards and have a nice weekend,

Thomas

-----Original Message-----
From: Andi Vajda [mailto:vajda@apache.org]
Sent: Thursday, 3 May 2012 20:14
To: pylucene-dev@lucene.apache.org
Subject: Re: AW: AW: AW: AW: PyLucene use JCC shared object by default
...

I fixed your patch to use cast() so as to unbox the boxed primitive types
(and strings) to resolve the failures.

Thank you for the patch, it's now checked in.

Andi..

Re: AW: AW: AW: AW: PyLucene use JCC shared object by default

Posted by Andi Vajda <va...@apache.org>.
On Thu, 3 May 2012, Thomas Koch wrote:

> thanks for the fixes and cleanup! I've updated from SVN and wrote the unit 
> test. Runs without errors (Tracebacks etc.) now, however there may be
> some slight type mismatch issue still: the comparison of objects retrieved 
> from JArray and ArrayList respectively with the objects from the initial 
> JavaList does not match in case of the ArrayList (test yields 4 failures - 
> see below). That's with my newly built JCC2.12/PyLucene3.5 from branch_3x.
>
> As far as I understand
> elem0 = jArray[0] -> yields python object
>
> elem0 = arrayList.get(0) -> yields wrapped Java object
>
> Not sure if that's intended. In that case the test should be fixed ,-)

I fixed your patch to use cast() so as to unbox the boxed primitive types 
(and strings) to resolve the failures.

Thank you for the patch, it's now checked in.

Andi..
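
The unboxing mentioned above has a direct analogue in plain Java: an element read back from an untyped list is a java.lang.Object reference and has to be cast to its concrete wrapper type before it can be compared with a primitive value. The sketch below is illustrative only; UnboxingSketch and sampleList are names invented here, and it does not show JCC's cast() itself, just the same idea on the Java side.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of PyLucene or JCC.
class UnboxingSketch {
    // An untyped list stores every element as a boxed object reference.
    static List<Object> sampleList() {
        List<Object> list = new ArrayList<>();
        list.add(0);      // autoboxed to java.lang.Integer
        list.add(1.5);    // autoboxed to java.lang.Double
        list.add(true);   // autoboxed to java.lang.Boolean
        list.add("a");    // already an object (java.lang.String)
        return list;
    }

    public static void main(String[] args) {
        List<Object> list = sampleList();
        // get() returns Object; the cast recovers the wrapper type and
        // assignment to a primitive variable unboxes it.
        int i = (Integer) list.get(0);
        double d = (Double) list.get(1);
        boolean b = (Boolean) list.get(2);
        String s = (String) list.get(3);
        System.out.println(i + " " + d + " " + b + " " + s);
    }
}
```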

AW: AW: AW: AW: AW: PyLucene use JCC shared object by default

Posted by Thomas Koch <ko...@orbiteam.de>.
> > As far as I understand
> > elem0 = jArray[0] -> yields python object
> >
> > elem0 = arrayList.get(0) -> yields wrapped Java object
> >
> > Not sure if that's intended. In that case the test should be fixed ,-)
> 
> If the array is an array of object, then objects you get, including
> instances of java.lang.Integer. If the array is array of int, for example,
> then ints you get.
> 
Ah I see - it's that the toArray() method runs in "Python land" still while
ArrayList is in JVM already, right!? (In fact the ArrayList constructor
itself calls toArray() too but then inside JVM and JCC probably does the
conversion to Java Objects while the Python2Java frontier is passed I
guess...)

This mix of Python and Java is sometimes confusing, but that's the price you
have to pay ,-)

Regards,
Thomas 




Re: AW: AW: AW: AW: PyLucene use JCC shared object by default

Posted by Andi Vajda <va...@apache.org>.
On Thu, 3 May 2012, Thomas Koch wrote:

> thanks for the fixes and cleanup! I've updated from SVN and wrote the unit 
> test. Runs without errors (Tracebacks etc.) now, however there may be
> some slight type mismatch issue still: the comparison of objects retrieved 
> from JArray and ArrayList respectively with the objects from the initial 
> JavaList does not match in case of the ArrayList (test yields 4 failures - 
> see below). That's with my newly built JCC2.12/PyLucene3.5 from branch_3x.
>
> As far as I understand
> elem0 = jArray[0] -> yields python object
>
> elem0 = arrayList.get(0) -> yields wrapped Java object
>
> Not sure if that's intended. In that case the test should be fixed ,-)

If the array is an array of object, then objects you get, including 
instances of java.lang.Integer. If the array is array of int, for example, 
then ints you get.
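
The same distinction exists in plain Java, which may help when reading the test failures below. A minimal sketch, assuming nothing beyond the JDK (ArrayElementKinds and its methods are hypothetical names used only for illustration): the elements of an Object[] are boxed wrapper objects, while the elements of an int[] are primitives.

```java
// Hypothetical illustration class; names are not from PyLucene or JCC.
class ArrayElementKinds {
    // Elements of an Object[] are references, so ints are stored boxed.
    static Object[] objectArray() {
        return new Object[] { 0, 1, 2 };   // each int is autoboxed to Integer
    }

    // Elements of an int[] are primitives; no wrapper object is involved.
    static int[] intArray() {
        return new int[] { 0, 1, 2 };
    }

    public static void main(String[] args) {
        Object first = objectArray()[0];
        System.out.println(first.getClass().getName());

        int plain = intArray()[0];
        System.out.println(plain);

        // Comparing by value works once the primitive is boxed for equals().
        System.out.println(first.equals(plain));
    }
}
```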

> Attached is a patch with the test - this time with .txt extension - hope 
> it gets through...

Yes, the patch got through, thank you.

Andi..

>
> regards
> Thomas
>
> ======================================================================
> FAIL: test_ArrayList (__main__.Test_CollectionsBoolList)
> create ArrayList in JVM (from the JavaSet)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>  File "test_Collections.py", line 208, in test_ArrayList
>    elem0,type(elem0), listElem0, type(listElem0)))
> AssertionError: should be equal: true (<type 'Object'>) <-> True (<type 'bool'>)
>
>
> ======================================================================
> FAIL: test_ArrayList (__main__.Test_CollectionsFloatList)
> create ArrayList in JVM (from the JavaSet)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>  File "test_Collections.py", line 208, in test_ArrayList
>    elem0,type(elem0), listElem0, type(listElem0)))
> AssertionError: should be equal: 1.5 (<type 'Object'>) <-> 1.5 (<type 'float'>)
>
> ======================================================================
> FAIL: test_ArrayList (__main__.Test_CollectionsListBase)
> create ArrayList in JVM (from the JavaSet)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>  File "test_Collections.py", line 208, in test_ArrayList
>    elem0,type(elem0), listElem0, type(listElem0)))
> AssertionError: should be equal: 0 (<type 'Object'>) <-> 0 (<type 'int'>)
>
> ======================================================================
> FAIL: test_ArrayList (__main__.Test_CollectionsStringList)
> create ArrayList in JVM (from the JavaSet)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>  File "test_Collections.py", line 208, in test_ArrayList
>    elem0,type(elem0), listElem0, type(listElem0)))
> AssertionError: should be equal: a (<type 'Object'>) <-> a (<type 'str'>)
>
> ----------------------------------------------------------------------
> Ran 42 tests in 0.031s
>
> FAILED (failures=4)
>
>> -----Original Message-----
>> From: Andi Vajda [mailto:vajda@apache.org]
>> Sent: Thursday, 3 May 2012 01:00
>> To: pylucene-dev@lucene.apache.org
>> Subject: Re: AW: AW: AW: PyLucene use JCC shared object by default
>>
>>
>>   Hi Thomas,
>>
>> On Wed, 2 May 2012, Thomas Koch wrote:
>>
>>> could you download the patch from the link?
>>
>> Yes, I got your patch just fine.
>>
>> I fixed a few bugs today having to do with converting sequences to JArray
>> and added support for auto-boxing primitive types when converting a
>> sequence to an object JArray. Now your collections-demo.py all works fine !
>>
>> With these fixes the Python toArray() methods can return a Python
>> sequence object directly, there is no need to do the JArray conversion in
>> Python anymore.
>>
>> I simplified the collections.py file a bit to reflect the fixes and all changes,
>> including the PythonList/PythonListIterator code is now checked in.
>>
>> Could you please convert collections-demo.py into a proper unit test module
>> like the unit tests in pylucene/test so that it gets integrated into the test
>> suite ?
>>
>> Thanks !
>>
>> Andi..
>>
>>>
>>> Just one more thing ... in the initial implementation of PythonList I did the
>>> toArray() method in Python and the toArray(Object[]) method in Java - just
>>> as was done for the PythonSet:
>>>
>>> +    public native List subList(int fromIndex, int toIndex);
>>> +    public native Object[] toArray();
>>> +
>>> +    public Object[] toArray(Object[] a)
>>> +    {
>>> +        Object[] array = toArray();
>>> +
>>> +        if (a.length < array.length)
>>> +            a = (Object[]) Array.newInstance(a.getClass().getComponentType(),
>>> +                                             array.length);
>>> +
>>> +        System.arraycopy(array, 0, a, 0, array.length);
>>> +
>>> +        return a;
>>> +    }
>>>
>>> (from patch of Feb 22nd I sent to you)
>>>
>>> From the current patch you can see that the latter part is missing now -
>>> the toArray(Object[]) method is now done in Python as well, i.e. it simply
>>> calls toArray():
>>>
>>> ===================================================================
>>> --- java/org/apache/pylucene/util/PythonSet.java	(revision 1332162)
>>> +++ java/org/apache/pylucene/util/PythonSet.java	(working copy)
>>> @@ -62,14 +62,6 @@
>>>
>>>     public Object[] toArray(Object[] a)
>>>     {
>>> -        Object[] array = toArray();
>>> -
>>> -        if (a.length < array.length)
>>> -            a = (Object[]) Array.newInstance(a.getClass().getComponentType(),
>>> -                                             array.length);
>>> -
>>> -        System.arraycopy(array, 0, a, 0, array.length);
>>> -
>>> -        return a;
>>> +        return toArray();
>>>     }
>>> }
>>>
>>> As far as I remember that was part of your changes in between (I probably
>>> never touched PythonSet). Anyway I could imagine that this is related to the
>>> current problem.
>>>
>>> However, the ArrayList never calls the 2nd toArray method. The ArrayList
>>> constructor actually triggers the "simple" toArray method:
>>>
>>>    /**
>>>     * Constructs a list containing the elements of the specified
>>>     * collection, in the order they are returned by the collection's
>>>     * iterator.
>>>     *
>>>     * @param c the collection whose elements are to be placed into this list
>>>     * @throws NullPointerException if the specified collection is null
>>>     */
>>>    public ArrayList(Collection<? extends E> c) {
>>>        elementData = c.toArray();
>>>        size = elementData.length;
>>>        // c.toArray might (incorrectly) not return Object[] (see 6260652)
>>>        if (elementData.getClass() != Object[].class)
>>>            elementData = Arrays.copyOf(elementData, size, Object[].class);
>>>    }
>>>
>>> The ArrayList source code is attached (from OpenJDK6 sources).
>>>
>>> So maybe that's the wrong path ...  Anyhow I feel the mapping of
>>> toArray(Object[]) to toArray() does not fully comply with the Java API
>>> description:
>>> http://docs.oracle.com/javase/6/docs/api/java/util/List.html#toArray(T[])
>>>
>>>
>>> Hope that helps...
>>>
>>>
>>> Regards,
>>> Thomas
>>>
>>>> -----Original Message-----
>>>> From: Andi Vajda [mailto:vajda@apache.org]
>>>> Sent: Monday, 30 April 2012 19:32
>>>> To: pylucene-dev@lucene.apache.org
>>>> Subject: Re: AW: AW: PyLucene use JCC shared object by default
>>>>
>>>>
>>>> On Mon, 30 Apr 2012, Thomas Koch wrote:
>>>>
>>>>> Dear Andi, I again had a look at the patch I submitted recently and
>>>>> would like to get back to it.  An updated version of the patch is
>>>>> attached to this email - the patch is against the branch_3x repo
>>>>> http://svn.apache.org/repos/asf/lucene/pylucene/branches/branch_3x
>>>>
>>>> Oh, and there is no attachment in your email. Maybe it got eaten up
>>>> by some mail server. Please, make sure it's of a text mimetype or
>>>> mail it to me directly.
>>>>
>>>> Thanks !
>>>>
>>>> Andi..
>>>>
>>>>>
>>>>> The patch mainly
>>>>> - adds two Java classes: PythonList, PythonListIterator
>>>>> - adds the corresponding Python classes (JavaListIterator and JavaList in
>>>>>   collections.py)
>>>>>
>>>>> Purpose:
>>>>> - provide a Java-based List implementation in JCC/PyLucene (similar
>>>>>   to the existing PythonSet/JavaSet)
>>>>> - allow passing Python lists via Java Collections into PyLucene
>>>>>
>>>>> To summarize briefly: PythonSet/JavaSet already existed, but there was
>>>>> nothing similar for Lists. I made an implementation of PythonList/JavaList
>>>>> and with your help this is now basically working, except for an open issue
>>>>> that affects both JavaSet and JavaList: initialization of an ArrayList with
>>>>> a JavaSet (or JavaList) may cause trouble.
>>>>>
>>>>> As you said: "There is a bug somewhere with constructing an ArrayList from
>>>>> a python collection like JavaSet or JavaList."
>>>>>
>>>>> I tried to change the toArray() method as you suggested, but that didn't
>>>>> help. As far as I understood, there are two options to box Python values
>>>>> into a typed JArray:
>>>>>
>>>>> 1)  use the object-based JArray class and box Python values by wrapping
>>>>> them with the corresponding Java object (e.g. type<int> -> lucene.Integer):
>>>>>
>>>>> >>> x = lucene.JArray('object')([lucene.Boolean(True),lucene.Boolean(False)])
>>>>> JArray<object>[<Object: true>, <Object: false>]
>>>>> >>> type(x[0])
>>>>> <type 'Object'>
>>>>>
>>>>> 2)  use the correct array type (int, float, etc.) and pass the list of Python
>>>>> elements (or literals) to the JArray constructor, e.g.
>>>>>
>>>>> >>> y = lucene.JArray('bool')([True,False])
>>>>> JArray<bool>[True, False]
>>>>> >>> type(y[0])
>>>>> <type 'bool'>
>>>>>
>>>>> I tried both of them (see the _pyList2JArray methods in collections.py) but
>>>>> neither did the trick. The 'empty objects in ArrayList' problem remains
>>>>> when dealing with strings: the ArrayList object that is initialized with a
>>>>> JavaSet or JavaList of string items has the same number of objects as the
>>>>> original JavaSet/JavaList, but all objects are the same (it looks like an
>>>>> array of empty objects). Furthermore, another issue comes into play with
>>>>> integer lists: here the initialization of the ArrayList with the Collection
>>>>> fails with a Java stacktrace (lucene.JavaError: org.apache.jcc.PythonException).
>>>>>
>>>>> The most simple test case is as follows:
>>>>>
>>>>> --%< --
>>>>> import lucene
>>>>> lucene.initVM()
>>>>> from lucene.collections import JavaList
>>>>>
>>>>> # using strings: the ArrayList is created, but initialized with empty objects
>>>>> jl = JavaList(['a','b'])
>>>>> al = lucene.ArrayList(jl)
>>>>> assert (not al.get(0).equals(al.get(1))), "unique values"
>>>>>
>>>>> # using ints: the ArrayList is not created,  but an error occurs instead:
>>>>> # Java stacktrace: org.apache.jcc.PythonException: ('while calling toArray')
>>>>> jl = JavaList(range(3))
>>>>> al = lucene.ArrayList(jl)
>>>>> --%< --
>>>>>
>>>>> I currently feel like I'm stabbing around in the dark to find out
>>>>> what's going on here and would welcome any suggestions. Needs some JCC
>>>>> expert I guess ,-)
>>>>>
>>>>> Of course we can leave the patch out - but there's still the same issue
>>>>> with JavaSet.
>>>>>
>>>>>
>>>>> kind regards
>>>>>
>>>>> Thomas
>>>>> --
>>>>> OrbiTeam Software GmbH & Co. KG, Germany http://www.orbiteam.de
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Andi Vajda [mailto:vajda@apache.org]
>>>>>> Sent: Wednesday, 18 April 2012 20:37
>>>>>> To: pylucene-dev@lucene.apache.org
>>>>>> Subject: Re: AW: PyLucene use JCC shared object by default
>>>>>>
>>>>>>
>>>>>> Hi Thomas,
>>>>>> ...
>>>>>> Lucene 3.6 just got released a few days ago. Apart from your patch,
>>>>>> the PyLucene 3.6 release is ready. I'm about to go offline (email
>>>>>> only) for a week.
>>>>>> Let's revisit this patch then (first week of May). It's not
>>>>>> blocking the release right now as, even if I sent out a release
>>>>>> candidate for a vote, the three business days required for this
>>>>>> would take this into the time I'm away.
>>>>>> ...
>>>>>> Andi..
>>>>>
>>>>>
>>>
>

AW: AW: AW: AW: PyLucene use JCC shared object by default

Posted by Thomas Koch <ko...@orbiteam.de>.
Andi,
thanks for the fixes and cleanup! I've updated from SVN and wrote the unit test. It runs without errors (tracebacks etc.) now; however, there may still be a slight type mismatch issue: the comparison of objects retrieved from JArray and ArrayList respectively with the objects from the initial JavaList does not match in the case of the ArrayList (the test yields 4 failures - see below). That's with my newly built JCC2.12/PyLucene3.5 from branch_3x.

As far as I understand 
 elem0 = jArray[0] -> yields python object
 
 elem0 = arrayList.get(0) -> yields wrapped Java object
 
Not sure if that's intended. In that case the test should be fixed ,-)

Attached is a patch with the test - this time with .txt extension - hope it gets through...

regards
Thomas

======================================================================
FAIL: test_ArrayList (__main__.Test_CollectionsBoolList)
create ArrayList in JVM (from the JavaSet)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Collections.py", line 208, in test_ArrayList
    elem0,type(elem0), listElem0, type(listElem0)))
AssertionError: should be equal: true (<type 'Object'>) <-> True (<type 'bool'>)


======================================================================
FAIL: test_ArrayList (__main__.Test_CollectionsFloatList)
create ArrayList in JVM (from the JavaSet)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Collections.py", line 208, in test_ArrayList
    elem0,type(elem0), listElem0, type(listElem0)))
AssertionError: should be equal: 1.5 (<type 'Object'>) <-> 1.5 (<type 'float'>)

======================================================================
FAIL: test_ArrayList (__main__.Test_CollectionsListBase)
create ArrayList in JVM (from the JavaSet)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Collections.py", line 208, in test_ArrayList
    elem0,type(elem0), listElem0, type(listElem0)))
AssertionError: should be equal: 0 (<type 'Object'>) <-> 0 (<type 'int'>)

======================================================================
FAIL: test_ArrayList (__main__.Test_CollectionsStringList)
create ArrayList in JVM (from the JavaSet)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Collections.py", line 208, in test_ArrayList
    elem0,type(elem0), listElem0, type(listElem0)))
AssertionError: should be equal: a (<type 'Object'>) <-> a (<type 'str'>)

----------------------------------------------------------------------
Ran 42 tests in 0.031s

FAILED (failures=4)

> -----Original Message-----
> From: Andi Vajda [mailto:vajda@apache.org]
> Sent: Thursday, 3 May 2012 01:00
> To: pylucene-dev@lucene.apache.org
> Subject: Re: AW: AW: AW: PyLucene use JCC shared object by default
> 
> 
>   Hi Thomas,
> 
> On Wed, 2 May 2012, Thomas Koch wrote:
> 
> > could you download the patch from the link?
> 
> Yes, I got your patch just fine.
> 
> I fixed a few bugs today having to do with converting sequences to JArray
> and added support for auto-boxing primitive types when converting a
> sequence to an object JArray. Now your collections-demo.py all works fine !
> 
> With these fixes the Python toArray() methods can return a Python
> sequence object directly, there is no need to do the JArray conversion in
> Python anymore.
> 
> I simplified the collections.py file a bit to reflect the fixes and all changes,
> including the PythonList/PythonListIterator code is now checked in.
> 
> Could you please convert collections-demo.py into a proper unit test module
> like the unit tests in pylucene/test so that it gets integrated into the test
> suite ?
> 
> Thanks !
> 
> Andi..
> 
> >
> > Just one more thing ... in the initial implementation of PythonList I did the
> > toArray() method in Python and the toArray(Object[]) method in Java - just
> > as was done for the PythonSet:
> >
> > +    public native List subList(int fromIndex, int toIndex);
> > +    public native Object[] toArray();
> > +
> > +    public Object[] toArray(Object[] a)
> > +    {
> > +        Object[] array = toArray();
> > +
> > +        if (a.length < array.length)
> > +            a = (Object[]) Array.newInstance(a.getClass().getComponentType(),
> > +                                             array.length);
> > +
> > +        System.arraycopy(array, 0, a, 0, array.length);
> > +
> > +        return a;
> > +    }
> >
> > (from patch of Feb 22nd I sent to you)
> >
> > From the current patch you can see that the latter part is missing now -
> > the toArray(Object[]) method is now done in Python as well, i.e. it simply
> > calls toArray():
> >
> > ===================================================================
> > --- java/org/apache/pylucene/util/PythonSet.java	(revision 1332162)
> > +++ java/org/apache/pylucene/util/PythonSet.java	(working copy)
> > @@ -62,14 +62,6 @@
> >
> >     public Object[] toArray(Object[] a)
> >     {
> > -        Object[] array = toArray();
> > -
> > -        if (a.length < array.length)
> > -            a = (Object[]) Array.newInstance(a.getClass().getComponentType(),
> > -                                             array.length);
> > -
> > -        System.arraycopy(array, 0, a, 0, array.length);
> > -
> > -        return a;
> > +        return toArray();
> >     }
> > }
> >
> > As far as I remember that was part of your changes in between (I probably
> > never touched PythonSet). Anyway I could imagine that this is related to the
> > current problem.
> >
> > However, the ArrayList never calls the 2nd toArray method. The ArrayList
> > constructor actually triggers the "simple" toArray method:
> >
> >    /**
> >     * Constructs a list containing the elements of the specified
> >     * collection, in the order they are returned by the collection's
> >     * iterator.
> >     *
> >     * @param c the collection whose elements are to be placed into this list
> >     * @throws NullPointerException if the specified collection is null
> >     */
> >    public ArrayList(Collection<? extends E> c) {
> >        elementData = c.toArray();
> >        size = elementData.length;
> >        // c.toArray might (incorrectly) not return Object[] (see 6260652)
> >        if (elementData.getClass() != Object[].class)
> >            elementData = Arrays.copyOf(elementData, size, Object[].class);
> >    }
> >
> > The ArrayList source code is attached (from OpenJDK6 sources).
> >
> > So maybe that's the wrong path ... Anyhow, I feel that mapping
> > toArray(Object[]) to toArray() does not fully comply with the Java API
> > description:
> > http://docs.oracle.com/javase/6/docs/api/java/util/List.html#toArray(T[])
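
The difference between the two variants of toArray(Object[]) can be sketched in plain Java, without JCC. The names ToArrayContractDemo, compliant and shortcut are hypothetical. compliant follows the lines removed in the diff above, plus the null terminator that the List Javadoc describes when the array has room to spare; shortcut models the "return toArray()" variant.

```java
import java.lang.reflect.Array;

class ToArrayContractDemo {
    // Reuse `a` when it is large enough, otherwise allocate a new array
    // with the same component type as `a`.
    static Object[] compliant(Object[] source, Object[] a) {
        if (a.length < source.length)
            a = (Object[]) Array.newInstance(a.getClass().getComponentType(),
                                             source.length);
        System.arraycopy(source, 0, a, 0, source.length);
        if (a.length > source.length)
            a[source.length] = null;  // mark the end when there is spare room
        return a;
    }

    // Models "return toArray()": the argument is ignored and the plain
    // Object[] is handed back unchanged.
    static Object[] shortcut(Object[] source, Object[] a) {
        return source;
    }

    public static void main(String[] args) {
        Object[] source = { "a", "b" };  // runtime type is Object[]

        String[] ok = (String[]) compliant(source, new String[0]);
        System.out.println("compliant length: " + ok.length);

        try {
            String[] bad = (String[]) shortcut(source, new String[0]);
            System.out.println("shortcut length: " + bad.length);
        } catch (ClassCastException e) {
            System.out.println("shortcut result is not a String[]");
        }
    }
}
```

A caller that writes coll.toArray(new String[0]) expects a String[] back, so the shortcut variant can fail with a ClassCastException at the call site even though the elements themselves are strings.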
> >
> >
> > Hope that helps...
> >
> >
> > Regards,
> > Thomas
> >
> >> -----Original Message-----
> >> From: Andi Vajda [mailto:vajda@apache.org]
> >> Sent: Monday, 30 April 2012 19:32
> >> To: pylucene-dev@lucene.apache.org
> >> Subject: Re: AW: AW: PyLucene use JCC shared object by default
> >>
> >>
> >> On Mon, 30 Apr 2012, Thomas Koch wrote:
> >>
> >>> Dear Andi, I again had a look at the patch I submitted recently and
> >>> would like to get back to it.  An updated version of the patch is
> >>> attached to this email - the patch is against the branch_3x repo
> >>> http://svn.apache.org/repos/asf/lucene/pylucene/branches/branch_3x
> >>
> >> Oh, and there is no attachment in your email. Maybe it got eaten up
> >> by some mail server. Please, make sure it's of a text mimetype or
> >> mail it to me directly.
> >>
> >> Thanks !
> >>
> >> Andi..
> >>
> >>>
> >>> The patch mainly
> >>> - adds two Java classes: PythonList, PythonListIterator
> >>> - adds the corresponding Python classes (JavaListIterator and JavaList
> >>>   in collections.py)
> >>>
> >>> Purpose:
> >>> - provide a Java-based List implementation in JCC/PyLucene (similar
> >>>   to the existing PythonSet/JavaSet)
> >>> - allow passing Python lists via Java Collections into PyLucene
> >>>
> >>> To summarize briefly: PythonSet/JavaSet already existed, but there was
> >>> nothing similar for lists. I made an implementation of
> >>> PythonList/JavaList, and with your help this is now basically working,
> >>> except for an open issue that affects both JavaSet and JavaList:
> >>> initializing an ArrayList with a JavaSet (or JavaList) may cause trouble.
> >>>
> >>> As you said: "There is a bug somewhere with constructing an ArrayList
> >>> from a python collection like JavaSet or JavaList."
> >>>
> >>> I tried to change the toArray() method as you suggested, but that
> >>> didn't help. As far as I understand, there are two options to box
> >>> Python values into a typed JArray:
> >>>
> >>> 1) use the object-based JArray class and box Python values by wrapping
> >>> them with the corresponding Java object (e.g. type<int> ->
> >>> lucene.Integer):
> >>>
> >>> >>> x = lucene.JArray('object')([lucene.Boolean(True),lucene.Boolean(False)])
> >>> JArray<object>[<Object: true>, <Object: false>]
> >>> >>> type(x[0])
> >>> <type 'Object'>
> >>>
> >>> 2) use the correct array type (int, float, etc.) and pass the list of
> >>> Python elements (or literals) to the JArray constructor, e.g.
> >>>
> >>> >>> y = lucene.JArray('bool')([True,False])
> >>> JArray<bool>[True, False]
> >>> >>> type(y[0])
> >>> <type 'bool'>
> >>>
> >>> I tried both of them (see the _pyList2JArray methods in collections.py),
> >>> but neither did the trick. The 'empty objects in ArrayList' problem
> >>> remains when dealing with strings: an ArrayList that is initialized with
> >>> a JavaSet or JavaList of string items will have the same number of
> >>> objects as the original JavaSet/JavaList, but all objects are the same
> >>> (it looks like an array of empty objects). Furthermore, another issue
> >>> comes into play with integer lists: here the initialization of the
> >>> ArrayList with the collection fails with a Java stacktrace
> >>> (lucene.JavaError: org.apache.jcc.PythonException).
> >>>
> >>> The most simple test case is as follows:
> >>>
> >>> --%< --
> >>> import lucene
> >>> lucene.initVM()
> >>> from lucene.collections import JavaList
> >>>
> >>> # using strings: the ArrayList is created, but initialized with empty objects
> >>> jl = JavaList(['a','b'])
> >>> al = lucene.ArrayList(jl)
> >>> assert (not al.get(0).equals(al.get(1))), "unique values"
> >>>
> >>> # using ints: the ArrayList is not created, but an error occurs instead:
> >>> # Java stacktrace: org.apache.jcc.PythonException: ('while calling toArray')
> >>> jl = JavaList(range(3))
> >>> al = lucene.ArrayList(jl)
> >>> --%< --
> >>>
> >>> I currently feel like I'm stabbing around in the dark to find out
> >>> what's going on here and would welcome any suggestions. This needs a
> >>> JCC expert, I guess ,-)
> >>>
> >>> Of course we can leave the patch out, but there's still the same issue
> >>> with JavaSet.
> >>>
> >>>
> >>> kind regards
> >>>
> >>> Thomas
> >>> --
> >>> OrbiTeam Software GmbH & Co. KG, Germany http://www.orbiteam.de
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Andi Vajda [mailto:vajda@apache.org]
> >>>> Sent: Wednesday, 18 April 2012 20:37
> >>>> To: pylucene-dev@lucene.apache.org
> >>>> Subject: Re: AW: PyLucene use JCC shared object by default
> >>>>
> >>>>
> >>>> Hi Thomas,
> >>>> ...
> >>>> Lucene 3.6 just got released a few days ago. Apart from your patch,
> >>>> the PyLucene 3.6 release is ready. I'm about to go offline (email
> >>>> only) for a week.
> >>>> Let's revisit this patch then (first week of May). It's not blocking
> >>>> the release right now as, even if I sent out a release candidate for
> >>>> a vote, the three business days required for this would take this
> >>>> into the time I'm away.
> >>>> ...
> >>>> Andi..
> >>>
> >>>
> >