Posted to pylucene-dev@lucene.apache.org by Thomas Koch <ko...@orbiteam.de> on 2012/03/02 13:49:00 UTC

AW: AW: AW: Setting Stopword Set in PyLucene (or using Set in general)

Hi Andi,
thanks for the feedback! I revised the code and have attached a new
patch.

I also attach a short demo script that shows the problems I mentioned
earlier when trying to initialize an ArrayList with a JavaSet (or JavaList)
containing integers.

Finally, I'd suggest renaming collections.py because the Python standard
library already defines a module with that name:
http://docs.python.org/library/collections.html

Below are some comments to your comments...

Regards,
Thomas

> -----Original Message-----
> From: Andi Vajda [mailto:vajda@apache.org]
> Sent: Sunday, 26 February 2012 23:29
> To: pylucene-dev@lucene.apache.org
> Subject: Re: AW: AW: Setting Stopword Set in PyLucene (or using Set in
> general)
> 
> ...
> According to the javadocs, this method is supposed to throw
> NoSuchElementException. Raising StopIteration is not going to do the trick.
> Same comment on the previous method too.

OK, I was unsure how to properly throw a Java exception from Python code
and couldn't find an example.
I also expected Java exception types to be exported by the lucene module,
but that is not the case:
>>> lucene.NoSuchElementException
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'NoSuchElementException'

I imagine I could
- add java.util.NoSuchElementException to the Makefile so that JCC generates
a wrapper for it, and throw it via raise?
- use lucene.JavaError and pass the name 'java.util.NoSuchElementException'
to its constructor?
- extend / use PythonException?
- define a Python exception class
NoSuchElementException(exceptions.Exception) and raise that one?
- raise RuntimeError, 'NoSuchElementException'?
- define some helper methods for 'native' Java exceptions in PythonList.java
and call them?

Which one does the trick? Unless I learn of a better way, I'll go with the
last one...

(I understand PythonException is used by JCC to wrap errors that escape from
Python to Java and JavaError is used by JCC for Java Exceptions that escape
from Java to Python - but how do you 'fake' a Java Exception within Python?)

Same problem for IndexOutOfBoundsException in get()

> Why not also implement remove() and set() ?

Because they are optional ... I've implemented them now.

+    def lastIndexOf(obj):
> Wouldn't it be more efficient to iterate backwards until the element is
> found instead of copying the list (self._lst[::-1]) and iterating forwards?
Done.
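
For readers without the attached patch, here is a minimal sketch of the
backwards iteration suggested above. It is not the code from the patch: the
JavaList class below is a cut-down stand-in that only keeps the _lst attribute
mentioned in the review comment, and it uses Python 3 syntax.

```python
class JavaList(object):
    """Cut-down, hypothetical stand-in for the list wrapper under review."""

    def __init__(self, items=None):
        self._lst = list(items or [])

    def lastIndexOf(self, obj):
        # Walk the indices from the end towards the start, so no reversed
        # copy of the list (self._lst[::-1]) has to be created.
        for i in range(len(self._lst) - 1, -1, -1):
            if self._lst[i] == obj:
                return i
        # java.util.List.lastIndexOf returns -1 when the element is absent.
        return -1
```

The loop stops at the first match seen from the end, which is the last
occurrence in list order.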

+    def remove(self, obj_or_index):
+        if type(obj_or_index) is type(1):
+            return removeAt(int(obj_or_index))
+        return removeElement(obj_or_index)

> It's better to do this at the Java level.
> Declare differently named native methods for each overload of remove() and
> implement remove(int) in Java to call removeInt(int) and remove(Object) to
> call removeObject(Object).

Done. The different methods are declared private now.

+
+    def subList(fromIndex, toIndex):
+        sublst = self._lst[fromIndex:toIndex]
+        return JavaList(sublst)

> The javadoc expects this method to throw IndexOutOfBoundsException
> instead of behaving nicely like a Python slice.

This check (and the exception handling) is now done at the Java level.
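
The patch does this check in Java; purely for illustration, the same condition
can be sketched in Python. The bounds test below follows the condition given in
the java.util.List.subList javadoc (fromIndex < 0, toIndex > size, or fromIndex
> toIndex). The JavaList class is a hypothetical stand-in, and IndexError is
used here only as a placeholder for IndexOutOfBoundsException.

```python
class JavaList(object):
    """Cut-down, hypothetical stand-in for the list wrapper under review."""

    def __init__(self, items=None):
        self._lst = list(items or [])

    def subList(self, fromIndex, toIndex):
        size = len(self._lst)
        # A plain Python slice would silently clamp out-of-range indices;
        # the Java contract asks for an exception instead.
        if fromIndex < 0 or toIndex > size or fromIndex > toIndex:
            raise IndexError(
                "fromIndex=%d, toIndex=%d, size=%d" % (fromIndex, toIndex, size))
        # Note: Java's subList returns a view; this sketch returns a copy.
        return JavaList(self._lst[fromIndex:toIndex])
```

Without the explicit check, JavaList([10, 20]).subList(0, 5) would quietly
return both elements, which is the "nice" slice behaviour the review objects to.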


+public class PythonListIterator extends PythonIterator implements ListIterator {
+
+    // private long pythonObject;
+
+    public PythonListIterator()
+    {
+    }
+ 
+    /* defined in super class PythonIterator:
...
> If this works, you don't need pythonObject to be protected anymore in the
> superclass then ?

True - just reverted the changes in PythonIterator. 



AW: AW: AW: AW: Setting Stopword Set in PyLucene (or using Set in general)

Posted by Thomas Koch <ko...@orbiteam.de>.
Hi,
I have to add a comment to my previous mail:

> I'd have preferred using this option (#2) in toArray (for both JavaList and
> JavaSet) as it does not require wrapping into Java Integer (etc.)
> objects.
> However this method does not work with lucene.ArrayList:
> 
>  >>> x=lucene.JArray('int')([1,2])
>  JArray<int>[1, 2]
>  >>> y=lucene.ArrayList(x)
>  Traceback: lucene.InvalidArgsError:
>   (<type 'ArrayList'>, '__init__', (JArray<int>[1, 2],))
> 
Sorry - that's rubbish of course: ArrayList requires a collection in its
constructor and JArray isn't a collection. So this can't work! The
'challenge' was to be able to use JavaSet and/or JavaList (both are
collections) as an argument for ArrayList. (During init of ArrayList the
toArray() method is called however.)

So I gave it a quick try again, and tried the 2nd alternative:

> 1) return JArray(object)([<lucene.Integer()-object>*])
> or
> 2) return JArray(int)([<Python-int-literal>*])

but that option then fails (in the demo code) when using bool (or float)
types. Attached is a revised version of collections.py with the alternative
code (disabled) - if anyone's interested...

The mentioned issue with the created JArray containing the same objects
still remains. I'll have to look deeper into that, but as said I'm out of
office next week ...

BTW, sorry if this is out of scope of the PyLucene mailing list (it's more a
JCC related discussion) - we can continue with 'private' mail if that's
preferred. 

Regards,
Thomas

AW: AW: AW: AW: Setting Stopword Set in PyLucene (or using Set in general)

Posted by Thomas Koch <ko...@orbiteam.de>.
Hi Andi,
thanks for your feedback and for the code cleanup.

Regarding the 'toArray'-issue I tried different versions of JArray
'typed-constructor' and it turned out that these two alternatives basically
work:
(example for int types)

1) return JArray(object)([<lucene.Integer()-object>*])
or
2) return JArray(int)([<Python-int-literal>*])

Even more surprising (to me): there are different ways to construct those
template types, using either a string or a type:

 >>> lucene.JArray(int)([1,2])
 and
 >>> lucene.JArray('int')([1,2])
 both create the same type  <type 'JArray_int'>

I'd have preferred using this option (#2) in toArray (for both JavaList and
JavaSet) as it does not require wrapping into Java Integer (etc.)
objects. However this method does not work with lucene.ArrayList:

 >>> x=lucene.JArray('int')([1,2])
 JArray<int>[1, 2]
 >>> y=lucene.ArrayList(x)
 Traceback: lucene.InvalidArgsError:
  (<type 'ArrayList'>, '__init__', (JArray<int>[1, 2],))

So I decided to choose the Java-object wrapper option (#1) and implemented
toArray for primitive types (int,float,long,bool). It turned out that
wrapping strings is not needed.  That way the collections-demo runs fine and
I can init a lucene.ArrayList with the JavaSet or JavaList for the mentioned
types.
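
The boxing step described above can be sketched roughly as follows. This is not
the code from collections.py: to keep it runnable without a JVM, the Boxed
tuple is a hypothetical stand-in for the Java wrapper objects (such as
lucene.Integer), and mapping Python float to java.lang.Double is an assumption
of this sketch. It uses Python 3, so the separate long type is left out.

```python
from collections import namedtuple

# Hypothetical stand-in for a boxed Java object; in PyLucene this would be
# a JCC-generated wrapper instance such as lucene.Integer(5).
Boxed = namedtuple('Boxed', ['java_type', 'value'])

# Lookup by exact type: isinstance(True, int) is True in Python, so an
# isinstance() chain would have to test bool before int to box it correctly.
_BOXERS = {
    bool: lambda v: Boxed('java.lang.Boolean', v),
    int: lambda v: Boxed('java.lang.Integer', v),
    float: lambda v: Boxed('java.lang.Double', v),  # assumption of this sketch
}


def box(value):
    """Wrap a primitive Python value; pass anything else through unchanged."""
    boxer = _BOXERS.get(type(value))
    return boxer(value) if boxer is not None else value


def to_object_list(items):
    """Build the element list an object array would be created from."""
    return [box(x) for x in items]
```

Strings fall through unboxed, matching the observation above that wrapping
them is not needed.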

Attached is a revised version of collections.py and collections-demo.py
(which should run without error now).

However there's still one question/issue as you can see from the output of
collections-demo.py (and some commented 'test code' in collections-demo.py):

created JArray: JArray<object>[<Object: 0>, <Object: 1>, <Object: 2>,
<Object: 3>, ...] <type 'JArray_object'>
created ArrayList: [java.lang.Object@785d65, java.lang.Object@785d65,
java.lang.Object@785d65, java.lang.Object@785d65....,] <type 'ArrayList'>

It looks as if the objects passed in from JavaSet to lucene.ArrayList end up
in the same object (that's also why indexOf behaves somewhat strange). Could
be a bug in my test code, but this is no problem for lucene.HashSet(JavaSet)
for example so I'm really curious what's going on here...

If you have any ideas, please let me know. I will also look into it again
when I have some time, but I'll be busy for most of this week and out of
office next week.

regards,
Thomas

-----Original Message-----
From: Andi Vajda [mailto:vajda@apache.org]
Sent: Monday, 12 March 2012 03:34
To: pylucene-dev@lucene.apache.org
Cc: Thomas Koch
Subject: Re: AW: AW: AW: Setting Stopword Set in PyLucene (or using Set in
general)


  Hi Thomas,

On Fri, 2 Mar 2012, Thomas Koch wrote:

> thanks for the feedback! I revised the code and send you attached a 
> new patch.

Sorry for the delay in getting back to you.

I integrated your patch and fixed a bunch of formatting and bugs in it.
The collections-demo.py is not fully functional yet so I attach it here too,
somewhat fixed up as well.

There is a bug somewhere with constructing an ArrayList from a Python
collection like JavaSet or JavaList. At some point, toArray() gets called,
the right array is returned (almost, see below) but the ArrayList looks like
it was built from an array of empty objects.

> I also attach a short demo script that shows the problems I mentioned 
> earlier when trying to initialize an ArrayList with a JavaSet (or 
> JavaList) containing integers.

For that, the toArray() methods in collections.py must create the correct
array type using int, float, etc... instead of object, based on what's in the
Python object.
Alternatively, these methods need to box the int values by
wrapping them into a Java Integer object (for example, lucene.Integer(5)).
I leave that to you to continue with, I'm out of time for right now :-)

> Finally I'd suggest to rename collections.py because there's one 
> defined on Python lib already:
> http://docs.python.org/library/collections.html

Until this happens, you can use:
  from lucene import collections
as the collections.py file gets installed in the lucene package.

Throwing Java exceptions from Python is done by raising JavaError with the
desired Java exception object (I added a few to the jcc call in PyLucene's
Makefile), for example:
   raise JavaError, NoSuchElementException(str(index))

It's been like that for a very long time, I just forgot.
This is implemented by throwPythonError() in jcc's functions.cpp: if the
error is JavaError, then the Java exception instance used as argument to it
is raised to the JVM.
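
As a rough illustration of that pattern, an iterator's next() might look like
the sketch below. It is written so that it runs without a JVM: JavaError and
NoSuchElementException here are plain Python stand-ins for the classes that
lucene exports, and ListIteratorSketch is a hypothetical name. In Python 2,
"raise JavaError, X(...)" is equivalent to the "raise JavaError(X(...))" form
used here.

```python
class JavaError(Exception):
    """Stand-in for lucene.JavaError (assumption: the real class comes from JCC)."""


class NoSuchElementException(object):
    """Stand-in for the JCC-generated java.util.NoSuchElementException wrapper."""

    def __init__(self, message):
        self.message = message


class ListIteratorSketch(object):
    """Hypothetical iterator over a Python list, following the Java contract."""

    def __init__(self, items):
        self._lst = list(items)
        self._pos = 0

    def hasNext(self):
        return self._pos < len(self._lst)

    def next(self):
        if not self.hasNext():
            # Raise JavaError carrying the Java exception object instead of
            # StopIteration, as described in the message above.
            raise JavaError(NoSuchElementException(str(self._pos)))
        value = self._lst[self._pos]
        self._pos += 1
        return value
```

With the real classes, the Java exception instance passed to JavaError is what
gets raised on the JVM side, as described above.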

I attached the not-checked-in diffs as patches. The new Makefile is checked
into the pylucene-3.x branch.

> Below are some comments to your comments...

More responses inline below.

> Ok, I was unsure on how to properly throw a Java Exception in Python 
> code - and couldn't find an example.
> Also I thought a Java Exception type should be exported in lucene - 
> this is not the case however:
>>>> lucene.NoSuchElementException
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> AttributeError: 'module' object has no attribute 'NoSuchElementException'
>
> I imagine I could
> - add the java.util.NoSuchElementException to the Makefile to get it 
> generated by JCC and throw it via raise?
> - use lucene.JavaError and pass  'java.util.NoSuchElementException' 
> name in the constructor?

Yes, you guessed it right, this is how it works as outlined above.

You had various bugs in next()/nextIndex(), previous()/previousIndex() that
I hopefully fixed. Also, listIterator() can't be overridden in Python, I
fixed it in PythonList and in collections.py.

Andi..
