Posted to pylucene-dev@lucene.apache.org by Bill Janssen <ja...@parc.com> on 2009/02/22 23:19:10 UTC

how to instantiate a Set?

I'm probably missing something incredibly obvious here...

I'm trying to call MoreLikeThis.setStopWords(Set words).  I've got a
list of stop words in Python, but I can't figure out how to turn that
into a Java Set.  I tried "lucene.HashSet(set(words))",
"lucene.HashSet(lucene.ArrayList(JArray("string")(words)))", and so
forth, without much luck.

Bill

Re: how to instantiate a Set?

Posted by Andi Vajda <va...@apache.org>.
On Feb 23, 2009, at 8:42, Bill Janssen <ja...@parc.com> wrote:

>   >>> a = JavaSet(set(['foo', 'bar', 'baz']))
>
> How about letting me initialize JavaSet with a sequence, too?
>
>   >>> a = JavaSet(['foo', 'bar', 'baz'])
>

Well, sure, but the point of JavaSet is to expose a set you own and  
control to Java. If you want to just create a set for Java the Arrays  
route works just as well and produces a faster set since its values  
are held in Java.

Andi..

> Bill

Re: how to instantiate a Set?

Posted by Bill Janssen <ja...@parc.com>.
  >>> a = JavaSet(set(['foo', 'bar', 'baz']))

How about letting me initialize JavaSet with a sequence, too?

  >>> a = JavaSet(['foo', 'bar', 'baz'])

Bill


Re: how to instantiate a Set?

Posted by Andi Vajda <va...@apache.org>.
On Sun, 22 Feb 2009, Andi Vajda wrote:

>
> On Sun, 22 Feb 2009, Bill Janssen wrote:
>
>> I'm probably missing something incredibly obvious here...
>> 
>> I'm trying to call MoreLikeThis.setStopWords(Set words).  I've got a
>> list of stop words in Python, but I can't figure out how to turn that
>> into a Java Set.  I tried "lucene.HashSet(set(words))",
>> "lucene.HashSet(lucene.ArrayList(JArray("string")(words)))", and so
>> forth, without much luck.
>
> PyLucene doesn't wrap the java.util.Arrays class that fills in the Java gap 
> between arrays and collections. That should be considered an oversight of 
> mine. I should add it to the JCC invocation in PyLucene's Makefile. Then you 
> would be able to pass your JArray instance to the Arrays.asList() method to
> make a List and finally feed that to a HashSet.
>
> Another alternative is to implement a Python extension of the Java Set 
> interface. Guess what? That is already part of PyLucene. The PythonSet class 
> is the extension point for implementing a Java Set in Python and that is part 
> of the PyLucene distribution.
>
> I even have such a Python implementation of a Java Set, called JavaSet.py, 
> ready here but it's not currently shipping with PyLucene, another oversight 
> of mine. I should add it to the distribution.
>
> Until then, here it is below. It takes a python set instance as constructor 
> argument and implements the complete Java Set interface. This example also 
> illustrates a Python implementation of the Java Iterator interface.

I added a collections.py module to the PyLucene distribution.
To use it:
   >>> from lucene.collections import JavaSet
   >>> from lucene import initVM, CLASSPATH
   >>> initVM(CLASSPATH)
   >>> a = JavaSet(set(['foo', 'bar', 'baz']))

I also added some missing proxies for the mapping and sequence protocols so 
that JavaSet can be iterated and used with the 'in' operator from Python.
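
As a rough illustration of what such proxies amount to, here is a pure-Python sketch. It is not the code JCC generates; SetLike and JavaStyleIterator are hypothetical stand-ins that need no JVM. It shows Python's 'in' operator and 'for' loop being routed to Java-style contains() and iterator() methods:

```python
class JavaStyleIterator(object):
    """Hypothetical stand-in for a java.util.Iterator (hasNext/next)."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def hasNext(self):
        return self._pos < len(self._items)

    def next(self):
        item = self._items[self._pos]
        self._pos += 1
        return item


class SetLike(object):
    """Hypothetical object exposing Java-style methods plus Python proxies."""

    def __init__(self, items):
        self._items = set(items)

    # Java-style API
    def contains(self, obj):
        return obj in self._items

    def iterator(self):
        return JavaStyleIterator(self._items)

    # Python protocol proxies that delegate to the Java-style API
    def __contains__(self, obj):
        return self.contains(obj)

    def __iter__(self):
        it = self.iterator()
        while it.hasNext():
            yield it.next()


words = SetLike(['foo', 'bar', 'baz'])
print('foo' in words)   # membership goes through contains()
print(sorted(words))    # iteration goes through iterator()
```

With the proxies in place, code written against a plain Python set (membership tests, loops, sorted()) can be handed the wrapped object unchanged.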

Andi..

Re: how to instantiate a Set?

Posted by Andi Vajda <va...@apache.org>.
On Sun, 22 Feb 2009, Bill Janssen wrote:

> I'm probably missing something incredibly obvious here...
>
> I'm trying to call MoreLikeThis.setStopWords(Set words).  I've got a
> list of stop words in Python, but I can't figure out how to turn that
> into a Java Set.  I tried "lucene.HashSet(set(words))",
> "lucene.HashSet(lucene.ArrayList(JArray("string")(words)))", and so
> forth, without much luck.

PyLucene doesn't wrap the java.util.Arrays class that fills in the Java gap 
between arrays and collections. That should be considered an oversight of 
mine. I should add it to the JCC invocation in PyLucene's Makefile. Then you 
would be able to pass your JArray instance to the Arrays.asList() method to
make a List and finally feed that to a HashSet.

Another alternative is to implement a Python extension of the Java Set 
interface. Guess what? That is already part of PyLucene. The PythonSet 
class is the extension point for implementing a Java Set in Python and that 
is part of the PyLucene distribution.

I even have such a Python implementation of a Java Set, called JavaSet.py, 
ready here but it's not currently shipping with PyLucene, another oversight 
of mine. I should add it to the distribution.

Until then, here it is below. It takes a python set instance as constructor 
argument and implements the complete Java Set interface. This example also 
illustrates a Python implementation of the Java Iterator interface.

Please, let me know if this works for you.
Thanks !

Andi..

----------------------------------------------------------------

from lucene import PythonSet, PythonIterator, JavaError

class JavaSet(PythonSet):

     def __init__(self, _set):
         super(JavaSet, self).__init__()
         self._set = _set

     def add(self, obj):
         if obj not in self._set:
             self._set.add(obj)
             return True
         return False

     def addAll(self, collection):
         size = len(self._set)
         self._set.update(collection)
         return len(self._set) > size

     def clear(self):
         self._set.clear()

     def contains(self, obj):
         return obj in self._set

     def containsAll(self, collection):
         for obj in collection:
             if obj not in self._set:
                 return False
         return True

     def equals(self, collection):
         if type(self) is type(collection):
             return self._set == collection._set
         return False

     def isEmpty(self):
         return len(self._set) == 0

     def iterator(self):
         class _iterator(PythonIterator):
             def __init__(_self):
                 super(_iterator, _self).__init__()
                 _self._iterator = iter(self._set)
             def hasNext(_self):
                 if hasattr(_self, '_next'):
                     return True
                 try:
                     _self._next = _self._iterator.next()
                     return True
                 except StopIteration:
                     return False
             def next(_self):
                 if hasattr(_self, '_next'):
                     next = _self._next
                     del _self._next
                 else:
                     next = _self._iterator.next()
                 return next
         return _iterator()

     def remove(self, obj):
         try:
             self._set.remove(obj)
             return True
         except KeyError:
             return False

     def removeAll(self, collection):
         result = False
         for obj in collection:
             try:
                 self._set.remove(obj)
                 result = True
             except KeyError:
                 pass
         return result

     def retainAll(self, collection):
         result = False
         for obj in list(self._set):
             if obj not in collection:
                 self._set.remove(obj)
                 result = True
         return result

     def size(self):
         return len(self._set)

     def toArray(self):
         return list(self._set)
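
The look-ahead trick in iterator() above can be tried on its own, without a JVM. The following is a minimal sketch under stated assumptions: LookAheadIterator is a hypothetical name, it does not derive from PythonIterator, and it swaps the hasattr/del bookkeeping for a sentinel object. It uses the next() builtin, so it runs on Python 2.6 and later as well as Python 3.

```python
_MISSING = object()  # sentinel meaning "no value is buffered"


class LookAheadIterator(object):
    """Adapts a Python iterable to Java-style hasNext()/next() calls."""

    def __init__(self, iterable):
        self._iterator = iter(iterable)
        self._buffered = _MISSING

    def hasNext(self):
        # A value fetched by an earlier hasNext() call is still waiting.
        if self._buffered is not _MISSING:
            return True
        # Otherwise pull one value ahead and keep it for next().
        try:
            self._buffered = next(self._iterator)
        except StopIteration:
            return False
        return True

    def next(self):
        # Hand out the buffered value first, if there is one.
        if self._buffered is not _MISSING:
            value, self._buffered = self._buffered, _MISSING
            return value
        # No buffered value: this raises StopIteration when exhausted.
        return next(self._iterator)


it = LookAheadIterator(['foo', 'bar'])
while it.hasNext():
    print(it.next())
```

Calling hasNext() repeatedly is safe because the fetched value is held until next() consumes it. Unlike a real java.util.Iterator, this sketch raises StopIteration rather than NoSuchElementException when next() is called past the end.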


Re: how to instantiate a Set?

Posted by Andi Vajda <va...@apache.org>.
On Sun, 22 Feb 2009, Bill Janssen wrote:

> I'm probably missing something incredibly obvious here...
>
> I'm trying to call MoreLikeThis.setStopWords(Set words).  I've got a
> list of stop words in Python, but I can't figure out how to turn that
> into a Java Set.  I tried "lucene.HashSet(set(words))",
> "lucene.HashSet(lucene.ArrayList(JArray("string")(words)))", and so
> forth, without much luck.

I just added the Arrays class to the build by adding java.util.Arrays to the 
jcc invocation in PyLucene's Makefile and now the following just works:

    >>> a=Arrays.asList(JArray('string')(('foo', 'bar', 'baz')))
    >>> a
    <List: [foo, bar, baz]>
    >>> HashSet(a)
    <HashSet: [foo, baz, bar]>

I should have that checked in shortly.

You then get to decide: use the "makes me cringe" Arrays class or the 
pythonic JavaSet.py/PythonSet.java combo :)

Andi..