Posted to pylucene-dev@lucene.apache.org by Roxana Danger <ro...@reedonline.co.uk> on 2015/07/10 08:59:58 UTC

accessing to protected elements in PythonTokenizer

Hello,
       I am trying to construct a custom PythonTokenizer (see below), but I
am getting the error "attribute 'reader' of 'Tokenizer' objects is not
readable" when accessing it in the reset method.
       reader is a protected member of Tokenizer. I assumed it would be
exposed through PythonTokenizer, since it is passed to the superclass in the
constructor. Am I wrong?
       Thanks, best regards,
             Roxana

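# Import paths as used by PyLucene 4.x
from org.apache.lucene.analysis.tokenattributes import OffsetAttributeImpl
from org.apache.pylucene.analysis import PythonTokenizer


class ComposerTokenizer(PythonTokenizer):

    def __init__(self, input):
        PythonTokenizer.__init__(self, input)
        self.reset()

    def incrementToken(self):
        if self.index < len(self.finaltokens):
            self.clearAttributes()
            offsetAttr = OffsetAttributeImpl()
            offsetAttr.setOffset( ... )
            self.index = self.index + 1
            return True
        else:
            return False

    def reset(self):
        s = ''
        # This access is what raises the "not readable" error
        ch = self.reader.read()
        while ch != -1:
            # read() returns an int, so convert it (unichr on Python 2)
            s = s + chr(ch)
            ch = self.reader.read()
        self.index = 0
        self.finaltokens = ...  # processing s to extract self.finaltokens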


Re: accessing to protected elements in PythonTokenizer

Posted by Roxana Danger <ro...@reedonline.co.uk>.
Hi Andi,
    Thank you very much. I will use the first solution.
    Best regards.
         Roxana

On 10 July 2015 at 12:00, Andi Vajda <va...@apache.org> wrote:

>
> On Fri, 10 Jul 2015, Roxana Danger wrote:
>
>> Hello,
>>       I am trying to construct a custom PythonTokenizer (see below), but I
>> am getting the error "attribute 'reader' of 'Tokenizer' objects is not
>> readable" when accessing it in the reset method.
>>       reader is a protected member of Tokenizer. I assumed it would be
>> exposed through PythonTokenizer, since it is passed to the superclass in
>> the constructor. Am I wrong?
>>
>
> You're right, but there is no accessor on the Java side that makes the
> stored reader object usable from the Python side.
> You can either:
>   - add a getReader() method to the PythonTokenizer Java class that returns
>     it (and rebuild PyLucene after 'make clean')
>   - store the 'input' variable that is passed to your constructor on the
>     Python side, on your ComposerTokenizer instance. That 'input' is the
>     reader (at least, it's passed on to the Tokenizer Java class)
>
> The first option is probably safer, as it doesn't assume that
> Tokenizer(reader) stores the reader without changing it in some way.
>
> Andi..
>

Re: accessing to protected elements in PythonTokenizer

Posted by Andi Vajda <va...@apache.org>.
On Fri, 10 Jul 2015, Roxana Danger wrote:

> Hello,
>       I am trying to construct a custom PythonTokenizer (see below), but I
> am getting the error "attribute 'reader' of 'Tokenizer' objects is not
> readable" when accessing it in the reset method.
>       reader is a protected member of Tokenizer. I assumed it would be
> exposed through PythonTokenizer, since it is passed to the superclass in the
> constructor. Am I wrong?

You're right, but there is no accessor on the Java side that makes the
stored reader object usable from the Python side.
You can either:
   - add a getReader() method to the PythonTokenizer Java class that returns
     it (and rebuild PyLucene after 'make clean')
   - store the 'input' variable that is passed to your constructor on the
     Python side, on your ComposerTokenizer instance. That 'input' is the
     reader (at least, it's passed on to the Tokenizer Java class)

The first option is probably safer, as it doesn't assume that
Tokenizer(reader) stores the reader without changing it in some way.

Andi..
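
For illustration, the second option above (keeping the 'input' reference on
the Python side) can be sketched in plain Python. This is only a sketch under
stated assumptions, not PyLucene code: FakeReader and FakeTokenizerBase are
hypothetical stand-ins for java.io.Reader and PythonTokenizer so that it runs
without a JVM, and the whitespace split is a placeholder for the elided token
processing.

```python
class FakeReader(object):
    """Hypothetical stand-in for java.io.Reader: read() returns an int
    code point, or -1 once the input is exhausted."""

    def __init__(self, text):
        self._text = text
        self._pos = 0

    def read(self):
        if self._pos >= len(self._text):
            return -1
        code = ord(self._text[self._pos])
        self._pos += 1
        return code


class FakeTokenizerBase(object):
    """Hypothetical stand-in for PythonTokenizer: it receives the reader,
    but we pretend the subclass has no way to read it back."""

    def __init__(self, reader):
        self._java_side_reader = reader  # assumed unreachable from Python


class ComposerTokenizer(FakeTokenizerBase):

    def __init__(self, input):
        FakeTokenizerBase.__init__(self, input)
        # Option 2: keep our own reference to the reader we were given.
        self._input = input
        self.reset()

    def reset(self):
        chars = []
        ch = self._input.read()
        while ch != -1:
            chars.append(chr(ch))  # read() yields ints, so convert
            ch = self._input.read()
        self.index = 0
        # Placeholder processing (an assumption): split on whitespace.
        self.finaltokens = ''.join(chars).split()

    def incrementToken(self):
        if self.index < len(self.finaltokens):
            self.index += 1
            return True
        return False


tokenizer = ComposerTokenizer(FakeReader("hello big world"))
count = 0
while tokenizer.incrementToken():
    count += 1
print(count, tokenizer.finaltokens)
```

Note that the stored reference is drained the first time reset() runs, so a
second reset() on the same reader would find nothing left to read. Whether the
stored reference still matches what the Java side holds is exactly the
assumption the first option avoids.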
