Posted to pylucene-dev@lucene.apache.org by "Michael McCandless (Created) (JIRA)" <ji...@apache.org> on 2011/11/22 18:20:40 UTC

[jira] [Created] (PYLUCENE-12) Add PythonReusableAnalyzerBase, so we can create analyzers in Python

Add PythonReusableAnalyzerBase, so we can create analyzers in Python
--------------------------------------------------------------------

                 Key: PYLUCENE-12
                 URL: https://issues.apache.org/jira/browse/PYLUCENE-12
             Project: PyLucene
          Issue Type: Improvement
            Reporter: Michael McCandless


Lucene now has a useful helper class, ReusableAnalyzerBase: you subclass it and override one method (createComponents) to get an analyzer that provides a reusableTokenStream implementation.

I think we should expose it in Python... patch is simple.
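
As a rough illustration of the pattern the helper provides, here is a pure-Python analogy. It does not use PyLucene; every class and method name in it is hypothetical, and the caching is simplified to one cached object per analyzer. The base class builds its components once through the single overridable method and afterwards only resets them for new input.

```python
class ReusableAnalyzerSketch(object):
    """Pure-Python analogy of the reuse pattern; not the Lucene class."""

    def __init__(self):
        self._components = None  # cached components, created on first use

    def create_components(self, field_name, text):
        # The one method a subclass is expected to override.
        raise NotImplementedError

    def reusable_token_stream(self, field_name, text):
        # First call: build the components. Later calls: reset and reuse them.
        if self._components is None:
            self._components = self.create_components(field_name, text)
        else:
            self._components.reset(text)
        return self._components


class LowerCaseWords(object):
    """Hypothetical stand-in for a tokenizer/filter chain."""

    def __init__(self, text):
        self.reset(text)

    def reset(self, text):
        self.tokens = text.lower().split()


class MyAnalyzer(ReusableAnalyzerSketch):
    created = 0  # counts how often components were built

    def create_components(self, field_name, text):
        MyAnalyzer.created += 1
        return LowerCaseWords(text)
```

Calling reusable_token_stream twice on the same MyAnalyzer instance returns the same object both times, with create_components invoked only once.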

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PYLUCENE-12) Add PythonReusableAnalyzerBase, so we can create analyzers in Python

Posted by "Andi Vajda (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PYLUCENE-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161404#comment-13161404 ] 

Andi Vajda commented on PYLUCENE-12:
------------------------------------

You said: "I know we document that you must call super (http://lucene.apache.org/pylucene/jcc/documentation/readme.html#extensions), but, can we make this throw an exception instead of SEGV, to be more friendly? Or is that hard...?"

It's not hard, just costly. Everywhere the wrapped pointer is used, it must be checked. It's like checking for lack of calling initVM() or
attachCurrentThread(). It took a while to find the right way to do this that didn't involve checking these all the time.
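
To make the trade-off concrete, here is a pure-Python sketch. It is only an analogy, not JCC's actual C++ code, and all names in it are hypothetical. The guard turns a missing base-constructor call into an exception, but it has to run every time the wrapped handle is used.

```python
class WrappedBase(object):
    """Stand-in for an extension base class that wraps a native handle."""

    def __init__(self):
        # In a real extension this would be the pointer to the Java object.
        self._handle = object()

    def _checked_handle(self):
        # The "friendly" check. It must run on every use of the handle,
        # which is where the cost comes from.
        handle = self.__dict__.get("_handle")
        if handle is None:
            raise RuntimeError("base __init__ was not called; "
                               "call super().__init__() in your constructor")
        return handle

    def token_stream(self):
        self._checked_handle()
        return "stream"


class GoodAnalyzer(WrappedBase):
    def __init__(self):
        super(GoodAnalyzer, self).__init__()


class BadAnalyzer(WrappedBase):
    def __init__(self):
        pass  # forgets to call the base constructor
```

With the guard in place, BadAnalyzer().token_stream() raises RuntimeError instead of touching an unset handle.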

                
> Add PythonReusableAnalyzerBase, so we can create analyzers in Python
> --------------------------------------------------------------------
>
>                 Key: PYLUCENE-12
>                 URL: https://issues.apache.org/jira/browse/PYLUCENE-12
>             Project: PyLucene
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: PYLUCENE-12.patch, PYLUCENE-12.patch
>
>
> Lucene now has a useful helper class, ReusableAnalyzerBase; you subclass it and override one method, to create an analyzer that provides reusableTokenStream impl.
> I think we should expose it in Python... patch is simple.


[jira] [Commented] (PYLUCENE-12) Add PythonReusableAnalyzerBase, so we can create analyzers in Python

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PYLUCENE-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157322#comment-13157322 ] 

Michael McCandless commented on PYLUCENE-12:
--------------------------------------------

One small fix to the patch: we also must add this:

    @Override
    public native Reader initReader(Reader reader);

So that the Python defined analyzer can provide a CharReader/Filter as well.
                

Re: [jira] [Commented] (PYLUCENE-12) Add PythonReusableAnalyzerBase, so we can create analyzers in Python

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Sun, Dec 4, 2011 at 10:50 AM, Andi Vajda <va...@apache.org> wrote:

>> Just to be certain: how can I validate I truly succeeded in shared
>> linking for the lucene extension...?  I'm on linux... when I run "nm"
>> on the _lucene.so, what should I look for to confirm I "succeeded"...?
>
> Use ldd (with the right flag) on _lucene.so, it should depend on libjcc.so if built shared.

Hmm indeed I did link shared.

OK!  My bad... I had failed to "make clean" last time.  Once I did
that, I now see the exception details.  So it looks like not linking
shared was my problem.

Thanks Andi!

Mike McCandless

http://blog.mikemccandless.com

Re: [jira] [Commented] (PYLUCENE-12) Add PythonReusableAnalyzerBase, so we can create analyzers in Python

Posted by Andi Vajda <va...@apache.org>.
On Dec 4, 2011, at 6:52, Michael McCandless <lu...@mikemccandless.com> wrote:

> On Sun, Dec 4, 2011 at 9:25 AM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>> On Sat, Dec 3, 2011 at 5:10 PM, Andi Vajda <va...@apache.org> wrote:
>>> 
>>> On Fri, 2 Dec 2011, Michael McCandless (Commented) (JIRA) wrote:
>>> 
>>>> RE the exception inside createComponents... strange!  Your exception
>>>> indeed has all the details (ie, shows the original traceback, from the
>>>> createComponents method).
>>>> 
>>>> Yet, when I do exactly that change (stick the x in, then run the test case
>>>> directly), I get this:
>>> 
>>> Did you build your lucene module with --shared (and did you build jcc with
>>> shared enabled, the default normally). It occurred to me that exception
>>> reporting is a bit weaker in non shared mode because the PythonException
>>> java class is not present. Just a thought...
>> 
>> Hmm, I believe I built jcc with the defaults (shared), but indeed I
>> did not build the lucene extension shared... I'll try to build shared
>> and see if that fixes the exception reporting!  If so, maybe we should
>> note this limitation of non-shared...
> 
> Hmm I went and built the lucene extension shared (added --shared to
> the command-line passed to jcc module, in the toplevel Makefile) but
> I still don't get the traceback inside Python... spooky.
> 
> Just to be certain: how can I validate I truly succeeded in shared
> linking for the lucene extension...?  I'm on linux... when I run "nm"
> on the _lucene.so, what should I look for to confirm I "succeeded"...?

Use ldd (with the right flag) on _lucene.so, it should depend on libjcc.so if built shared.
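
A small sketch of that check in Python, for Linux only. The helper names and the default path are assumptions, and it calls ldd without any flags, which may not be the invocation meant above.

```python
import subprocess


def links_against(ldd_output, library="libjcc"):
    """Return True if any line of ldd output names the given library."""
    return any(library in line for line in ldd_output.splitlines())


def check_shared(path="_lucene.so"):
    # Runs ldd on the extension module; the path is an assumption and
    # should point at wherever the built module was installed.
    output = subprocess.check_output(["ldd", path]).decode()
    return links_against(output)
```

check_shared() returning False would suggest the module was not linked against the shared JCC library.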

Andi..

> 
> Mike McCandless
> 
> http://blog.mikemccandless.com

Re: [jira] [Commented] (PYLUCENE-12) Add PythonReusableAnalyzerBase, so we can create analyzers in Python

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Sun, Dec 4, 2011 at 9:25 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> On Sat, Dec 3, 2011 at 5:10 PM, Andi Vajda <va...@apache.org> wrote:
>>
>> On Fri, 2 Dec 2011, Michael McCandless (Commented) (JIRA) wrote:
>>
>>> RE the exception inside createComponents... strange!  Your exception
>>> indeed has all the details (ie, shows the original traceback, from the
>>> createComponents method).
>>>
>>> Yet, when I do exactly that change (stick the x in, then run the test case
>>> directly), I get this:
>>
>> Did you build your lucene module with --shared (and did you build jcc with
>> shared enabled, the default normally). It occurred to me that exception
>> reporting is a bit weaker in non shared mode because the PythonException
>> java class is not present. Just a thought...
>
> Hmm, I believe I built jcc with the defaults (shared), but indeed I
> did not build the lucene extension shared... I'll try to build shared
> and see if that fixes the exception reporting!  If so, maybe we should
> note this limitation of non-shared...

Hmm I went and built the lucene extension shared (added --shared to
the command-line passed to jcc module, in the toplevel Makefile) but
I still don't get the traceback inside Python... spooky.

Just to be certain: how can I validate I truly succeeded in shared
linking for the lucene extension...?  I'm on linux... when I run "nm"
on the _lucene.so, what should I look for to confirm I "succeeded"...?

Mike McCandless

http://blog.mikemccandless.com

Re: [jira] [Commented] (PYLUCENE-12) Add PythonReusableAnalyzerBase, so we can create analyzers in Python

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Sat, Dec 3, 2011 at 5:10 PM, Andi Vajda <va...@apache.org> wrote:
>
> On Fri, 2 Dec 2011, Michael McCandless (Commented) (JIRA) wrote:
>
>> RE the exception inside createComponents... strange!  Your exception
>> indeed has all the details (ie, shows the original traceback, from the
>> createComponents method).
>>
>> Yet, when I do exactly that change (stick the x in, then run the test case
>> directly), I get this:
>
> Did you build your lucene module with --shared (and did you build jcc with
> shared enabled, the default normally). It occurred to me that exception
> reporting is a bit weaker in non shared mode because the PythonException
> java class is not present. Just a thought...

Hmm, I believe I built jcc with the defaults (shared), but indeed I
did not build the lucene extension shared... I'll try to build shared
and see if that fixes the exception reporting!  If so, maybe we should
note this limitation of non-shared...

Mike McCandless

http://blog.mikemccandless.com

Re: [jira] [Commented] (PYLUCENE-12) Add PythonReusableAnalyzerBase, so we can create analyzers in Python

Posted by Andi Vajda <va...@apache.org>.
On Fri, 2 Dec 2011, Michael McCandless (Commented) (JIRA) wrote:

> RE the exception inside createComponents... strange!  Your exception 
> indeed has all the details (ie, shows the original traceback, from the 
> createComponents method).
>
> Yet, when I do exactly that change (stick the x in, then run the test case 
> directly), I get this:

Did you build your lucene module with --shared (and did you build jcc with 
shared enabled, the default normally). It occurred to me that exception 
reporting is a bit weaker in non shared mode because the PythonException 
java class is not present. Just a thought...
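
The difference between the two kinds of report can be imitated in plain Python. This is only an analogy with hypothetical function names; it says nothing about how JCC implements either mode. One reporter keeps just the exception type, the other keeps the formatted traceback.

```python
import traceback


def create_components():
    # Deliberately refers to an undefined name to trigger a NameError.
    return xfirst


def report_type_only(func):
    # Weak report: only the exception class name survives.
    try:
        func()
    except Exception as e:
        return type(e).__name__


def report_full(func):
    # Detailed report: message plus the Python traceback as text.
    try:
        func()
    except Exception:
        return traceback.format_exc()
```

report_type_only(create_components) gives just "NameError", while report_full(create_components) also names the undefined variable and the function that raised.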

Andi..

>
>
> ======================================================================
> ERROR: testReusable (__main__.ReusableAnalyzerBaseTestCase)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>  File "test/test_ReusableAnalyzerBase.py", line 36, in testReusable
>    stream = method("test", reader)
> JavaError: java.lang.RuntimeException: NameError
>    Java stacktrace:
> java.lang.RuntimeException: NameError
> 	at org.apache.pylucene.analysis.PythonReusableAnalyzerBase.createComponents(Native Method)
> 	at org.apache.lucene.analysis.ReusableAnalyzerBase.reusableTokenStream(ReusableAnalyzerBase.java:73)
>
>
> Ie, for some reason, I don't get the traceback from the createComponents method; all I see is that a NameError had happened, not what name in particular, and what lines of Python source.
>
> I'm on Linux, Python 64 bit, Java 1.6.0_21... I wonder if I somehow compiled things incorrectly?  Odd.
>

[jira] [Commented] (PYLUCENE-12) Add PythonReusableAnalyzerBase, so we can create analyzers in Python

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PYLUCENE-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161718#comment-13161718 ] 

Michael McCandless commented on PYLUCENE-12:
--------------------------------------------

RE the exception inside createComponents... strange!  Your exception indeed has all the details (ie, shows the original traceback, from the createComponents method).

Yet, when I do exactly that change (stick the x in, then run the test case directly), I get this:


======================================================================
ERROR: testReusable (__main__.ReusableAnalyzerBaseTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_ReusableAnalyzerBase.py", line 36, in testReusable
    stream = method("test", reader)
JavaError: java.lang.RuntimeException: NameError
    Java stacktrace:
java.lang.RuntimeException: NameError
	at org.apache.pylucene.analysis.PythonReusableAnalyzerBase.createComponents(Native Method)
	at org.apache.lucene.analysis.ReusableAnalyzerBase.reusableTokenStream(ReusableAnalyzerBase.java:73)


I.e., for some reason, I don't get the traceback from the createComponents method; all I see is that a NameError happened, not which name in particular or which lines of Python source.

I'm on Linux, Python 64 bit, Java 1.6.0_21... I wonder if I somehow compiled things incorrectly?  Odd.
                

[jira] [Updated] (PYLUCENE-12) Add PythonReusableAnalyzerBase, so we can create analyzers in Python

Posted by "Michael McCandless (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PYLUCENE-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated PYLUCENE-12:
---------------------------------------

    Attachment: PYLUCENE-12.patch

New patch (just fixes my indentation screwup from last one).
                

[jira] [Commented] (PYLUCENE-12) Add PythonReusableAnalyzerBase, so we can create analyzers in Python

Posted by "Andi Vajda (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PYLUCENE-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161406#comment-13161406 ] 

Andi Vajda commented on PYLUCENE-12:
------------------------------------

About the lack of information in the stacktrace, I added a random x into the createComponents method and I'm getting this:

{noformat}
Traceback (most recent call last):
  File "test/test_ReusableAnalyzerBase.py", line 36, in testReusable
    stream = method("test", reader)
JavaError: org.apache.jcc.PythonException: global name 'xfirst' is not defined
Traceback (most recent call last):
  File "test/test_ReusableAnalyzerBase.py", line 24, in createComponents
    last = StopFilter(Version.LUCENE_CURRENT, xfirst, StopAnalyzer.ENGLISH_STOP_WORDS_SET)
NameError: global name 'xfirst' is not defined

    Java stacktrace:
org.apache.jcc.PythonException: global name 'xfirst' is not defined
Traceback (most recent call last):
  File "test/test_ReusableAnalyzerBase.py", line 24, in createComponents
    last = StopFilter(Version.LUCENE_CURRENT, xfirst, StopAnalyzer.ENGLISH_STOP_WORDS_SET)
NameError: global name 'xfirst' is not defined

	at org.apache.pylucene.analysis.PythonReusableAnalyzerBase.createComponents(Native Method)
	at org.apache.lucene.analysis.ReusableAnalyzerBase.reusableTokenStream(ReusableAnalyzerBase.java:73)
{noformat}

Seems plenty of detail to me. What do you think is missing ?

                

[jira] [Commented] (PYLUCENE-12) Add PythonReusableAnalyzerBase, so we can create analyzers in Python

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PYLUCENE-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155389#comment-13155389 ] 

Michael McCandless commented on PYLUCENE-12:
--------------------------------------------

I noticed one unfriendliness here: if I modify the MyAnalyzer class (in test_ReusableAnalyzerBase.py), adding an empty ctor (def __init__) that fails to call super's __init__, then I get a SEGV.

I know we document that you must call super (http://lucene.apache.org/pylucene/jcc/documentation/readme.html#extensions), but, can we make this throw an exception instead of SEGV, to be more friendly?  Or is that hard...?
                

[jira] [Commented] (PYLUCENE-12) Add PythonReusableAnalyzerBase, so we can create analyzers in Python

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PYLUCENE-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155397#comment-13155397 ] 

Michael McCandless commented on PYLUCENE-12:
--------------------------------------------

Hmm, one more unfriendliness: if the createComponents method throws an exception (eg put xxx in there so you hit a NameError), you get back an exception like this:

{noformat}
======================================================================
ERROR: testReusable (__main__.ReusableAnalyzerBaseTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_ReusableAnalyzerBase.py", line 37, in testReusable
    stream = method("test", reader)
JavaError: java.lang.RuntimeException: NameError
    Java stacktrace:
java.lang.RuntimeException: NameError
	at org.apache.pylucene.analysis.PythonReusableAnalyzerBase.createComponents(Native Method)
	at org.apache.lucene.analysis.ReusableAnalyzerBase.reusableTokenStream(ReusableAnalyzerBase.java:73)
{noformat}

Somehow this is missing details (exception cause & TB) of the python source that caused the exception.... can we fix this?  If it's tricky I can open a new issue...
                

Re: [jira] [Commented] (PYLUCENE-12) Add PythonReusableAnalyzerBase, so we can create analyzers in Python

Posted by Andi Vajda <va...@apache.org>.
On Fri, 2 Dec 2011, Michael McCandless (Commented) (JIRA) wrote:

> Sorry, could you also add this method to PythonReusableAnalyzerBase.java (I missed it in my first patch):
>
>    @Override
>    public native Reader initReader(Reader reader);

Done in rev 1209756.

> Separately: how do we turn on Jira's markup like {noformat} and comment previews here ;)

I have no idea.
I find JIRA mail irritating anyway (it takes a whole iPhone screen to 
display nothing of use, like a 5 line long useless URL, for example).

Andi..

>

[jira] [Commented] (PYLUCENE-12) Add PythonReusableAnalyzerBase, so we can create analyzers in Python

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PYLUCENE-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161715#comment-13161715 ] 

Michael McCandless commented on PYLUCENE-12:
--------------------------------------------

Sorry, could you also add this method to PythonReusableAnalyzerBase.java (I missed it in my first patch):

    @Override 
    public native Reader initReader(Reader reader); 

Separately: how do we turn on Jira's markup like {noformat} and comment previews here ;)
                

[jira] [Commented] (PYLUCENE-12) Add PythonReusableAnalyzerBase, so we can create analyzers in Python

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PYLUCENE-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161716#comment-13161716 ] 

Michael McCandless commented on PYLUCENE-12:
--------------------------------------------

Re not SEGVing if you fail to call super ... OK, if we can't find a non-costly way to do it, let's not!
                

[jira] [Resolved] (PYLUCENE-12) Add PythonReusableAnalyzerBase, so we can create analyzers in Python

Posted by "Andi Vajda (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PYLUCENE-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andi Vajda resolved PYLUCENE-12.
--------------------------------

    Resolution: Fixed

rev 1209356, thanks Mike !
                

[jira] [Updated] (PYLUCENE-12) Add PythonReusableAnalyzerBase, so we can create analyzers in Python

Posted by "Michael McCandless (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PYLUCENE-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated PYLUCENE-12:
---------------------------------------

    Attachment: PYLUCENE-12.patch

Patch w/ basic test.
                