Posted to pylucene-dev@lucene.apache.org by marco turchi <ma...@gmail.com> on 2017/01/28 22:06:19 UTC

ShingleAnalyzerWrapper in PyLucene

Dear All,
I need to use the ShingleAnalyzerWrapper in PyLucene.

I have built the analyzer the same way as in Java Lucene:
self.analyzer = ShingleAnalyzerWrapper(WhitespaceAnalyzer(), 2, 4, " ", True, False, None)

and I have used it inside QueryParser:
query = QueryParser("source", self.analyzer).parse("welcome world is at on")

the output is:
source:welcome source:world source:is source:at source:on

I have run the same code in Java and the output is how I would expect it:
source:welcome source:welcome world source:welcome world is source:welcome
world is at source:world source:world is source:world is at source:world is
at on source:is content:is at source:is at on source:at source:at on
source:on

Do you have any idea what I'm doing wrong in PyLucene?

Thanks a lot in advance for your help
Marco

Re: ShingleAnalyzerWrapper in PyLucene

Posted by Andi Vajda <va...@apache.org>.
> On Jan 29, 2017, at 10:24, marco turchi <ma...@gmail.com> wrote:
> 
> It is strange because I can see the attached files in the email I sent you... 
> 
> I attach again the Java code. In case it is not attached again, you can download from this link:
> https://www.dropbox.com/s/o7ocygrdv8dqksl/CopyOfTest.java?dl=0
> the file is called CopyOfTest.Java

Indeed. No attachment was received here. Probably some security feature somewhere. The link you included should be good enough.

Thanks !

Andi..


Re: ShingleAnalyzerWrapper in PyLucene

Posted by marco turchi <ma...@gmail.com>.
Hi Andi,
while I was changing the parameter value, I noticed another problem. I
fixed it and it works now.

Thanks a lot and sorry for bothering you!
Marco

On Sun, Jan 29, 2017 at 9:38 PM, Andi Vajda <va...@apache.org> wrote:

>
> On Sun, 29 Jan 2017, marco turchi wrote:
>
>> It is strange because I can see the attached files in the email I sent
>> you...
>>
>> I attach again the Java code. In case it is not attached again, you can
>> download from this link:
>> https://www.dropbox.com/s/o7ocygrdv8dqksl/CopyOfTest.java?dl=0
>> the file is called CopyOfTest.Java
>>
>
> I didn't try to run your programs yet but one difference I noticed
> is that in Python you do:
>   analyzer = ShingleAnalyzerWrapper(WhitespaceAnalyzer(), 2, 6, ' ', True, False, None)
> and in Java you do:
>   analyzer = new ShingleAnalyzerWrapper(new WhitespaceAnalyzer(), 2, 4, " ", true, false, null);
>
> The numeric parameters are not the same: 2, 6 vs 2, 4.
> Please use the same values in both versions and let us know if that solves
> the problem.
>
> Thanks !
>
> Andi..

Re: ShingleAnalyzerWrapper in PyLucene

Posted by Andi Vajda <va...@apache.org>.
On Sun, 29 Jan 2017, marco turchi wrote:

> It is strange because I can see the attached files in the email I sent
> you...
>
> I attach again the Java code. In case it is not attached again, you can
> download from this link:
> https://www.dropbox.com/s/o7ocygrdv8dqksl/CopyOfTest.java?dl=0
> the file is called CopyOfTest.Java

I didn't try to run your programs yet but one difference I noticed
is that in Python you do:
   analyzer = ShingleAnalyzerWrapper(WhitespaceAnalyzer(), 2, 6, ' ', True, False, None)
and in Java you do:
   analyzer = new ShingleAnalyzerWrapper(new WhitespaceAnalyzer(), 2, 4, " ", true, false, null);

The numeric parameters are not the same: 2, 6 vs 2, 4.
Please use the same values in both versions and let us know if that solves 
the problem.
Thanks !

Andi..
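
A note for readers of the archive: the two numeric arguments are the minimum and maximum shingle size. The sketch below is plain Python, not Lucene or PyLucene code; the function name shingles and its parameters are hypothetical and only imitate the behaviour suggested by the Java output quoted in this thread (unigrams emitted alongside word n-grams joined by the separator). It is meant to show how the term list changes between 2, 4 and 2, 6 for the five-token query from the original post.

```python
def shingles(tokens, min_size=2, max_size=4, separator=" ", output_unigrams=True):
    """Illustrative shingling: for each start position, emit the unigram
    (if requested) followed by every n-gram of min_size..max_size tokens.
    This is a simplified model, not Lucene's ShingleFilter."""
    result = []
    for start in range(len(tokens)):
        if output_unigrams:
            result.append(tokens[start])
        for size in range(min_size, max_size + 1):
            end = start + size
            if end > len(tokens):
                break  # not enough tokens left for a shingle this long
            result.append(separator.join(tokens[start:end]))
    return result


tokens = "welcome world is at on".split()

with_max_4 = shingles(tokens, 2, 4)
with_max_6 = shingles(tokens, 2, 6)

# Terms that appear only when the maximum shingle size is raised to 6.
extra = [term for term in with_max_6 if term not in with_max_4]

print(with_max_4)
print(extra)
```

Under this model, a maximum of 4 yields the same 14 terms as the Java output in the original post, and raising the maximum to 6 adds only the single five-word shingle, because the query has just five tokens. The mismatch is therefore worth fixing when comparing the two programs, but on its own it would not account for an output containing unigrams only.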


Re: ShingleAnalyzerWrapper in PyLucene

Posted by marco turchi <ma...@gmail.com>.
It is strange because I can see the attached files in the email I sent
you...

I attach again the Java code. In case it is not attached again, you can
download from this link:
https://www.dropbox.com/s/o7ocygrdv8dqksl/CopyOfTest.java?dl=0
the file is called CopyOfTest.Java

Thanks a lot!
Marco



On Sun, Jan 29, 2017 at 7:14 PM, Andi Vajda <va...@apache.org> wrote:

>
> > On Jan 29, 2017, at 03:50, marco turchi <ma...@gmail.com> wrote:
> >
> > Dear Andi,
> > please find attached the Java and the Python code. Both create an index
> > with two records using the Shingle analyser and then query it, printing
> > the query and the terms of the query.
>
> It looks like you attached only the python program, only one attachment.
>
> Andi..

Re: ShingleAnalyzerWrapper in PyLucene

Posted by Andi Vajda <va...@apache.org>.
> On Jan 29, 2017, at 03:50, marco turchi <ma...@gmail.com> wrote:
> 
> Dear Andi,
> please find attached the Java and the Python code. Both create an index with two records using the Shingle analyser and then query it, printing the query and the terms of the query.

It looks like you attached only the python program, only one attachment.

Andi..


Re: ShingleAnalyzerWrapper in PyLucene

Posted by marco turchi <ma...@gmail.com>.
Dear Andi,
please find attached the Java and the Python code. Both create an index
with two records using the Shingle analyser and then query it, printing
the query and the terms of the query.

Thanks a lot for your help
Marco




Re: ShingleAnalyzerWrapper in PyLucene

Posted by Andi Vajda <va...@apache.org>.
On Sat, 28 Jan 2017, marco turchi wrote:

> Dear All,
> I need to use the ShingleAnalyzerWrapper in PyLucene.
>
> I have built the analyzer similar to Lucene:
> self.analyzer = ShingleAnalyzerWrapper(WhitespaceAnalyzer(), 2, 4, " " ,
> True, False, None)
>
> and I have used it inside QueryParser:
> query = QueryParser("source", self.analyzer).parse("welcome world is at on")
>
> the output is:
> source:welcome source:world source:is source:at source:on
>
> I have run the same code in Java and the output is how I would expect it:
> source:welcome source:welcome world source:welcome world is source:welcome
> world is at source:world source:world is source:world is at source:world is
> at on source:is content:is at source:is at on source:at source:at on
> source:on
>
> Do you have any idea what I'm doing wrong in PyLucene?

Please, help me help you by including two simple programs that I can run to 
reproduce the problem. One in Java producing the output you expect, one in 
Python producing the output you're reporting.

Thanks !

Andi..

>
> Thanks a lot in advance for your help
> Marco
>