Posted to user@hive.apache.org by lei liu <li...@gmail.com> on 2010/08/04 12:10:34 UTC

why is slow when use OR clause instead of IN clause

My company requires us to use version 0.4.1, which does not support the IN
clause. I want to use OR clauses (example: where id=1 or id=2 or id=3) to
implement the IN clause (example: id in (1,2,3)). I know it will be slower,
especially when the list after "in" is very long. Could anybody tell me
why using OR clauses in place of an IN clause is slow?


Thanks,


LiuLei

Re: why is slow when use OR clause instead of IN clause

Posted by lei liu <li...@gmail.com>.
With one thousand OR clauses, Hive throws the exception below:
Total MapReduce jobs = 1
Number of reduce tasks is set to 0 since there's no reduce operator
java.lang.StackOverflowError
        at java.beans.Statement.<init>(Statement.java:60)
        at java.beans.Expression.<init>(Expression.java:47)
        at java.beans.Expression.<init>(Expression.java:65)
        at java.beans.PrimitivePersistenceDelegate.instantiate(MetaData.java:79)
        at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:97)
        at java.beans.Encoder.writeObject(Encoder.java:54)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:257)
        at java.beans.Encoder.writeObject1(Encoder.java:206)
        at java.beans.Encoder.cloneStatement(Encoder.java:219)
        at java.beans.Encoder.writeExpression(Encoder.java:278)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:372)
        at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:97)
        at java.beans.Encoder.writeObject(Encoder.java:54)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:257)
        at java.beans.Encoder.writeObject1(Encoder.java:206)
        at java.beans.Encoder.cloneStatement(Encoder.java:219)
        at java.beans.Encoder.writeExpression(Encoder.java:278)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:372)
        at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:97)
        at java.beans.Encoder.writeObject(Encoder.java:54)
        at java.beans.XMLEncoder.writeObject(XMLEncoder.java:257)
        at java.beans.Encoder.writeExpression(Encoder.java:279)
        at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:372)
        at java.beans.DefaultPersistenceDelegate.doProperty(DefaultPersistenceDelegate.java:212)
        at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:247)
        at java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:395)
        at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:100)



With two hundred OR clauses, the query is very slow.
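
One plausible reading of the trace, offered as an assumption rather than a
confirmed diagnosis: a chain such as a OR b OR c is commonly parsed as a
left-deep binary tree, so N terms give a tree about N levels deep, and a
recursive serializer such as java.beans.XMLEncoder needs stack in proportion
to that depth. The Python sketch below only models the two tree shapes; the
names left_deep, balanced, and depth are illustrative and are not part of
Hive.

```python
# Model an OR chain as nested tuples: ("OR", left, right) or a leaf string.

def left_deep(terms):
    """Build the tree for t0 OR t1 OR t2 ... as a left-associative parser would."""
    tree = terms[0]
    for term in terms[1:]:
        tree = ("OR", tree, term)
    return tree

def balanced(terms):
    """Build an equivalent tree by splitting the term list in half recursively."""
    if len(terms) == 1:
        return terms[0]
    mid = len(terms) // 2
    return ("OR", balanced(terms[:mid]), balanced(terms[mid:]))

def depth(tree):
    """Return the nesting depth, computed iteratively so deep trees are safe."""
    deepest, stack = 0, [(tree, 1)]
    while stack:
        node, level = stack.pop()
        deepest = max(deepest, level)
        if isinstance(node, tuple):
            stack.append((node[1], level + 1))
            stack.append((node[2], level + 1))
    return deepest

terms = [f"id={i}" for i in range(1000)]
print(depth(left_deep(terms)))  # 1000
print(depth(balanced(terms)))   # 11
```

Writing the predicate with explicitly balanced parentheses might keep the
serialized tree shallower, but whether Hive 0.4.1 preserves that shape has
not been tested here.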

I am on version 0.4.1 now. If I upgrade to version 0.6, what do I need to
do?

In addition, when will version 0.6 be released?

Thanks,


LiuLei

2010/8/5 Ning Zhang <nz...@facebook.com>

> I tested (1000 disjunctions) and it was extremely slow but no OOM. The
> issue seems to be the fact that we serialize the plan by writing to HDFS
> file directly. We probably should cache it locally and then write it to
> HDFS.

Re: why is slow when use OR clause instead of IN clause

Posted by Ning Zhang <nz...@facebook.com>.
I tested with 1000 disjunctions and it was extremely slow, but there was no OOM. The issue seems to be that we serialize the plan by writing directly to an HDFS file. We should probably cache it locally and then write it to HDFS.
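
The idea of buffering the serialized plan locally and writing it out in one
pass can be sketched in Python. This only illustrates the buffering pattern:
CountingSink is a hypothetical stand-in for a remote file stream, the
fragment strings are made up, and nothing here is Hive or HDFS API.

```python
import io

class CountingSink:
    """Hypothetical stand-in for a remote stream; counts write calls."""
    def __init__(self):
        self.calls = 0
        self.data = bytearray()

    def write(self, chunk):
        self.calls += 1
        self.data += chunk
        return len(chunk)

def serialize(fragments, out):
    """Emit each plan fragment as its own small write."""
    for fragment in fragments:
        out.write(fragment.encode("utf-8"))

fragments = [f"<term id='{i}'/>" for i in range(1000)]

direct = CountingSink()
serialize(fragments, direct)          # one remote write per fragment

buffered = CountingSink()
local = io.BytesIO()                  # local in-memory buffer
serialize(fragments, local)
buffered.write(local.getvalue())      # single remote write

print(direct.calls, buffered.calls)   # 1000 1
```

Both sinks end up with identical bytes; only the number of round trips to
the slow destination differs.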

On Aug 4, 2010, at 10:23 AM, Edward Capriolo wrote:

> For reference I did this as a test case....
> SELECT * FROM src where
> key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
> OR key=0 OR key=0 OR key=0 OR
> key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
> OR key=0 OR key=0 OR key=0 OR
> ...(100 more of these)
> 
> No OOM but I gave up after the test case did not go anywhere for about
> 2 minutes.
> 
> Edward


Re: why is slow when use OR clause instead of IN clause

Posted by Edward Capriolo <ed...@gmail.com>.
On Wed, Aug 4, 2010 at 1:15 PM, Ning Zhang <nz...@facebook.com> wrote:
> Currently an expression tree (series of ORs in this case) is not collapsed to one operator or any other optimizations. It would be great to have this optimization rule to convert an OR operator tree to one IN operator. Would you be able to file a JIRA and contribute a patch?

For reference I did this as a test case....
SELECT * FROM src where
key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
OR key=0 OR key=0 OR key=0 OR
key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
OR key=0 OR key=0 OR key=0 OR
...(100 more of these)

No OOM but I gave up after the test case did not go anywhere for about
2 minutes.
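
A test query like the one above can be generated rather than typed by hand.
The following Python helper is a minimal sketch; or_query is a hypothetical
name, and it assumes integer values and trusted table and column names.

```python
def or_query(table, column, values):
    """Build SELECT * FROM table WHERE column=v1 OR column=v2 ..."""
    if not values:
        raise ValueError("need at least one value")
    predicate = " OR ".join(f"{column}={int(v)}" for v in values)
    return f"SELECT * FROM {table} WHERE {predicate}"

query = or_query("src", "key", [0] * 200)
print(query.count(" OR "))  # 199
print(query[:40])
```

Varying the length of the value list makes it easy to see at which size
planning time starts to grow.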

Edward

Re: why is slow when use OR clause instead of IN clause

Posted by Ning Zhang <nz...@facebook.com>.
Currently an expression tree (a series of ORs in this case) is not collapsed into one operator, nor is any other optimization applied. It would be great to have an optimization rule that converts an OR operator tree into one IN operator. Would you be able to file a JIRA and contribute a patch?
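
What such a rewrite rule might look like can be sketched in Python on a toy
expression tree. This is not Hive's optimizer code: the tuple encoding and
the names collect_equalities and collapse_or_to_in are assumptions made for
illustration.

```python
# Expressions: ("OR", left, right), ("EQ", column, value), ("IN", column, [values]).

def collect_equalities(expr, found):
    """Walk an OR tree iteratively; return False if a non-equality leaf appears."""
    stack = [expr]
    while stack:
        node = stack.pop()
        if node[0] == "OR":
            stack.append(node[2])
            stack.append(node[1])
        elif node[0] == "EQ":
            found.append((node[1], node[2]))
        else:
            return False
    return True

def collapse_or_to_in(expr):
    """Rewrite col=v1 OR col=v2 ... into col IN (v1, v2, ...) when every term
    is an equality on the same column; otherwise return expr unchanged."""
    if expr[0] != "OR":
        return expr
    found = []
    if not collect_equalities(expr, found):
        return expr
    columns = {column for column, _ in found}
    if len(columns) != 1:
        return expr
    values = list(dict.fromkeys(value for _, value in found))  # dedupe, keep order
    return ("IN", columns.pop(), values)

tree = ("OR", ("OR", ("EQ", "id", 1), ("EQ", "id", 2)), ("EQ", "id", 3))
print(collapse_or_to_in(tree))  # ('IN', 'id', [1, 2, 3])
```

The rule deliberately leaves the tree alone when the terms mix columns or
contain anything other than equalities, since those cases are not
equivalent to a single IN.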

On Aug 4, 2010, at 7:46 AM, Mark Tozzi wrote:

> I haven't looked at the code, but I assume the query parser would sort
> the 'in' terms and then do a binary search lookup into them for each
> row, while the 'or' terms don't have that kind of obvious relationship
> and are probably tested in sequence.  This would give the in O(log N)
> performance compared to a chain of or's having O(N) performance, per
> row queried.  For large N, that could add up.  That being said, I'm
> just speculating here.  The query parser may be smart enough to
> optimize the related or's in the same way, or it may not optimize that
> at all.  If I get a chance, I'll try to dig around and see what it's
> doing, as I have also had a lot of large 'in' queries and could use
> every drop of performance I can get.
> 
> --Mark


Re: why is slow when use OR clause instead of IN clause

Posted by Mark Tozzi <ma...@gmail.com>.
I haven't looked at the code, but I assume the query parser would sort
the 'in' terms and then do a binary search lookup into them for each
row, while the 'or' terms don't have that kind of obvious relationship
and are probably tested in sequence.  This would give 'in' O(log N)
performance compared to a chain of or's having O(N) performance, per
row queried.  For large N, that could add up.  That being said, I'm
just speculating here.  The query parser may be smart enough to
optimize the related or's in the same way, or it may not optimize that
at all.  If I get a chance, I'll try to dig around and see what it's
doing, as I have also had a lot of large 'in' queries and could use
every drop of performance I can get.

--Mark
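
The complexity argument above can be illustrated with a small Python sketch.
It models the speculation only, not what Hive actually does; the function
names matches_or_chain and matches_sorted_in are hypothetical.

```python
from bisect import bisect_left

def matches_or_chain(value, terms):
    """Test terms one by one, as a chain of ORs would: up to N comparisons."""
    comparisons = 0
    for term in terms:
        comparisons += 1
        if value == term:
            return True, comparisons
    return False, comparisons

def matches_sorted_in(value, sorted_terms):
    """Binary search over pre-sorted terms: about log2(N) comparisons."""
    index = bisect_left(sorted_terms, value)
    return index < len(sorted_terms) and sorted_terms[index] == value

terms = list(range(0, 2000, 2))        # 1000 even ids
sorted_terms = sorted(terms)           # sort once, before scanning rows

print(matches_or_chain(1999, terms))          # (False, 1000)
print(matches_sorted_in(1998, sorted_terms))  # True
print(matches_sorted_in(1999, sorted_terms))  # False
```

A hash set would bring the expected per-row cost down further, to roughly
constant time, at the price of extra memory.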

On Wed, Aug 4, 2010 at 9:47 AM, Edward Capriolo <ed...@gmail.com> wrote:
> I can not imagine the performance difference between 'or' or 'in'
> would be that great but I never benchmarked it. The big looming
> problems is that if you string enough 'or' together (say 8000) the
> query parser which uses java beans serialization will OOM.
>
> Edward
>

Re: why is slow when use OR clause instead of IN clause

Posted by Edward Capriolo <ed...@gmail.com>.
On Wed, Aug 4, 2010 at 12:50 PM, Ning Zhang <nz...@facebook.com> wrote:
> Edward, do you have the HIVE-543 patch merged into your Hive? That patch resolves an OOM issue on the Hive client side.

No, I do not have HIVE-543 merged in yet. I am not having that problem
at the moment but in the past I did. Might not be the case anymore.

Re: why is slow when use OR clause instead of IN clause

Posted by Ning Zhang <nz...@facebook.com>.
Edward, do you have the HIVE-543 patch merged into your Hive? That patch resolves an OOM issue on the Hive client side.

On Aug 4, 2010, at 9:22 AM, Edward Capriolo wrote:

> That is exactly what I am saying. I tested with 4GB and 8GB. I am not
> exactly sure how many OR's you can get away with for your memory size,
> but some upper limit exists currently. Most people never hit it. (I
> did because my middle name is "edge case" )


Re: why is slow when use OR clause instead of IN clause

Posted by lei liu <li...@gmail.com>.
I currently assign 100 MB of memory to Hive. How many 'OR' terms do you
think that can support?

2010/8/5 Edward Capriolo <ed...@gmail.com>

> That is exactly what I am saying. I tested with 4GB and 8GB. I am not
> exactly sure how many OR's you can get away with for your memory size,
> but some upper limit exists currently. Most people never hit it. (I
> did because my middle name is "edge case" )
>

Re: why is slow when use OR clause instead of IN clause

Posted by Edward Capriolo <ed...@gmail.com>.
On Wed, Aug 4, 2010 at 12:15 PM, lei liu <li...@gmail.com> wrote:
> Hello Edward Capriolo,
>
> Thank you for your reply. Are you sure that if you string enough 'or'
> together (say 8000) the query parser which uses java beans serialization
> will OOM? How many memory you assign to hive?

That is exactly what I am saying. I tested with 4GB and 8GB. I am not
exactly sure how many OR's you can get away with for your memory size,
but some upper limit exists currently. Most people never hit it. (I
did because my middle name is "edge case" )

Re: why is slow when use OR clause instead of IN clause

Posted by lei liu <li...@gmail.com>.
Hello Edward Capriolo,

Thank you for your reply. Are you sure that if you string enough 'or'
terms together (say 8000), the query parser, which uses Java beans
serialization, will OOM? How much memory do you assign to Hive?

2010/8/4 Edward Capriolo <ed...@gmail.com>

> I can not imagine the performance difference between 'or' or 'in'
> would be that great but I never benchmarked it. The big looming
> problems is that if you string enough 'or' together (say 8000) the
> query parser which uses java beans serialization will OOM.
>
> Edward
>

Re: why is slow when use OR clause instead of IN clause

Posted by Edward Capriolo <ed...@gmail.com>.
On Wed, Aug 4, 2010 at 6:10 AM, lei liu <li...@gmail.com> wrote:
> My company requires us to use version 0.4.1, which does not support the IN
> clause. I want to use OR clauses (example: where id=1 or id=2 or id=3) to
> implement the IN clause (example: id in (1,2,3)). I know it will be slower,
> especially when the list after "in" is very long. Could anybody tell me
> why using OR clauses in place of an IN clause is slow?
>
>
> Thanks,
>
>
> LiuLei
>

I cannot imagine the performance difference between 'or' and 'in'
would be that great, but I never benchmarked it. The big looming
problem is that if you string enough 'or' terms together (say 8000), the
query parser, which uses Java beans serialization, will OOM.

Edward