Posted to dev@geode.apache.org by Roi Apelker <Ro...@amdocs.com> on 2017/08/27 09:54:44 UTC

Indexes and hints

Hi,

I have a few questions regarding indexes and hints. If someone could confirm the points below, it would be great:

- I have a situation where I use 3 field values in a select (something like select where A>1 AND B>1 AND C=true)
- A and B are fields on the key, and C is a field on the value.
- A and B are indexed
- I am looking for the most efficient way to execute the query above when there is overflow to eviction files, meaning some of the data has already been evicted to a file, which slows down the select considerably (this is not persistence, but overflow).


1. Is it true to say that the query as it is will load all the data values from the file, since field C is part of the value, which has already been written to the file?
2. If I add a hint on A and B, will it mean that there will be a "2 phase search": first the select on A and B, and then, only on the results, on field C? (This way, not all records will be loaded from the file, only those that satisfy the A and B conditions.)
3. Is it possible to define an index on a value field (i.e. not from the key)? Will it work exactly like defining one from the key, or are there any limitations? (Again, I am looking to overcome the situation where, as it seems, the records are loaded unnecessarily from disk.)

Thank you,

Roi


-----Original Message-----
From: Roi Apelker 
Sent: Thursday, August 24, 2017 7:03 PM
To: dev@geode.apache.org
Subject: eviction files

Hi,

I am looking into the internals of the eviction process.

Can anyone point me to the most important classes, the main "wheels" of the mechanism, etc.?

Thanks,

Roi

-----Original Message-----
From: Roi Apelker
Sent: Wednesday, August 16, 2017 8:38 PM
To: dev@geode.apache.org
Subject: RE: continuous query internal mechanism questions

It seems like the code in the native client (in the version I have, which may be old) sends the message to all servers:

CqResultsPtr CqQueryImpl::executeWithInitialResults(uint32_t timeout) {
  .......

  TcrMessage msg(TcrMessage::EXECUTECQ_WITH_IR_MSG_TYPE, m_cqName, m_queryString, CqState::RUNNING, isDurable(), m_tccdm);
  TcrMessage reply(true, m_tccdm);
  ChunkedQueryResponse* resultCollector = (new ChunkedQueryResponse(reply));
  reply.setChunkedResultHandler(static_cast<TcrChunkedResult *>(resultCollector));
  reply.setTimeout(timeout);

  GfErrType err = GF_NOERR;
  err = m_tccdm->sendSyncRequest(msg, reply); ..........

And sendSyncRequest:
...

for (std::vector<TcrEndpoint*>::iterator ep = m_endpoints.begin(); ep != m_endpoints.end(); ++ep) {
    if ((*ep)->connected()) {
      (*ep)->setDM(this);
      opErr = sendRequestToEP(request, reply, *ep);//this will go to ThinClientDistributionManager

...


Can this be causing the issue?



-Roi





-----Original Message-----
From: Jason Huynh [mailto:jasonhuynh@apache.org]
Sent: Tuesday, August 15, 2017 9:25 PM
To: dev@geode.apache.org
Subject: Re: continuous query internal mechanism questions

I am not quite sure how the native client registers CQs. From my understanding,
with the Java API there is only one message (the ExecuteCQ message), which is executed on the server side and then replicated to the other nodes through the profile (OperationMessage).

It seems the extra ExecuteCQ message failing and then closing the CQ might be putting the system in a weird state...

On Tue, Aug 15, 2017 at 7:56 AM Roi Apelker <Ro...@amdocs.com> wrote:

> Hi,
>
> I have been examining the continuous query registration mechanism for
> quite some time. This is related to an issue that I have, where
> sometimes a node crashes (1 node out of 2), and the other one does not
> send CQ events. The CQ is registered on a partitioned region which
> resides on these 2 nodes.
>
> I noticed the following behavior, and I wonder if anyone can comment 
> regarding it, if it is justified or not and what is the reason:
>
> 1. When the software using the client (native client) registers for 
> the CQ, a CQ command (ExecuteCQ61) is received on both servers.
>  -- is this normal behaviour? Does the client actually send this 
> command to both servers?
>
> 2. When this command is received by a server, and the CQ is 
> registered, another registration message is sent to the other node via 
> an OperationMessage (REGISTER_CQ)
>  -- It seems that normally the server can handle this situation, as
> the second registration identifies the previous one and does not
> affect it. But the question is: why do we need this 2nd registration if
> a command is sent to each server?
>
> 3. For some reason, sometimes there is a failure to complete the first 
> registration (executed by ExecuteCQ61) and then this failure causes a 
> closure to the CQ, which is accompanied with a close request to the 
> other node.
>  -- I assume by now, since 2 registrations and one closure have 
> occurred on node 2, the CQ is still active and the client receives notifications.
>
> 4. Sometimes, 1 out of 5, once node 1 crashes, I get a cleanup 
> operation, caused by the crash (via MemberCrashedEvent), and this also 
> closes the existing CQ, and in this case the CQ in node 2 does not 
> operate anymore and the client receives no notifications.
>  -- fact is, that 4 out of 5 times, I do not get this cleanup by
> MemberCrashedEvent (maybe due to some other error), and the CQ
> notifications are received normally.
>
> Can anyone clear things up for me? Any comment on any of the 
> statements above will be greatly appreciated.
>
> Thanks,
>
> Roi
>
>
> -----Original Message-----
> From: Roi Apelker
> Sent: Wednesday, August 09, 2017 3:21 PM
> To: dev@geode.apache.org
> Subject: RE: continuous query internal mechanism
>
> Thank you
>
> -----Original Message-----
> From: Anilkumar Gingade [mailto:agingade@pivotal.io]
> Sent: Tuesday, August 08, 2017 9:55 PM
> To: dev@geode.apache.org
> Subject: Re: continuous query internal mechanism
>
> By registered events I meant the events generated for interest
> registration ("region.registerInterest(*)"). CqEvents are for registered CQs.
>
> -Anil.
>
>
> On Tue, Aug 8, 2017 at 12:27 AM, Roi Apelker <Ro...@amdocs.com>
> wrote:
>
> > Thank you
> >
> > What is the difference between registered events and CQ events?
> >
> > -----Original Message-----
> > From: Anilkumar Gingade [mailto:agingade@pivotal.io]
> > Sent: Monday, August 07, 2017 10:12 PM
> > To: dev@geode.apache.org
> > Subject: Re: continuous query internal mechanism
> >
> > CQ processing on the server side is the same for all clients (Java, C++)...
> >
> > The subscription events are sent to the client as a ClientUpdateMessage,
> > which holds information about registered events and CQ events. The
> > client processes this and updates/invokes the client-side
> > cache/listeners with the respective event. Look into
> > ClientUpdateMessageImpl and CacheClientUpdater (for client-side
> > processing).
> >
> > -Anil.
> >
> >
> >
> >
> > On Mon, Aug 7, 2017 at 11:01 AM, Roi Apelker 
> > <Ro...@amdocs.com>
> > wrote:
> >
> > > Thanks,
> > >
> > > By the way, is there any difference in the behaviour of the 
> > > server, if the client that registered the CQ is a native (C++) client?
> > >
> > > I have been going over the classes and code for some time and
> > > can't seem to find the actual location where a CQ
> > > update/notification is sent...
> > >
> > > It's like CqEventImpl class is never even generated in this scenario.
> > >
> > > If anyone can help here I would be most grateful :-)
> > >
> > > Thanks
> > >
> > > Roi
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Anilkumar Gingade [mailto:agingade@pivotal.io]
> > > Sent: Monday, August 07, 2017 8:23 PM
> > > To: dev@geode.apache.org
> > > Subject: Re: continuous query internal mechanism
> > >
> > > You can find those in CqServiceImpl.process*()...
> > >
> > > -Anil.
> > >
> > >
> > > On Mon, Aug 7, 2017 at 9:14 AM, Roi Apelker 
> > > <Ro...@amdocs.com>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > I am trying to look into the code of the continuous query 
> > > > mechanism
> > > > - where the GEODE server sends the notification back to the client.
> > > >
> > > > Can anyone point me to the central classes of continuous query, 
> > > > especially to the one that is responsible for the calculation of 
> > > > the new data and packing it as a message back to the client?
> > > >
> > > > Thanks,
> > > >
> > > > Roi
> > > >
> > > > This message and the information contained herein is proprietary 
> > > > and confidential and subject to the Amdocs policy statement,
> > > >
> > > > you may review at https://www.amdocs.com/about/email-disclaimer
> > > > < https://www.amdocs.com/about/email-disclaimer>
> > > >

Re: Indexes and hints

Posted by Jason Huynh <jh...@pivotal.io>.

I am not exactly sure of the reasoning behind using only one index, other than
that it was a performance choice at the time.

Are you sure you are seeing both indexes being used?  In Geode, with the
following query, select * from /region p where p.ID > 0 AND p.status = 'on'
ORDER BY ID, I only see one of the indexes being used.  I haven't seen a
second-stage query in Geode, but maybe my query is not correct.

On Sun, Sep 3, 2017 at 9:17 AM Roi Apelker <Ro...@amdocs.com> wrote:

> Thank you,
>
> Can you explain why " The query engine was modified to try to only use one
> index if it can."?
>
> I also noticed that even if I query on A and B, and ORDER BY B - it seemed
> to perform the query on B only, at least in a separate stage. Why is that?
>
>
>
>
> -Roi
>

RE: Indexes and hints

Posted by Roi Apelker <Ro...@amdocs.com>.

Thank you,

Can you explain why "The query engine was modified to try to only use one index if it can"?

I also noticed that even if I query on A and B and ORDER BY B, it seemed to perform the query on B only, at least in a separate stage. Why is that?




-Roi


Re: Indexes and hints

Posted by Jason Huynh <jh...@pivotal.io>.

You will probably have to step through the debugger for this one; it really
depends on the query.  For this query, I expect the query engine to pick
one index and run the rest of the criteria on the results of the first
index used.  My guess is you have created a CompactRangeIndex, and if so,
you can see in CompactRangeIndex.java around line 811:

if (ok && runtimeItr != null && iterOps != null) {

  ok = QueryUtils.applyCondition(iterOps, context);

}

This is where it would apply the other conditions (B and C, or A and C,
depending on which index was selected).
The query engine was modified to try to only use one index if it can.

The load from disk (again assuming CompactRangeIndex) is probably occurring
in MemoryIndexStore.getTargetObject.
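
The behaviour described above (one index picks the candidates, the remaining conditions run only on those candidates, and a value is fetched from overflow only when a candidate is evaluated) can be illustrated with a small stand-alone Java model. This is only a sketch under assumptions, not Geode code: SingleIndexSketch, Entry, load, fullScan and indexThenFilter are hypothetical names, the index is a plain TreeMap on A, and valueLoads merely counts how often an overflowed value would have to be read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model only: none of these names are Geode classes. */
final class SingleIndexSketch {
    /** One region entry: a and b stand for key fields, c for a field of the value. */
    static final class Entry {
        final int a;
        final int b;
        final boolean c;
        Entry(int a, int b, boolean c) { this.a = a; this.b = b; this.c = c; }
    }

    private final List<Entry> entries = new ArrayList<>();
    // Hypothetical range index on A: sorted map from A to the entries with that A.
    private final TreeMap<Integer, List<Entry>> indexOnA = new TreeMap<>();
    int valueLoads = 0; // simulated reads from the overflow file

    void put(int a, int b, boolean c) {
        Entry e = new Entry(a, b, c);
        entries.add(e);
        indexOnA.computeIfAbsent(a, k -> new ArrayList<>()).add(e);
    }

    /** Stands in for fetching an overflowed value from disk. */
    private Entry load(Entry e) {
        valueLoads++;
        return e;
    }

    /** No index: every entry is loaded before the WHERE clause can be evaluated. */
    List<Entry> fullScan(int minA, int minB) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : entries) {
            Entry v = load(e);
            if (v.a > minA && v.b > minB && v.c) out.add(v);
        }
        return out;
    }

    /** One index (on A) selects candidates; B and C are applied only to those. */
    List<Entry> indexThenFilter(int minA, int minB) {
        List<Entry> out = new ArrayList<>();
        for (List<Entry> bucket : indexOnA.tailMap(minA, false).values()) {
            for (Entry e : bucket) {
                Entry v = load(e); // loaded only because the index selected it
                if (v.b > minB && v.c) out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SingleIndexSketch region = new SingleIndexSketch();
        for (int i = 0; i < 100; i++) region.put(i, i, i % 2 == 0);

        int hits = region.indexThenFilter(89, 1).size();
        System.out.println(hits + " results, " + region.valueLoads + " value loads (index on A)");

        region.valueLoads = 0;
        hits = region.fullScan(89, 1).size();
        System.out.println(hits + " results, " + region.valueLoads + " value loads (full scan)");
    }
}
```

With 100 entries where a = b = i and c is true for even i, main reports 5 results with 10 value loads for the indexed path and 5 results with 100 value loads for the full scan. The model loads a candidate before checking B, which follows the description in this message; whether Geode can check a key field without loading the value is not confirmed in this thread.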

On Wed, Aug 30, 2017 at 9:21 AM Roi Apelker <Ro...@amdocs.com> wrote:

> One more question:
>
> I am trying to create a situation where the disk is accessed as little
> as possible
> (with a select distinct from X where a=1 and b>10 and c=true,
> in which a and b are indexed and c is not, and c is in the value which is
> evicted to disk).
>
> Did I get it right - that if I use a hint on a, or a hint on b, or a hint
> on both, it will first do a select on the hinted, and ONLY THEN the others?
>
> Can anyone refer me to the code (where the 2 phase search occurs)?
>
> Where is the value finally loaded from disk?
>
> Thank you
>
> Roi

RE: Indexes and hints

Posted by Roi Apelker <Ro...@amdocs.com>.

One more question:

I am trying to create a situation where the disk is accessed as little as possible
(with a select distinct from X where a=1 and b>10 and c=true,
in which a and b are indexed and c is not, and c is in the value which is evicted to disk).

Did I get it right - that if I use a hint on a, or a hint on b, or a hint on both, it will first do a select on the hinted, and ONLY THEN the others?

Can anyone refer me to the code (where the 2 phase search occurs)?

Where is the value finally loaded from disk?

Thank you

Roi

RE: Indexes and hints

Posted by Roi Apelker <Ro...@amdocs.com>.

Thank you Jason :-)


Re: Indexes and hints

Posted by Jason Huynh <jh...@pivotal.io>.

Hi Roi,

Answers are below the questions...

Question 1. Is it true to say, that the query as it is will load all the
data values from the file, since the field C is part of the value, which is
already persisted to file?

It depends on whether an index is used. If an index is used, the values
that are part of the results will need to be loaded to actually return a
result.  If an index is not used, then all the values would need to be
loaded to actually have something to evaluate the filter criteria on.

Question 2. If I add a hint on A and B, will it mean that there will be a
"2 phase search", first the select on A and B, and then, only on the
results, on the field C? (this way, not all records will be loaded from
file, only those that suit the A and B condition)

Depending on the query, it could use one index or more.  If it's a query with
only AND clauses, it should just choose one and then evaluate the other
filters on the subset that is returned from the index.

Question 3. Is it possible to define an index on a value field? (i.e. not
from the key) - will it work exactly like defining one from the key or are
there any limitations? (again, I am looking to overcome the situation,
where as it seems, the records are loaded unnecessarily from disk)

Yes, indexes can be defined on fields in the value.  It will work the same.


If you are sure you are already using an index in the query and still
loading every value for every execution of that query, there may be
something weird going on...

On Sun, Aug 27, 2017 at 2:55 AM Roi Apelker <Ro...@amdocs.com> wrote:

> Hi,
>
> I have a few questions regarding indexes and hints, if someone could
> confirm the below it would be great:
>
> - I have a situation where I use 3 field values in a select (something
> like select where A>1, B>1, C=true)
> - A and B are fields on the key, and C is a field on the value.
> - A and B are indexes
> - I am looking for the most efficient way to execute the query above, in
> the situation where there is overflow to eviction files, meaning some of
> the data has already been evicted to a file, which slows down the select
> considerably (this is not persistence, but overflow).
>
>
> 1. Is it true to say, that the query as it is will load all the data
> values from the file, since the field C is part of the value, which is
> already persisted to file?
> 2. If I add a hint on A and B, will it mean that there will be a "2 phase
> search", first the select on A and B, and then, only on the results, on the
> field C? (this way, not all records will be loaded from file, only those
> that suit the A and B condition)
> 3. Is it possible to define an index on a value field? (i.e. not from the
> key) - will it work exactly like defining one form the key or are three any
> limitations? (again, I am looking to overcome the situation, where as it
> seems, the records are loaded unnecessarily from disk)
>
> Thank you,
>
> Roi
>
>
> -----Original Message-----
> From: Roi Apelker
> Sent: Thursday, August 24, 2017 7:03 PM
> To: dev@geode.apache.org
> Subject: eviction files
>
> Hi,
>
> I am looking into the internals of the eviction process,
>
> Can anyone point me to the most important classes, the main mechanism
> "wheels" etc.?
>
> Thanks,
>
> Roi
>
> -----Original Message-----
> From: Roi Apelker
> Sent: Wednesday, August 16, 2017 8:38 PM
> To: dev@geode.apache.org
> Subject: RE: continuous query internal mechanism questions
>
> It seems like the code in the native client (in the version I have, which
> may be old) sends the message to all servers:
>
> CqResultsPtr CqQueryImpl::executeWithInitialResults(uint32_t timeout) {
>   .......
>
>   TcrMessage msg(TcrMessage::EXECUTECQ_WITH_IR_MSG_TYPE, m_cqName,
>                  m_queryString, CqState::RUNNING, isDurable(), m_tccdm);
>   TcrMessage reply(true, m_tccdm);
>   ChunkedQueryResponse* resultCollector = new ChunkedQueryResponse(reply);
>   reply.setChunkedResultHandler(
>       static_cast<TcrChunkedResult*>(resultCollector));
>   reply.setTimeout(timeout);
>
>   GfErrType err = GF_NOERR;
>   err = m_tccdm->sendSyncRequest(msg, reply);
>   ..........
>
> And sendSyncRequest:
> ...
>
> for (std::vector<TcrEndpoint*>::iterator ep = m_endpoints.begin();
>      ep != m_endpoints.end(); ++ep) {
>     if ((*ep)->connected()) {
>       (*ep)->setDM(this);
>       // this will go to ThinClientDistributionManager
>       opErr = sendRequestToEP(request, reply, *ep);
>
> ...
>
>
> Can this be causing the issue?
>
>
>
> -Roi
>
>
>
>
>
> -----Original Message-----
> From: Jason Huynh [mailto:jasonhuynh@apache.org]
> Sent: Tuesday, August 15, 2017 9:25 PM
> To: dev@geode.apache.org
> Subject: Re: continuous query internal mechanism questions
>
> I am not quite sure how native client registers cqs. From my understanding:
> with the java api, I believe there is only one message (ExecuteCQ message)
> that is executed on the server side and then replicated to the other nodes
> through the profile (OperationMessage).
>
> It seems the extra ExecuteCQ message failing and then closing the cq might
> be putting the system in a weird state...
>
> On Tue, Aug 15, 2017 at 7:56 AM Roi Apelker <Ro...@amdocs.com>
> wrote:
>
> > Hi,
> >
> > I have been examining the continuous query registration mechanism for
> > > quite some time. This is related to an issue that I have, where
> > sometimes a node crashes (1 node out of 2), and the other one does not
> > send CQ events. The CQ is registered on a partitioned region which
> > resides on these 2 nodes.
> >
> > I noticed the following behavior, and I wonder if anyone can comment
> > regarding it, if it is justified or not and what is the reason:
> >
> > 1. When the software using the client (native client) registers for
> > the CQ, a CQ command (ExecuteCQ61) is received on both servers.
> >  -- is this normal behaviour? Does the client actually send this
> > command to both servers?
> >
> > 2. When this command is received by a server, and the CQ is
> > registered, another registration message is sent to the other node via
> > an OperationMessage (REGISTER_CQ)
> >  -- it seems that regularly, the server can handle this situation as
> > the second registration identifies the previous one and does not
> > > affect it. But the question is: why do we need this 2nd registration, if
> > there is a command sent to each server?
> >
> > 3. For some reason, sometimes there is a failure to complete the first
> > registration (executed by ExecuteCQ61) and then this failure causes a
> > closure to the CQ, which is accompanied with a close request to the
> > other node.
> > >  -- I assume by now, since 2 registrations and one closure have
> > > occurred on node 2, the CQ is still active and the client receives
> > > notifications.
> >
> > 4. Sometimes, 1 out of 5, once node 1 crashes, I get a cleanup
> > operation, caused by the crash (via MemberCrashedEvent), and this also
> > closes the existing CQ, and in this case the CQ in node 2 does not
> > operate anymore and the client receives no notifications.
> > >  -- fact is, that 4 out of 5 times, I do not get this cleanup by
> > MemberCrashedEvent (maybe due to some other error), and that the CQ
> > notifications are received normally.
> >
> > Can anyone clear things up for me? Any comment on any of the
> > statements above will be greatly appreciated.
> >
> > Thanks,
> >
> > Roi
> >
> >
> > -----Original Message-----
> > From: Roi Apelker
> > Sent: Wednesday, August 09, 2017 3:21 PM
> > To: dev@geode.apache.org
> > Subject: RE: continuous query internal mechanism
> >
> > > Thank you
> >
> > -----Original Message-----
> > From: Anilkumar Gingade [mailto:agingade@pivotal.io]
> > Sent: Tuesday, August 08, 2017 9:55 PM
> > To: dev@geode.apache.org
> > Subject: Re: continuous query internal mechanism
> >
> > > By registered events, I meant events generated for interest
> > > registration, "region.registerInterest(*)". CqEvents are for registered
> > > CQs.
> >
> > -Anil.
> >
> >
> > On Tue, Aug 8, 2017 at 12:27 AM, Roi Apelker <Ro...@amdocs.com>
> > wrote:
> >
> > > > Thanks
> > >
> > > What is the difference between registered events and CQ events?
> > >
> > > -----Original Message-----
> > > From: Anilkumar Gingade [mailto:agingade@pivotal.io]
> > > Sent: Monday, August 07, 2017 10:12 PM
> > > To: dev@geode.apache.org
> > > Subject: Re: continuous query internal mechanism
> > >
> > > > CQ processing on the server side is the same for all clients (Java,
> > > > C++)...
> > > >
> > > > The subscription events are sent to the client as a
> > > > ClientUpdateMessage, which holds information about registered events
> > > > and CQ events. The client processes this and updates/invokes the
> > > > client-side cache/listeners with the respective event. Look into
> > > > ClientUpdateMessageImpl and CacheClientUpdater (for client-side
> > > > processing).
> > >
> > > -Anil.
> > >
> > >
> > >
> > >
> > > On Mon, Aug 7, 2017 at 11:01 AM, Roi Apelker
> > > <Ro...@amdocs.com>
> > > wrote:
> > >
> > > > Thanks,
> > > >
> > > > > By the way, is there any difference in the behaviour of the
> > > > > server, if the client that registered the CQ is a native (C++)
> > > > > client?
> > > > >
> > > > > I have been going over the classes and code for some time and
> > > > > can't seem to find the actual location where a CQ
> > > > > update/notification is sent...
> > > >
> > > > It's like CqEventImpl class is never even generated in this scenario.
> > > >
> > > > If anyone can help here I would be most grateful :-)
> > > >
> > > > Thanks
> > > >
> > > > Roi
> > > >
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Anilkumar Gingade [mailto:agingade@pivotal.io]
> > > > Sent: Monday, August 07, 2017 8:23 PM
> > > > To: dev@geode.apache.org
> > > > Subject: Re: continuous query internal mechanism
> > > >
> > > > You can find those in CqServiceImpl.process*()...
> > > >
> > > > -Anil.
> > > >
> > > >
> > > > On Mon, Aug 7, 2017 at 9:14 AM, Roi Apelker
> > > > <Ro...@amdocs.com>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > > I am trying to look into the code of the continuous query
> > > > > > mechanism, where the GEODE server sends the notification back to
> > > > > > the client.
> > > > >
> > > > > Can anyone point me to the central classes of continuous query,
> > > > > especially to the one that is responsible for the calculation of
> > > > > the new data and packing it as a message back to the client?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Roi
> > > > >
> > > > > This message and the information contained herein is proprietary
> > > > > and confidential and subject to the Amdocs policy statement,
> > > > >
> > > > > you may review at https://www.amdocs.com/about/email-disclaimer
> > > > > < https://www.amdocs.com/about/email-disclaimer>
> > > > >