Posted to dev@geode.apache.org by Roi Apelker <Ro...@amdocs.com> on 2017/08/28 13:04:04 UTC

Query mechanism

Hi,

I am looking into how the query process works internally, and how indexes and hints affect it.

Can anyone point me to the most important classes, the main mechanism "wheels" etc.?

Thanks,

Roi


-----Original Message-----
From: Roi Apelker 
Sent: Sunday, August 27, 2017 12:55 PM
To: dev@geode.apache.org
Subject: Indexes and hints

Hi,

I have a few questions regarding indexes and hints. It would be great if someone could confirm the points below:

- I have a situation where I use 3 field values in a select (something like select where A>1, B>1, C=true)
- A and B are fields on the key, and C is a field on the value.
- A and B are indexed.
- I am looking for the most efficient way to execute the query above when there is overflow to eviction files, meaning some of the data has already been evicted to a file, which slows down the select considerably (this is not persistence, but overflow).


1. Is it true that the query as it is will load all the data values from the file, since field C is part of the value, which has already been written to file?
2. If I add a hint on A and B, does that mean there will be a "2 phase search": first the select on A and B, and then, only on those results, the check on field C? (This way not all records would be loaded from file, only those that satisfy the A and B conditions.)
3. Is it possible to define an index on a value field (i.e. not on the key)? Will it work exactly like defining one on the key, or are there any limitations? (Again, I am looking to overcome the situation where, as it seems, records are loaded unnecessarily from disk.)
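
The "2 phase search" described in question 2 can be sketched as a toy model. This is not Geode code: `Key`, `Value`, `OverflowStore`, `twoPhaseSelect` and `fullScanSelect` are hypothetical names, the index on A is modelled as a sorted in-memory map, and every read of a value is counted as one disk load. Whether the query engine really evaluates the query this way is exactly what the question asks; the sketch only shows what the two strategies would cost under these assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical types: A and B live on the key, C lives on the value.
struct Key {
  int a;
  int b;
  bool operator<(const Key& other) const {
    return a != other.a ? a < other.a : b < other.b;
  }
};

struct Value {
  bool c;
};

// Toy stand-in for overflowed values: every load() counts as one disk read.
class OverflowStore {
 public:
  void put(const Key& key, const Value& value) { values_[key] = value; }

  const Value& load(const Key& key) {
    auto it = values_.find(key);
    assert(it != values_.end());  // the sketch only loads keys it has stored
    ++disk_loads_;
    return it->second;
  }

  std::size_t disk_loads() const { return disk_loads_; }

 private:
  std::map<Key, Value> values_;
  std::size_t disk_loads_ = 0;
};

// Phase 1 uses only in-memory key data; phase 2 loads values for survivors.
std::vector<Key> twoPhaseSelect(const std::multimap<int, Key>& indexOnA,
                                OverflowStore& store) {
  std::vector<Key> result;
  // Phase 1: range scan on the index for A > 1, then check B > 1 on the key.
  for (auto it = indexOnA.upper_bound(1); it != indexOnA.end(); ++it) {
    const Key& key = it->second;
    if (key.b <= 1) continue;
    // Phase 2: only now touch the overflowed value to test C.
    if (store.load(key).c) result.push_back(key);
  }
  return result;
}

// Baseline: evaluate the whole predicate per entry, loading every value.
std::vector<Key> fullScanSelect(const std::vector<Key>& keys,
                                OverflowStore& store) {
  std::vector<Key> result;
  for (const Key& key : keys) {
    const Value& value = store.load(key);
    if (key.a > 1 && key.b > 1 && value.c) result.push_back(key);
  }
  return result;
}
```

In this model the two-phase variant loads only the values whose keys pass the A and B conditions, while the full scan loads every value; both return the same keys.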

Thank you,

Roi


-----Original Message-----
From: Roi Apelker
Sent: Thursday, August 24, 2017 7:03 PM
To: dev@geode.apache.org
Subject: eviction files

Hi,

I am looking into the internals of the eviction process.

Can anyone point me to the most important classes, the main mechanism "wheels" etc.?

Thanks,

Roi

-----Original Message-----
From: Roi Apelker
Sent: Wednesday, August 16, 2017 8:38 PM
To: dev@geode.apache.org
Subject: RE: continuous query internal mechanism questions

It seems that the code in the native client (in the version I have, which may be old) sends the message to all servers:

CqResultsPtr CqQueryImpl::executeWithInitialResults(uint32_t timeout) {
  .......

  TcrMessage msg(TcrMessage::EXECUTECQ_WITH_IR_MSG_TYPE, m_cqName,
                 m_queryString, CqState::RUNNING, isDurable(), m_tccdm);
  TcrMessage reply(true, m_tccdm);
  ChunkedQueryResponse* resultCollector = (new ChunkedQueryResponse(reply));
  reply.setChunkedResultHandler(static_cast<TcrChunkedResult*>(resultCollector));
  reply.setTimeout(timeout);

  GfErrType err = GF_NOERR;
  err = m_tccdm->sendSyncRequest(msg, reply);
  ..........

And sendSyncRequest:
...

for (std::vector<TcrEndpoint*>::iterator ep = m_endpoints.begin();
     ep != m_endpoints.end(); ++ep) {
  if ((*ep)->connected()) {
    (*ep)->setDM(this);
    // this will go to ThinClientDistributionManager
    opErr = sendRequestToEP(request, reply, *ep);

...


Can this be causing the issue?



-Roi





-----Original Message-----
From: Jason Huynh [mailto:jasonhuynh@apache.org]
Sent: Tuesday, August 15, 2017 9:25 PM
To: dev@geode.apache.org
Subject: Re: continuous query internal mechanism questions

I am not quite sure how the native client registers CQs. From my understanding, with the Java API there is only one message (the ExecuteCQ message), which is executed on the server side and then replicated to the other nodes through the profile (OperationMessage).

It seems the extra ExecuteCQ message failing and then closing the CQ might be putting the system in a weird state...
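
To make the "weird state" concern concrete, here is a toy model of the sequence discussed in this thread: the same CQ is registered twice on one node and then closed once. It is a sketch only; `CqRegistry` and its methods are hypothetical names, not Geode's CQ service, and whether the server treats a repeated registration as a no-op or counts it is an assumption that the thread does not settle.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical registry used only to illustrate the two possible behaviours.
class CqRegistry {
 public:
  // countDuplicates == true: each registration is counted.
  // countDuplicates == false: a repeated registration is ignored.
  explicit CqRegistry(bool countDuplicates)
      : countDuplicates_(countDuplicates) {}

  void registerCq(const std::string& name) {
    int& count = counts_[name];  // starts at 0 for a new name
    if (count == 0 || countDuplicates_) ++count;
  }

  // Closes one registration; the CQ disappears when its count reaches zero.
  void closeCq(const std::string& name) {
    auto it = counts_.find(name);
    if (it == counts_.end()) return;
    assert(it->second > 0);  // entries are never stored with a zero count
    if (--it->second == 0) counts_.erase(it);
  }

  bool isActive(const std::string& name) const {
    return counts_.count(name) > 0;
  }

 private:
  bool countDuplicates_;
  std::map<std::string, int> counts_;
};
```

If duplicate registrations are ignored, the sequence register, register, close leaves the CQ inactive on that node; if they are counted, the CQ stays active. Which of the two matches the actual server behaviour is the thing to verify in the server code.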

On Tue, Aug 15, 2017 at 7:56 AM Roi Apelker <Ro...@amdocs.com> wrote:

> Hi,
>
> I have been examining the continuous query registration mechanism for
> quite some time. This is related to an issue that I have, where
> sometimes a node crashes (1 node out of 2), and the other one does not
> send CQ events. The CQ is registered on a partitioned region which
> resides on these 2 nodes.
>
> I noticed the following behavior, and I wonder if anyone can comment
> on whether it is expected and what the reason for it is:
>
> 1. When the software using the client (native client) registers for
> the CQ, a CQ command (ExecuteCQ61) is received on both servers.
>  -- Is this normal behaviour? Does the client actually send this
> command to both servers?
>
> 2. When this command is received by a server, and the CQ is
> registered, another registration message is sent to the other node via
> an OperationMessage (REGISTER_CQ).
>  -- It seems that normally the server can handle this situation, as
> the second registration identifies the previous one and does not
> affect it. But the question is: why do we need this 2nd registration
> if there is a command sent to each server?
>
> 3. For some reason, sometimes the first registration (executed by
> ExecuteCQ61) fails to complete, and this failure causes the CQ to be
> closed, which is accompanied by a close request to the other node.
>  -- I assume that by now, since 2 registrations and one closure have
> occurred on node 2, the CQ is still active and the client receives
> notifications.
>
> 4. Sometimes, 1 out of 5 times, once node 1 crashes, I get a cleanup
> operation caused by the crash (via MemberCrashedEvent), and this also
> closes the existing CQ. In this case the CQ on node 2 does not
> operate anymore and the client receives no notifications.
>  -- The fact is that the other 4 out of 5 times, I do not get this
> cleanup by MemberCrashedEvent (maybe due to some other error), and the
> CQ notifications are received normally.
>
> Can anyone clear things up for me? Any comment on any of the 
> statements above will be greatly appreciated.
>
> Thanks,
>
> Roi
>
>
> -----Original Message-----
> From: Roi Apelker
> Sent: Wednesday, August 09, 2017 3:21 PM
> To: dev@geode.apache.org
> Subject: RE: continuous query internal mechanism
>
> Dhanyavad
>
> -----Original Message-----
> From: Anilkumar Gingade [mailto:agingade@pivotal.io]
> Sent: Tuesday, August 08, 2017 9:55 PM
> To: dev@geode.apache.org
> Subject: Re: continuous query internal mechanism
>
> By registered events I meant the events generated for interest
> registration, "region.registerInterest(*)". CqEvents are the events
> for registered CQs.
>
> -Anil.
>
>
> On Tue, Aug 8, 2017 at 12:27 AM, Roi Apelker <Ro...@amdocs.com>
> wrote:
>
> > Shukriya
> >
> > What is the difference between registered events and CQ events?
> >
> > -----Original Message-----
> > From: Anilkumar Gingade [mailto:agingade@pivotal.io]
> > Sent: Monday, August 07, 2017 10:12 PM
> > To: dev@geode.apache.org
> > Subject: Re: continuous query internal mechanism
> >
> > CQ processing on the server side is the same for all clients (Java, C++)...
> >
> > The subscription events are sent to the client as ClientUpdateMessage,
> > which holds information about registered events and CQ events. The
> > client processes this and updates/invokes the client-side
> > cache/listeners with the respective event. Look into
> > ClientUpdateMessageImpl and CacheClientUpdater (for client-side
> > processing).
> >
> > -Anil.
> >
> >
> >
> >
> > On Mon, Aug 7, 2017 at 11:01 AM, Roi Apelker 
> > <Ro...@amdocs.com>
> > wrote:
> >
> > > Thanks,
> > >
> > > By the way, is there any difference in the behaviour of the
> > > server if the client that registered the CQ is a native (C++) client?
> > >
> > > I have been going over the classes and code for some time and
> > > can't seem to find the actual location where a CQ
> > > update/notification is sent...
> > >
> > > It's as if the CqEventImpl class is never even generated in this scenario.
> > >
> > > If anyone can help here I would be most grateful :-)
> > >
> > > Thanks
> > >
> > > Roi
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Anilkumar Gingade [mailto:agingade@pivotal.io]
> > > Sent: Monday, August 07, 2017 8:23 PM
> > > To: dev@geode.apache.org
> > > Subject: Re: continuous query internal mechanism
> > >
> > > You can find those in CqServiceImpl.process*()...
> > >
> > > -Anil.
> > >
> > >
> > > On Mon, Aug 7, 2017 at 9:14 AM, Roi Apelker 
> > > <Ro...@amdocs.com>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > I am trying to look into the code of the continuous query 
> > > > mechanism
> > > > - where the GEODE server sends the notification back to the client.
> > > >
> > > > Can anyone point me to the central classes of continuous query, 
> > > > especially to the one that is responsible for the calculation of 
> > > > the new data and packing it as a message back to the client?
> > > >
> > > > Thanks,
> > > >
> > > > Roi
> > > >
> > > > This message and the information contained herein is proprietary 
> > > > and confidential and subject to the Amdocs policy statement,
> > > >
> > > > you may review at https://www.amdocs.com/about/email-disclaimer
> > > > < https://www.amdocs.com/about/email-disclaimer>
> > > >

Re: Query mechanism

Posted by Anilkumar Gingade <ag...@pivotal.io>.
Roi,

There is no documentation on the internals of the query engine. I believe
there are tools which can generate package/class diagrams and their
relationships. Depending on the problem or issue, we could point you to
the source file where you can find more information.

To see how a query picks the best index, you can look at:
IndexManager.getBestMatchIndex()

-Anil.

Query mechanism

Posted by Roi Apelker <Ro...@amdocs.com>.
Hi,

Is there a design document which explains the query mechanism? Main classes, modules, main algorithm?

I have been looking into this area because I have a performance issue: it seems that data is loaded from disk prematurely in the eviction state, which causes the system to become really slow.

I have found a few scenarios where things do not work as it seems they should. For example, using 2 hints doesn't work, using 1 hint may give slower results, and some indexes are not used (at least according to the trace). It is something larger than a specific issue, and while going through the code I sometimes fail to understand, in several locations, what was intended and why.

Any help here would be greatly appreciated.

Thank you,

Roi



Re: Query mechanism

Posted by Jason Huynh <jh...@pivotal.io>.
DefaultQuery is where the processing starts for a query.

CompiledSelect will most likely be the first node in processing the query.

The IndexManager class will contain the list of indexes for a region as
well as the methods that help find indexes to use with a query.

Specific index classes:
CompactRangeIndex
RangeIndex
HashIndex
PrimaryKeyIndex
MapRangeIndex
CompactMapRangeIndex


