Posted to user@geode.apache.org by Matt Ross <mr...@pivotal.io> on 2016/04/15 22:36:10 UTC

Best Practices for Calling Server Side Functions

Hi all,

I'm involved in a sizable GemFire Project right now that is requiring me to
execute Functions in a number of ways, and I wanted to poll the community
for some best practices.  So initially I would execute all functions like
this.

ResultCollector<?, ?> rc = FunctionService.onRegion(region)
    .withArgs(arguments).execute("my-awesome-function");

And this worked reliably for quite some time, until I started mixing
up functions that were executing on partition redundant data and
replicated data.  I initially started having problems with this method
when I had this setup.

1 locator, 2 servers,  and executing functions that would run queries
on partition redundant and replicated regions.  I started getting this
problem where the function would execute on both servers, and the
result collector would nondeterministically choose a server to return
results from.  According to logging statements placed within my
function I was able to confirm that the function was being executed
twice, on both servers.  We were able to fix this problem by switching
from executing on region, to executing on Pool.  Our initial
hypothesis was that, since there was replicated data on both servers,
the function would execute on both servers.

Another issue was executing functions from within a function without a
function context.  Let's say I have one function that I execute on a
Pool; it is therefore passed a FunctionContext.  But when I'm
actually in the function I need to execute other functions, some
needing a RegionFunctionContext and some just needing a
FunctionContext.  Initially I was able to just use a ResultCollector
and FunctionService.onRegion to get a region context, and then pass my
current function context to an instance of a new function:

MyAwesomeFunction myAwesomeFunction = new MyAwesomeFunction();

myAwesomeFunction.execute(functionContext);

This worked for a time but complexity started rising and more problems
came up.

So in short I wanted to throw out the blanket question of best
practices on using (onRegion/onPool/onServer), calling other functions
from within functions, what type of functions should be used on what
type of regions, and general design patterns when executing functions.
Thanks!

*Matthew Ross | Data Engineer | Pivotal*
*625 Avenue of the Americas NY, NY 10011*
*516-941-7535 <516-941-7535> | mross@pivotal.io <mr...@pivotal.io> *

Re: Best Practices for Calling Server Side Functions

Posted by Anthony Baker <ab...@pivotal.io>.

IIRC I’ve done onRegion() with replicated regions before.  The only weird thing is that, for peers that host the region, the execution is done on the current thread rather than switching onto the function thread pool.

Anthony

> On Apr 18, 2016, at 10:11 AM, Barry Oglesby <bo...@pivotal.io> wrote:
> 
> Interesting. My simple replicate test is running just fine. What exception are you seeing?
> 
> Thanks,
> Barry Oglesby
> 
> 
> On Mon, Apr 18, 2016 at 7:29 AM, Mark Secrist <msecrist@pivotal.io> wrote:
> Great detailed explanation Barry. At one point, you made this statement:
> 
> "If you're executing a function onRegion with a replicated region, then the function is executed on any member defining that region. Since the region is replicated, every server has the same data."
> 
> Has something changed with Geode recently? Last I checked you can't execute onRegion on a Replicated region. You'll actually get an exception thrown.
> 
> On Fri, Apr 15, 2016 at 5:53 PM, Barry Oglesby <boglesby@pivotal.io> wrote:
> Executing queries in functions can be tricky.
> 
> For executing queries in a function, do something like:
> 
> - invoke the function with onRegion
> - have the function return true from optimizeForWrite so that it is executed only on primary buckets
> - use the Query execute API with a RegionFunctionContext in the function. Otherwise, you could easily end up executing the same query on more than one member.
> 
> If you set a filter, the function (and query) will execute on only the member containing the primary or primaries for that filter.
> 
> Here is an example with trades.
> 
> If you route all trades on a specific cusip to the same bucket using a PartitionResolver, then querying for all trades for a specific cusip can be done efficiently using a Function. The trades could be stored with a simple String key like cusip-id or a complex key containing both the cusip and id. Either way, the PartitionResolver will need to be able to return the cusip for the routing object.
> 
> Invoke the function like:
> 
> Execution execution = FunctionService.onRegion(this.region).withFilter(Collections.singleton(cusip));
> ResultCollector collector = execution.execute("TradeQueryFunction");
> Object result = collector.getResult();
> 
> In the TradeQueryFunction, execute the query like:
> 
> RegionFunctionContext rfc = (RegionFunctionContext) context;
> String cusip = (String) rfc.getFilter().iterator().next();
> SelectResults results = (SelectResults) this.query.execute(rfc, new String[] {cusip});
> 
> Where the query is:
> 
> select * from /Trade where cusip = $1
> 
> This will route the function request to the member whose primary bucket contains the cusip filter. Then it will execute the query on the RegionFunctionContext which will just be the data for that bucket. Note: the PartitionResolver will also need to be able to return the cusip for that filter (which is just the input string itself).
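
As an illustration of the routing described above, here is a minimal sketch in plain Java with no Geode classes. It assumes the simple String key form "cusip-id" and that a cusip never contains a dash. CusipRouting, routingObject and bucketFor are hypothetical names, and the hash-modulo bucket mapping is a simplification, not Geode's actual bucket assignment. In a real deployment the routingObject logic would sit behind PartitionResolver.getRoutingObject.

```java
/** Stand-in for the routing logic a PartitionResolver would carry; not a Geode class. */
final class CusipRouting {
    private CusipRouting() {}

    /**
     * Returns the cusip for an entry key such as "ABC123-42",
     * or for a bare cusip filter such as "ABC123".
     */
    static String routingObject(String keyOrFilter) {
        int dash = keyOrFilter.indexOf('-');
        return dash < 0 ? keyOrFilter : keyOrFilter.substring(0, dash);
    }

    /** Maps a key or filter to a bucket id (hash-modulo is an assumption for illustration). */
    static int bucketFor(String keyOrFilter, int totalBuckets) {
        return Math.floorMod(routingObject(keyOrFilter).hashCode(), totalBuckets);
    }

    public static void main(String[] args) {
        int totalBuckets = 113;
        String[] inputs = {"ABC123-1", "ABC123-2", "ABC123"};
        // Two entry keys and the bare filter all resolve to the same bucket.
        for (String input : inputs) {
            System.out.println(input + " -> bucket " + bucketFor(input, totalBuckets));
        }
    }
}
```

Because the entry keys and the filter resolve to the same routing object, a filtered function call lands on the bucket that holds every trade for that cusip.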
> 
> Here is some more general info on functions.
> 
> If you're executing a function onRegion with a replicated region, then the function is executed on any member defining that region. Since the region is replicated, every server has the same data.
> 
> If you're executing a function onRegion with a partitioned region, then where the function is invoked depends on the result of optimizeForWrite. If optimizeForWrite returns true, the function is invoked on all the members containing primary buckets for that region. If optimizeForWrite returns false, the function is invoked on as few members as it can that encompass all the buckets (so it mixes primary and secondary buckets). For example if you have 2 members, and the primaries are split between them, then optimizeForWrite returning true means that the function will be invoked on both members. Returning false will cause the function to be invoked on only one member since each member has all the buckets. I almost always have optimizeForWrite return true.
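
The two-member example above can be modelled with a toy calculation in plain Java. This is only a sketch of the behaviour as described in this thread, not Geode's actual selection code: TargetMembers and its methods are hypothetical names, and the greedy cover used for the optimizeForWrite=false case is an assumption standing in for "as few members as it can".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the member selection described above; not Geode code. */
final class TargetMembers {
    private TargetMembers() {}

    /** optimizeForWrite == true: every member holding at least one primary bucket. */
    static Set<String> whenOptimizedForWrite(Map<Integer, String> primaryByBucket) {
        return new TreeSet<>(primaryByBucket.values());
    }

    /** optimizeForWrite == false: greedily pick members until their hosted buckets cover every bucket. */
    static Set<String> whenNotOptimizedForWrite(Map<String, Set<Integer>> hostedByMember) {
        Set<Integer> uncovered = new HashSet<>();
        for (Set<Integer> hosted : hostedByMember.values()) {
            uncovered.addAll(hosted);
        }
        Set<String> chosen = new TreeSet<>();
        while (!uncovered.isEmpty()) {
            String best = null;
            int bestGain = 0;
            for (Map.Entry<String, Set<Integer>> entry : hostedByMember.entrySet()) {
                Set<Integer> gain = new HashSet<>(entry.getValue());
                gain.retainAll(uncovered);
                if (gain.size() > bestGain) {
                    best = entry.getKey();
                    bestGain = gain.size();
                }
            }
            chosen.add(best);
            uncovered.removeAll(hostedByMember.get(best));
        }
        return chosen;
    }

    /** Two members, four buckets, primaries split evenly between them. */
    static Map<Integer, String> samplePrimaries() {
        Map<Integer, String> primaries = new LinkedHashMap<>();
        primaries.put(0, "server1");
        primaries.put(1, "server1");
        primaries.put(2, "server2");
        primaries.put(3, "server2");
        return primaries;
    }

    /** With one redundant copy, each of the two members hosts all four buckets. */
    static Map<String, Set<Integer>> sampleHosting() {
        Map<String, Set<Integer>> hosting = new LinkedHashMap<>();
        hosting.put("server1", new TreeSet<>(Arrays.asList(0, 1, 2, 3)));
        hosting.put("server2", new TreeSet<>(Arrays.asList(0, 1, 2, 3)));
        return hosting;
    }

    public static void main(String[] args) {
        System.out.println(whenOptimizedForWrite(samplePrimaries()));  // [server1, server2]
        System.out.println(whenNotOptimizedForWrite(sampleHosting())); // [server1]
    }
}
```

In this model, returning true targets both servers because each holds primaries, while returning false targets a single server because either one already hosts every bucket.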
> 
> The onServer/onServers API is used for data-unaware calls (meaning no specific region involved). In the past, I've used it mainly for admin-type behavior like:
> 
> - start/stop gateway senders
> - create regions
> - rebalance
> - assign buckets
> 
> Now, gfsh does a lot of this behavior (maybe all of it), so I don't necessarily need functions to do it anymore.
> 
> One of my favorite onServer use cases is the command pattern using a Request/Response API like:
> 
> - define a Request (like RebalanceCache)
> - pass it as an argument to a CommandFunction from the client to a server using onServer
> - execute it on the server
> - return a Response
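
The four steps above could be sketched roughly as follows, in plain Java without the Geode wiring. Request, Response and CommandDispatcher are hypothetical types invented for this illustration, and the body of RebalanceCache is a placeholder. In an actual CommandFunction the argument would presumably come from FunctionContext.getArguments() and the Response would be returned through the context's ResultSender.

```java
import java.io.Serializable;

/** Hypothetical command contract: each request knows how to run itself on the server. */
interface Request extends Serializable {
    Response execute();
}

/** Hypothetical reply carried back to the caller. */
final class Response implements Serializable {
    private static final long serialVersionUID = 1L;

    final boolean success;
    final String detail;

    Response(boolean success, String detail) {
        this.success = success;
        this.detail = detail;
    }
}

/** Example command named after the RebalanceCache request mentioned above. */
final class RebalanceCache implements Request {
    private static final long serialVersionUID = 1L;

    @Override
    public Response execute() {
        // Placeholder: a real implementation would trigger the rebalance here.
        return new Response(true, "rebalance requested");
    }
}

/** Server-side entry point: what the body of a CommandFunction could delegate to. */
final class CommandDispatcher {
    private CommandDispatcher() {}

    static Response dispatch(Object argument) {
        if (!(argument instanceof Request)) {
            return new Response(false, "unsupported argument: " + argument);
        }
        return ((Request) argument).execute();
    }

    public static void main(String[] args) {
        Response response = dispatch(new RebalanceCache());
        System.out.println(response.success + ": " + response.detail);
    }
}
```

One function registered on the servers can then serve any number of commands, since adding a command means adding a Request class rather than another function.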
> 
> One use case for invoking a function from another function is member notification. This can be done with a CacheListener on a replicated region too, but the basic idea is:
> 
> - invoke a function
> - in the function, invoke another function on all the members notifying them something is about to happen
> - do the thing
> - invoke another function on all the members notifying them something has happened
> 
> You need to be careful when invoking one function from another. Depending on what you're doing in the second function, you could get yourself into a distributed deadlock situation.
> 
> I'm not sure this answers all the issues you were seeing, but hopefully it helps.
> 
> Thanks,
> Barry Oglesby
> 
> 

Re: Best Practices for Calling Server Side Functions

Posted by Barry Oglesby <bo...@pivotal.io>.

Interesting. My simple replicate test is running just fine. What exception
are you seeing?

Thanks,
Barry Oglesby



Re: Best Practices for Calling Server Side Functions

Posted by Mark Secrist <ms...@pivotal.io>.

Great detailed explanation Barry. At one point, you made this statement:

"If you're executing a function onRegion with a replicated region, then the
function is executed on any member defining that region. Since the region
is replicated, every server has the same data."

Has something changed with Geode recently? Last I checked you can't execute
onRegion on a Replicated region. You'll actually get an exception thrown.



-- 

*Mark Secrist | Sr Manager, **Global Education Delivery*

msecrist@pivotal.io

970.214.4567 Mobile

  *pivotal.io <http://www.pivotal.io/>*

Follow Us: Twitter <http://www.twitter.com/pivotal> | LinkedIn
<http://www.linkedin.com/company/pivotalsoftware> | Facebook
<http://www.facebook.com/pivotalsoftware> | YouTube
<http://www.youtube.com/gopivotal> | Google+
<https://plus.google.com/105320112436428794490>

Re: Best Practices for Calling Server Side Functions

Posted by Barry Oglesby <bo...@pivotal.io>.

Oh, I see. PartitionRegionHelper.getLocalDataForContext is not supported
for replicated regions.

You can use PartitionRegionHelper.isPartitionedRegion to test whether or
not the region is partitioned before calling getLocalDataForContext. If the
region is not partitioned, RegionFunctionContext.getDataSet is all you'll
need.
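
A minimal sketch of that check, assuming the com.gemstone.gemfire package names that appear in the stack trace in this thread (later Apache Geode releases use org.apache.geode instead). LocalData and forContext are hypothetical names; the three Geode calls are the ones named above. It needs the GemFire/Geode jar on the classpath to compile.

```java
import com.gemstone.gemfire.cache.Region;
import com.gemstone.gemfire.cache.execute.RegionFunctionContext;
import com.gemstone.gemfire.cache.partition.PartitionRegionHelper;

/** Hypothetical helper: picks the data a region function should work on. */
final class LocalData {
    private LocalData() {}

    static <K, V> Region<K, V> forContext(RegionFunctionContext context) {
        Region<K, V> dataSet = context.getDataSet();
        if (PartitionRegionHelper.isPartitionedRegion(dataSet)) {
            // Partitioned region: narrow to the local data for this context.
            return PartitionRegionHelper.getLocalDataForContext(context);
        }
        // Replicated (or other non-partitioned) region: the data set is all that is needed.
        return dataSet;
    }
}
```

A function written this way can be invoked with onRegion against either kind of region without hitting the "is not a Partitioned Region" IllegalArgumentException shown in the trace.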

Thanks,
Barry Oglesby


On Mon, Apr 18, 2016 at 5:10 PM, Mark Secrist <ms...@pivotal.io> wrote:

> com.gemstone.gemfire.cache.execute.FunctionException: com.gemstone.gemfire.cache.client.ServerOperationException: remote server on den-display-1(ClientWorker:42513:loner):63530:f65bd92b:ClientWorker: While performing a remote executeRegionFunction
>     at com.gemstone.gemfire.internal.cache.execute.ServerRegionFunctionExecutor.executeOnServer(ServerRegionFunctionExecutor.java:223)
>     at com.gemstone.gemfire.internal.cache.execute.ServerRegionFunctionExecutor.executeFunction(ServerRegionFunctionExecutor.java:165)
>     at com.gemstone.gemfire.internal.cache.execute.ServerRegionFunctionExecutor.execute(ServerRegionFunctionExecutor.java:363)
>     at com.gopivotal.bookshop.buslogic.SummingTests.shouldComputeTotalForAllOrders(SummingTests.java:44)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
>     at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
>     at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
> Caused by: com.gemstone.gemfire.cache.client.ServerOperationException: remote server on den-display-1(ClientWorker:42513:loner):63530:f65bd92b:ClientWorker: While performing a remote executeRegionFunction
>     at com.gemstone.gemfire.cache.client.internal.ExecuteRegionFunctionOp$ExecuteRegionFunctionOpImpl.processResponse(ExecuteRegionFunctionOp.java:591)
>     at com.gemstone.gemfire.cache.client.internal.AbstractOp.processResponse(AbstractOp.java:215)
>     at com.gemstone.gemfire.cache.client.internal.AbstractOp.attemptReadResponse(AbstractOp.java:153)
>     at com.gemstone.gemfire.cache.client.internal.AbstractOp.attempt(AbstractOp.java:369)
>     at com.gemstone.gemfire.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:267)
>     at com.gemstone.gemfire.cache.client.internal.pooling.PooledConnection.execute(PooledConnection.java:319)
>     at com.gemstone.gemfire.cache.client.internal.OpExecutorImpl.executeWithPossibleReAuthentication(OpExecutorImpl.java:930)
>     at com.gemstone.gemfire.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:158)
>     at com.gemstone.gemfire.cache.client.internal.PoolImpl.execute(PoolImpl.java:716)
>     at com.gemstone.gemfire.cache.client.internal.ExecuteRegionFunctionOp.execute(ExecuteRegionFunctionOp.java:159)
>     at com.gemstone.gemfire.cache.client.internal.ServerRegionProxy.executeFunction(ServerRegionProxy.java:801)
>     at com.gemstone.gemfire.internal.cache.execute.ServerRegionFunctionExecutor.executeOnServer(ServerRegionFunctionExecutor.java:212)
>     ... 28 more
> Caused by: com.gemstone.gemfire.cache.execute.FunctionException: java.lang.IllegalArgumentException: Region /BookOrder is not a Partitioned Region
>     at com.gemstone.gemfire.cache.client.internal.ExecuteRegionFunctionOp$ExecuteRegionFunctionOpImpl.processResponse(ExecuteRegionFunctionOp.java:580)
>     ... 39 more
> Caused by: java.lang.IllegalArgumentException: Region /BookOrder is not a Partitioned Region
>     at com.gemstone.gemfire.cache.partition.PartitionRegionHelper.isPartitionedCheck(PartitionRegionHelper.java:137)
>     at com.gemstone.gemfire.cache.partition.PartitionRegionHelper.getLocalDataForContext(PartitionRegionHelper.java:365)
>     at com.gopivotal.bookshop.buslogic.GenericSumFunction.execute(GenericSumFunction.java:22)
>     at com.gemstone.gemfire.internal.cache.execute.AbstractExecution.executeFunctionLocally(AbstractExecution.java:359)
>     at com.gemstone.gemfire.internal.cache.execute.AbstractExecution$2.run(AbstractExecution.java:325)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at com.gemstone.gemfire.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:692)
>     at com.gemstone.gemfire.distributed.internal.DistributionManager$9$1.run(DistributionManager.java:1149)
>     at java.lang.Thread.run(Thread.java:745)
>
>
> On Mon, Apr 18, 2016 at 12:27 PM, Barry Oglesby <bo...@pivotal.io>
> wrote:
>
>> Wes,
>>
>> Query does support executing on a RegionFunctionContext with bind
>> parameters:
>>
>> public Object execute(RegionFunctionContext context, Object[] params)
>>   throws FunctionDomainException, TypeMismatchException, NameResolutionException,
>>          QueryInvocationTargetException;
>>
>>
>> Thanks,
>> Barry Oglesby
>>
>>
>> On Mon, Apr 18, 2016 at 11:05 AM, Real Wes Williams <
>> TheRealWes@outlook.com> wrote:
>>
>>> Thanks Barry. This is a needed thread.
>>>
>>> Before leaving RegionFunctionContext, I execute queries often within a
>>> function.onRegion. Is there a good reason why we don’t support passing the
>>> RFC into a query when using bind parameters? If not, I’d like to add that
>>> to a Geode enhancement to eke out even more performance.  In performance
>>> tests using a region with only 2,000 entries with a small number of nodes,
>>> I did not see a performance difference between:
>>> A) executing a query using RFC, and
>>> B) executing a query with bind parameters not using RFC
>>>
>>> although supporting the RFC with B should theoretically be even faster.
>>>
>>> Regards,
>>> Wes Williams
>>>
>>>
>>> http://gemfire81.docs.pivotal.io/docs-gemfire/developing/query_additional/using_query_bind_parameters.html#concept_173E775FE46B47DF9D7D1E40680D34DF
>>>
>>>
>>> On Apr 15, 2016, at 9:07 PM, Barry Oglesby <bo...@pivotal.io> wrote:
>>>
>>> Wes,
>>>
>>> Because your regions are colocated, your example actually works. I'm not
>>> sure why you'd do this, and I'm not sure I'd recommend it.
>>>
>>> Under the covers, the query determines it is on the Trade region. Then
>>> it gets the buckets (set of integers) from the RegionFunctionContext. Then,
>>> the query, parameters and buckets are passed to the region to be executed.
>>>
>>> So, the only thing the RFC is used for is to get the appropriate
>>> buckets, and they should be the same in either case.
>>>
>>> You might see some issues with this idea when buckets are moving around
>>> during a rebalance. You'd have to test in that scenario to verify.
>>>
>>>
>>> Thanks,
>>> Barry Oglesby
>>>
>>>
>>> On Fri, Apr 15, 2016 at 5:14 PM, Real Wes Williams <
>>> TheRealWes@outlook.com> wrote:
>>>
>>>> Barry,
>>>>
>>>> Would passing the RegionFunctionContext to the query execution work the
>>>> same whether the original function was executed on the Orders region or the
>>>> Trades region in your example?  If they are colocated I would intuitively
>>>> think that it _may_ not matter, but if it does, the side effects would
>>>> probably be subtle.
>>>>
>>>> To be specific by way of modifying your example, are the following
>>>> equivalent given that Orders and Trades are colocated?
>>>>
>>>> Example 1 - Executing on the Orders region:
>>>>  **********************************************
>>>> Execution execution = FunctionService.onRegion(orderRegion)
>>>>     .withFilter(Collections.singleton(cusip));
>>>> ResultCollector collector = execution.execute("TradeQueryFunction");
>>>>
>>>> In the function...
>>>> Query query = queryService.newQuery("select * from /Trade where cusip = $1");
>>>> SelectResults results = (SelectResults) query.execute(rfc, new String[] {cusip});
>>>>
>>>> Example 2 - Executing on the Trades region:
>>>>  **********************************************
>>>> Execution execution = FunctionService.onRegion(tradeRegion)
>>>>     .withFilter(Collections.singleton(cusip));
>>>> ResultCollector collector = execution.execute("TradeQueryFunction");
>>>>
>>>> In the function...
>>>> Query query = queryService.newQuery("select * from /Trade where cusip = $1");
>>>> SelectResults results = (SelectResults) query.execute(rfc, new String[] {cusip});
>>>>
>>>> And then in the function:
>>>>
>>>> On Apr 15, 2016, at 7:53 PM, Barry Oglesby <bo...@pivotal.io> wrote:
>>>>
>>>> Executing queries in functions can be tricky.
>>>>
>>>> For executing queries in a function, do something like:
>>>>
>>>> - invoke the function with onRegion
>>>> - have the function return true from optimizeForWrite so that it is
>>>> executed only on primary buckets
>>>> - use the Query execute API with a RegionFunctionContext in the
>>>> function. Otherwise, you could easily end up executing the same query on
>>>> more than one member.
>>>>
>>>> If you set a filter, the function (and query) will execute on only the
>>>> member containing the primary or primaries for that filter.
>>>>
>>>> Here is an example with trades.
>>>>
>>>> If you route all trades on a specific cusip to the same bucket using a
>>>> PartitionResolver, then querying for all trades for a specific cusip can be
>>>> done efficiently using a Function. The trades could be stored with a simple
>>>> String key like cusip-id or a complex key containing both the cusip and id.
>>>> Either way, the PartitionResolver will need to be able to return the cusip
>>>> for the routing object.
>>>>
>>>> Invoke the function like:
>>>>
>>>> Execution execution =
>>>> FunctionService.onRegion(this.region).withFilter(Collections.singleton(cusip));
>>>> ResultCollector collector = execution.execute("TradeQueryFunction");
>>>> Object result = collector.getResult();
>>>>
>>>> In the TradeQueryFunction, execute the query like:
>>>>
>>>> RegionFunctionContext rfc = (RegionFunctionContext) context;
>>>> String cusip = (String) rfc.getFilter().iterator().next();
>>>> SelectResults results = (SelectResults) this.query.execute(rfc, new
>>>> String[] {cusip});
>>>>
>>>> Where the query is:
>>>>
>>>> select * from /Trade where cusip = $1
>>>>
>>>> This will route the function request to the member whose primary bucket
>>>> contains the cusip filter. Then it will execute the query on the
>>>> RegionFunctionContext which will just be the data for that bucket. Note:
>>>> the PartitionResolver will also need to be able to return the cusip for
>>>> that filter (which is just the input string itself).
>>>>
>>>> Here is some more general info on functions.
>>>>
>>>> If you're executing a function onRegion with a replicated region, then
>>>> the function is executed on any member defining that region. Since the
>>>> region is replicated, every server has the same data.
>>>>
>>>> If you're executing a function onRegion with a partitioned region, then
>>>> where the function is invoked depends on the result of optimizeForWrite. If
>>>> optimizeForWrite returns true, the function is invoked on all the members
>>>> containing primary buckets for that region. If optimizeForWrite returns
>>>> false, the function is invoked on as few members as it can that encompass
>>>> all the buckets (so it mixes primary and secondary buckets). For example if
>>>> you have 2 members, and the primaries are split between them, then
>>>> optimizeForWrite returning true means that the function will be invoked on
>>>> both members. Returning false will cause the function to be invoked on only
>>>> one member since each member has all the buckets. I almost always have
>>>> optimizeForWrite return true.
>>>>
>>>> The onServer/onServers API is used for data-unaware calls (meaning no
>>>> specific region involved). In the past, I've used it mainly for admin-type
>>>> behavior like:
>>>>
>>>> - start/stop gateway senders
>>>> - create regions
>>>> - rebalance
>>>> - assign buckets
>>>>
>>>> Now, gfsh does a lot of this behavior (maybe all of it), so I don't
>>>> necessarily need functions to do it anymore.
>>>>
>>>> One of my favorite onServer use cases is the command pattern using a
>>>> Request/Response API like:
>>>>
>>>> - define a Request (like RebalanceCache)
>>>> - pass it as an argument to a CommandFunction from the client to a
>>>> server using onServer
>>>> - execute it on the server
>>>> - return a Response
>>>>
>>>> One use case for invoking a function from another function is member
>>>> notification. This can be done with a CacheListener on a replicated region
>>>> too, but the basic idea is:
>>>>
>>>> - invoke a function
>>>> - in the function, invoke another function on all the members notifying
>>>> them something is about to happen
>>>> - do the thing
>>>> - invoke another function on all the members notifying them something
>>>> has happened
>>>>
>>>> You need to be careful when invoking one function from another.
>>>> Depending on what you're doing in the second function, you could get
>>>> yourself into a distributed deadlock situation.
>>>>
>>>> I'm not sure this answers all the issues you were seeing, but hopefully
>>>> it helps.
>>>>
>>>> Thanks,
>>>> Barry Oglesby
>>>>
>>>>
>>>> On Fri, Apr 15, 2016 at 1:36 PM, Matt Ross <mr...@pivotal.io> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm involved in a sizable GemFire Project right now that is requiring
>>>>> me to execute Functions in a number of ways, and I wanted to poll the
>>>>> community for some best practices.  So initially I would execute all
>>>>> functions like this.
>>>>>
>>>>> ResultCollector<?, ?> rc = FunctionService.onRegion(region)
>>>>>     .withArgs(arguments).execute("my-awesome-function");
>>>>>
>>>>> And this worked reliably for quite some time, until I started mixing functions that executed on partition redundant data with functions that executed on replicated data.  I first ran into problems with this method when I had the following setup.
>>>>>
>>>>> 1 locator, 2 servers, and functions that would run queries on partition redundant and replicated regions.  The function would execute on both servers, and the result collector would nondeterministically choose a server to return results from.  Logging statements placed within my function confirmed that the function was being executed twice, once on each server.  We were able to fix this problem by switching from executing on region to executing on Pool.  Our hypothesis was that since there was replicated data on both servers, the function would execute on both servers.
>>>>>
>>>>> Another issue was executing functions from within a function without a function context.  Let's say I have one function that I execute on Pool; it is therefore passed a FunctionContext.  But when I'm actually in the function I need to execute other functions, some needing a RegionFunctionContext and some just needing a FunctionContext.  Initially I was able to just use a ResultCollector and FunctionService.onRegion to get a region context, and then pass my current function context to an instance of a new function:
>>>>>
>>>>> MyAwesomeFunction myAwesomeFunction = new MyAwesomeFunction();
>>>>>
>>>>> myAwesomeFunction.execute(functionContext);
>>>>>
>>>>> This worked for a time but complexity started rising and more problems came up.
>>>>>
>>>>> So in short I wanted to throw out the blanket question of best practices on using (onRegion/onPool/onServer), calling other functions from within functions, what type of functions should be used on what type of regions, and general design patterns when executing functions.  Thanks!
>>>>>
>>>>> *Matthew Ross | Data Engineer | Pivotal*
>>>>> *625 Avenue of the Americas NY, NY 10011*
>>>>> *516-941-7535 <516-941-7535> | mross@pivotal.io <mr...@pivotal.io> *
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
> --
>
> *Mark Secrist | Sr Manager, **Global Education Delivery*
>
> msecrist@pivotal.io
>
> 970.214.4567 Mobile
>
>   *pivotal.io <http://www.pivotal.io/>*
>
> Follow Us: Twitter <http://www.twitter.com/pivotal> | LinkedIn
> <http://www.linkedin.com/company/pivotalsoftware> | Facebook
> <http://www.facebook.com/pivotalsoftware> | YouTube
> <http://www.youtube.com/gopivotal> | Google+
> <https://plus.google.com/105320112436428794490>
>

Re: Best Practices for Calling Server Side Functions

Posted by Mark Secrist <ms...@pivotal.io>.

com.gemstone.gemfire.cache.execute.FunctionException: com.gemstone.gemfire.cache.client.ServerOperationException: remote server on den-display-1(ClientWorker:42513:loner):63530:f65bd92b:ClientWorker: While performing a remote executeRegionFunction
    at com.gemstone.gemfire.internal.cache.execute.ServerRegionFunctionExecutor.executeOnServer(ServerRegionFunctionExecutor.java:223)
    at com.gemstone.gemfire.internal.cache.execute.ServerRegionFunctionExecutor.executeFunction(ServerRegionFunctionExecutor.java:165)
    at com.gemstone.gemfire.internal.cache.execute.ServerRegionFunctionExecutor.execute(ServerRegionFunctionExecutor.java:363)
    at com.gopivotal.bookshop.buslogic.SummingTests.shouldComputeTotalForAllOrders(SummingTests.java:44)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
Caused by: com.gemstone.gemfire.cache.client.ServerOperationException: remote server on den-display-1(ClientWorker:42513:loner):63530:f65bd92b:ClientWorker: While performing a remote executeRegionFunction
    at com.gemstone.gemfire.cache.client.internal.ExecuteRegionFunctionOp$ExecuteRegionFunctionOpImpl.processResponse(ExecuteRegionFunctionOp.java:591)
    at com.gemstone.gemfire.cache.client.internal.AbstractOp.processResponse(AbstractOp.java:215)
    at com.gemstone.gemfire.cache.client.internal.AbstractOp.attemptReadResponse(AbstractOp.java:153)
    at com.gemstone.gemfire.cache.client.internal.AbstractOp.attempt(AbstractOp.java:369)
    at com.gemstone.gemfire.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:267)
    at com.gemstone.gemfire.cache.client.internal.pooling.PooledConnection.execute(PooledConnection.java:319)
    at com.gemstone.gemfire.cache.client.internal.OpExecutorImpl.executeWithPossibleReAuthentication(OpExecutorImpl.java:930)
    at com.gemstone.gemfire.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:158)
    at com.gemstone.gemfire.cache.client.internal.PoolImpl.execute(PoolImpl.java:716)
    at com.gemstone.gemfire.cache.client.internal.ExecuteRegionFunctionOp.execute(ExecuteRegionFunctionOp.java:159)
    at com.gemstone.gemfire.cache.client.internal.ServerRegionProxy.executeFunction(ServerRegionProxy.java:801)
    at com.gemstone.gemfire.internal.cache.execute.ServerRegionFunctionExecutor.executeOnServer(ServerRegionFunctionExecutor.java:212)
    ... 28 more
Caused by: com.gemstone.gemfire.cache.execute.FunctionException: java.lang.IllegalArgumentException: Region /BookOrder is not a Partitioned Region
    at com.gemstone.gemfire.cache.client.internal.ExecuteRegionFunctionOp$ExecuteRegionFunctionOpImpl.processResponse(ExecuteRegionFunctionOp.java:580)
    ... 39 more
Caused by: java.lang.IllegalArgumentException: Region /BookOrder is not a Partitioned Region
    at com.gemstone.gemfire.cache.partition.PartitionRegionHelper.isPartitionedCheck(PartitionRegionHelper.java:137)
    at com.gemstone.gemfire.cache.partition.PartitionRegionHelper.getLocalDataForContext(PartitionRegionHelper.java:365)
    at com.gopivotal.bookshop.buslogic.GenericSumFunction.execute(GenericSumFunction.java:22)
    at com.gemstone.gemfire.internal.cache.execute.AbstractExecution.executeFunctionLocally(AbstractExecution.java:359)
    at com.gemstone.gemfire.internal.cache.execute.AbstractExecution$2.run(AbstractExecution.java:325)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at com.gemstone.gemfire.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:692)
    at com.gemstone.gemfire.distributed.internal.DistributionManager$9$1.run(DistributionManager.java:1149)
    at java.lang.Thread.run(Thread.java:745)

On Mon, Apr 18, 2016 at 12:27 PM, Barry Oglesby <bo...@pivotal.io> wrote:

> Wes,
>
> Query does support executing on a RegionFunctionContext with bind
> parameters:
>
> public Object execute(RegionFunctionContext context, Object[] params)
>   throws FunctionDomainException, TypeMismatchException, NameResolutionException,
>          QueryInvocationTargetException;
>
>
> Thanks,
> Barry Oglesby
>
>
> On Mon, Apr 18, 2016 at 11:05 AM, Real Wes Williams <
> TheRealWes@outlook.com> wrote:
>
>> Thanks Barry. This is a needed thread.
>>
>> Before leaving RegionFunctionContext, I execute queries often within a
>> function.onRegion. Is there a good reason why we don’t support passing the
>> RFC into a query when using bind parameters? If not, I’d like to add that
>> to a Geode enhancement to eke out even more performance.  In performance
>> tests using a region with only 2,000 entries with a small number of nodes,
>> I did not see a performance difference between:
>> A) executing a query using RFC, and
>> B) executing a query with bind parameters not using RFC
>>
>> although supporting the RFC with B should theoretically be even faster.
>>
>> Regards,
>> Wes Williams
>>
>>
>> http://gemfire81.docs.pivotal.io/docs-gemfire/developing/query_additional/using_query_bind_parameters.html#concept_173E775FE46B47DF9D7D1E40680D34DF
>>
>>
>> On Apr 15, 2016, at 9:07 PM, Barry Oglesby <bo...@pivotal.io> wrote:
>>
>> Wes,
>>
>> Because your regions are colocated, your example actually works. I'm not
>> sure why you'd do this, and I'm not sure I'd recommend it.
>>
>> Under the covers, the query determines it is on the Trade region. Then it
>> gets the buckets (set of integers) from the RegionFunctionContext. Then,
>> the query, parameters and buckets are passed to the region to be executed.
>>
>> So, the only thing the RFC is used for is to get the appropriate buckets,
>> and they should be the same in either case.
>>
>> You might see some issues with this idea when buckets are moving around
>> during a rebalance. You'd have to test in that scenario to verify.
>>
>>
>> Thanks,
>> Barry Oglesby
>>
>>
>> On Fri, Apr 15, 2016 at 5:14 PM, Real Wes Williams <
>> TheRealWes@outlook.com> wrote:
>>
>>> Barry,
>>>
>>> Would passing the RegionFunctionContext to the query execution work the
>>> same whether the original function was executed on the Orders region or the
>>> Trades region in your example?  If they are colocated I would intuitively
>>> think that it _may_ not matter, but if it does, the side effects would
>>> probably be subtle.
>>>
>>> To be specific by way of modifying your example, are the following
>>> equivalent given that Orders and Trades are colocated?
>>>
>>> Example 1 - Executing on the Orders region:
>>>  **********************************************
>>> Execution execution = FunctionService.onRegion(orderRegion)
>>>     .withFilter(Collections.singleton(cusip));
>>> ResultCollector collector = execution.execute("TradeQueryFunction");
>>>
>>> In the function...
>>> Query query = queryService.newQuery("select * from /Trade where cusip = $1");
>>> SelectResults results = (SelectResults) query.execute(rfc, new String[] {cusip});
>>>
>>> Example 2 - Executing on the Trades region:
>>>  **********************************************
>>> Execution execution = FunctionService.onRegion(tradeRegion)
>>>     .withFilter(Collections.singleton(cusip));
>>> ResultCollector collector = execution.execute("TradeQueryFunction");
>>>
>>> In the function...
>>> Query query = queryService.newQuery("select * from /Trade where cusip = $1");
>>> SelectResults results = (SelectResults) query.execute(rfc, new String[] {cusip});
>>>
>>> And then in the function:
>>>
>>> On Apr 15, 2016, at 7:53 PM, Barry Oglesby <bo...@pivotal.io> wrote:
>>>
>>> Executing queries in functions can be tricky.
>>>
>>> For executing queries in a function, do something like:
>>>
>>> - invoke the function with onRegion
>>> - have the function return true from optimizeForWrite so that it is
>>> executed only on primary buckets
>>> - use the Query execute API with a RegionFunctionContext in the
>>> function. Otherwise, you could easily end up executing the same query on
>>> more than one member.
>>>
>>> If you set a filter, the function (and query) will execute on only the
>>> member containing the primary or primaries for that filter.
>>>
>>> Here is an example with trades.
>>>
>>> If you route all trades on a specific cusip to the same bucket using a
>>> PartitionResolver, then querying for all trades for a specific cusip can be
>>> done efficiently using a Function. The trades could be stored with a simple
>>> String key like cusip-id or a complex key containing both the cusip and id.
>>> Either way, the PartitionResolver will need to be able to return the cusip
>>> for the routing object.
>>>
>>> Invoke the function like:
>>>
>>> Execution execution =
>>> FunctionService.onRegion(this.region).withFilter(Collections.singleton(cusip));
>>> ResultCollector collector = execution.execute("TradeQueryFunction");
>>> Object result = collector.getResult();
>>>
>>> In the TradeQueryFunction, execute the query like:
>>>
>>> RegionFunctionContext rfc = (RegionFunctionContext) context;
>>> String cusip = (String) rfc.getFilter().iterator().next();
>>> SelectResults results = (SelectResults) this.query.execute(rfc, new
>>> String[] {cusip});
>>>
>>> Where the query is:
>>>
>>> select * from /Trade where cusip = $1
>>>
>>> This will route the function request to the member whose primary bucket
>>> contains the cusip filter. Then it will execute the query on the
>>> RegionFunctionContext which will just be the data for that bucket. Note:
>>> the PartitionResolver will also need to be able to return the cusip for
>>> that filter (which is just the input string itself).
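
The routing idea in the paragraph above can be sketched in plain Java with no GemFire dependency. The class name, the "cusip-id" key format, the sample CUSIP values, and the bucket count of 113 are all assumptions made for illustration, and the hash-modulo step only approximates how a routing object is mapped to a bucket. In a real system the cusip extraction would live in a PartitionResolver's getRoutingObject method.

```java
import java.util.Arrays;
import java.util.List;

class CusipRoutingSketch {
    // Assumed bucket count, for illustration only.
    static final int TOTAL_BUCKETS = 113;

    // Returns the cusip portion of a key. A "cusip-id" trade key yields the
    // cusip; a bare cusip (as used for the function filter) is returned as is.
    static String routingObject(String key) {
        int dash = key.indexOf('-');
        return dash < 0 ? key : key.substring(0, dash);
    }

    // Approximates bucket selection as hash(routing object) modulo bucket count.
    static int bucketFor(String key) {
        return Math.floorMod(routingObject(key).hashCode(), TOTAL_BUCKETS);
    }

    public static void main(String[] args) {
        // Two trade keys for the same cusip, plus the bare cusip used as a filter.
        List<String> keys = Arrays.asList("037833100-1", "037833100-2", "037833100");
        for (String key : keys) {
            System.out.println(key + " -> bucket " + bucketFor(key));
        }
    }
}
```

All three keys share one routing object, so they resolve to the same bucket. That is why the function filter can be the bare cusip string.
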
>>>
>>> Here is some more general info on functions.
>>>
>>> If you're executing a function onRegion with a replicated region, then
>>> the function is executed on any member defining that region. Since the
>>> region is replicated, every server has the same data.
>>>
>>> If you're executing a function onRegion with a partitioned region, then
>>> where the function is invoked depends on the result of optimizeForWrite. If
>>> optimizeForWrite returns true, the function is invoked on all the members
>>> containing primary buckets for that region. If optimizeForWrite returns
>>> false, the function is invoked on as few members as it can that encompass
>>> all the buckets (so it mixes primary and secondary buckets). For example if
>>> you have 2 members, and the primaries are split between them, then
>>> optimizeForWrite returning true means that the function will be invoked on
>>> both members. Returning false will cause the function to be invoked on only
>>> one member since each member has all the buckets. I almost always have
>>> optimizeForWrite return true.
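
The two-member example above can be modeled in plain Java to make the difference concrete. This is only a toy model under stated assumptions: the class and method names are hypothetical, and the greedy "fewest members covering all buckets" selection is a stand-in, not the algorithm GemFire actually uses.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class OptimizeForWriteSketch {

    // optimizeForWrite == true: every member hosting at least one primary bucket.
    static Set<String> targetsWhenTrue(Map<Integer, String> primaryByBucket) {
        return new TreeSet<>(primaryByBucket.values());
    }

    // optimizeForWrite == false: a small set of members whose buckets (primary
    // or secondary) together cover every bucket. Greedy selection is assumed.
    static Set<String> targetsWhenFalse(Map<Integer, Set<String>> hostsByBucket) {
        Set<Integer> uncovered = new HashSet<>(hostsByBucket.keySet());
        Set<String> chosen = new TreeSet<>();
        while (!uncovered.isEmpty()) {
            // Count how many still-uncovered buckets each member hosts.
            Map<String, Integer> gain = new TreeMap<>();
            for (Integer bucket : uncovered) {
                for (String member : hostsByBucket.get(bucket)) {
                    gain.merge(member, 1, Integer::sum);
                }
            }
            // Pick the member that covers the most uncovered buckets.
            String best = null;
            int bestGain = 0;
            for (Map.Entry<String, Integer> e : gain.entrySet()) {
                if (e.getValue() > bestGain) {
                    best = e.getKey();
                    bestGain = e.getValue();
                }
            }
            if (best == null) {
                throw new IllegalStateException("a bucket has no hosting member");
            }
            final String picked = best;
            chosen.add(picked);
            uncovered.removeIf(bucket -> hostsByBucket.get(bucket).contains(picked));
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Two members, four buckets, primaries split evenly, one redundant copy each.
        Map<Integer, String> primaries = new HashMap<>();
        Map<Integer, Set<String>> hosts = new HashMap<>();
        for (int bucket = 0; bucket < 4; bucket++) {
            primaries.put(bucket, bucket < 2 ? "server1" : "server2");
            hosts.put(bucket, new HashSet<>(Arrays.asList("server1", "server2")));
        }
        System.out.println("optimizeForWrite=true  -> " + targetsWhenTrue(primaries));
        System.out.println("optimizeForWrite=false -> " + targetsWhenFalse(hosts));
    }
}
```

With primaries split across both members, the "true" case targets both servers, while the "false" case needs only one because each server holds a copy of every bucket.
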
>>>
>>> The onServer/onServers API is used for data-unaware calls (meaning no
>>> specific region involved). In the past, I've used it mainly for admin-type
>>> behavior like:
>>>
>>> - start/stop gateway senders
>>> - create regions
>>> - rebalance
>>> - assign buckets
>>>
>>> Now, gfsh does a lot of this behavior (maybe all of it), so I don't
>>> necessarily need functions to do it anymore.
>>>
>>> One of my favorite onServer use cases is the command pattern using a
>>> Request/Response API like:
>>>
>>> - define a Request (like RebalanceCache)
>>> - pass it as an argument to a CommandFunction from the client to a
>>> server using onServer
>>> - execute it on the server
>>> - return a Response
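
The Request/Response steps listed above could look roughly like the following plain-Java sketch. Every type here (Request, Response, RebalanceCache, RebalanceResponse, CommandFunction) is a hypothetical stand-in with no GemFire dependency, and the rebalance itself is only simulated. In an actual function the request would presumably arrive via FunctionContext.getArguments() and the response would be returned through the context's ResultSender.

```java
import java.io.Serializable;

class CommandPatternSketch {

    // A reply sent back to the caller; Serializable because it crosses the wire.
    interface Response extends Serializable {
        String describe();
    }

    // A command that knows how to run itself once it reaches the server.
    interface Request extends Serializable {
        Response executeOnServer();
    }

    // Hypothetical response type carrying a short status message.
    static final class RebalanceResponse implements Response {
        private static final long serialVersionUID = 1L;
        private final String message;

        RebalanceResponse(String message) {
            this.message = message;
        }

        @Override
        public String describe() {
            return message;
        }
    }

    // Hypothetical request; a real one would trigger a rebalance on the server.
    static final class RebalanceCache implements Request {
        private static final long serialVersionUID = 1L;

        @Override
        public Response executeOnServer() {
            return new RebalanceResponse("rebalance simulated");
        }
    }

    // Stand-in for the server-side function body: unwrap the argument, run it.
    static final class CommandFunction {
        static Response handle(Object argument) {
            if (!(argument instanceof Request)) {
                throw new IllegalArgumentException("argument must be a Request");
            }
            return ((Request) argument).executeOnServer();
        }
    }

    public static void main(String[] args) {
        Response response = CommandFunction.handle(new RebalanceCache());
        System.out.println(response.describe());
    }
}
```

The appeal of this shape is that one generic function can serve many commands: adding a new admin operation means adding a new Request class rather than registering a new function.
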
>>>
>>> One use case for invoking a function from another function is member
>>> notification. This can be done with a CacheListener on a replicated region
>>> too, but the basic idea is:
>>>
>>> - invoke a function
>>> - in the function, invoke another function on all the members notifying
>>> them something is about to happen
>>> - do the thing
>>> - invoke another function on all the members notifying them something
>>> has happened
>>>
>>> You need to be careful when invoking one function from another.
>>> Depending on what you're doing in the second function, you could get
>>> yourself into a distributed deadlock situation.
>>>
>>> I'm not sure this answers all the issues you were seeing, but hopefully
>>> it helps.
>>>
>>> Thanks,
>>> Barry Oglesby
>>>
>>>
>>> On Fri, Apr 15, 2016 at 1:36 PM, Matt Ross <mr...@pivotal.io> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm involved in a sizable GemFire Project right now that is requiring
>>>> me to execute Functions in a number of ways, and I wanted to poll the
>>>> community for some best practices.  So initially I would execute all
>>>> functions like this.
>>>>
>>>> ResultCollector<?, ?> rc = FunctionService.onRegion(region)
>>>>     .withArgs(arguments).execute("my-awesome-function");
>>>>
>>>> And this worked reliably for quite some time, until I started mixing functions that executed on partition redundant data with functions that executed on replicated data.  I first ran into problems with this method when I had the following setup.
>>>>
>>>> 1 locator, 2 servers, and functions that would run queries on partition redundant and replicated regions.  The function would execute on both servers, and the result collector would nondeterministically choose a server to return results from.  Logging statements placed within my function confirmed that the function was being executed twice, once on each server.  We were able to fix this problem by switching from executing on region to executing on Pool.  Our hypothesis was that since there was replicated data on both servers, the function would execute on both servers.
>>>>
>>>> Another issue was executing functions from within a function without a function context.  Let's say I have one function that I execute on Pool; it is therefore passed a FunctionContext.  But when I'm actually in the function I need to execute other functions, some needing a RegionFunctionContext and some just needing a FunctionContext.  Initially I was able to just use a ResultCollector and FunctionService.onRegion to get a region context, and then pass my current function context to an instance of a new function:
>>>>
>>>> MyAwesomeFunction myAwesomeFunction = new MyAwesomeFunction();
>>>>
>>>> myAwesomeFunction.execute(functionContext);
>>>>
>>>> This worked for a time but complexity started rising and more problems came up.
>>>>
>>>> So in short I wanted to throw out the blanket question of best practices on using (onRegion/onPool/onServer), calling other functions from within functions, what type of functions should be used on what type of regions, and general design patterns when executing functions.  Thanks!
>>>>
>>>> *Matthew Ross | Data Engineer | Pivotal*
>>>> *625 Avenue of the Americas NY, NY 10011*
>>>> *516-941-7535 <516-941-7535> | mross@pivotal.io <mr...@pivotal.io> *
>>>>
>>>>
>>>
>>>
>>
>>
>

-- 

*Mark Secrist | Sr Manager, **Global Education Delivery*

msecrist@pivotal.io

970.214.4567 Mobile

  *pivotal.io <http://www.pivotal.io/>*

Follow Us: Twitter <http://www.twitter.com/pivotal> | LinkedIn
<http://www.linkedin.com/company/pivotalsoftware> | Facebook
<http://www.facebook.com/pivotalsoftware> | YouTube
<http://www.youtube.com/gopivotal> | Google+
<https://plus.google.com/105320112436428794490>

Re: Best Practices for Calling Server Side Functions

Posted by Barry Oglesby <bo...@pivotal.io>.

Wes,

Query does support executing on a RegionFunctionContext with bind
parameters:

public Object execute(RegionFunctionContext context, Object[] params)
  throws FunctionDomainException, TypeMismatchException, NameResolutionException,
         QueryInvocationTargetException;


Thanks,
Barry Oglesby



Re: Best Practices for Calling Server Side Functions

Posted by Real Wes Williams <Th...@outlook.com>.

Thanks Barry. This is a needed thread.

Before leaving RegionFunctionContext (RFC): I often execute queries within a function invoked with onRegion. Is there a good reason why we don't support passing the RFC into a query when using bind parameters? If not, I'd like to propose that as a Geode enhancement to eke out even more performance. In performance tests on a region with only 2,000 entries and a small number of nodes, I did not see a performance difference between:
A) executing a query using RFC, and
B) executing a query with bind parameters not using RFC

although supporting the RFC with B should theoretically be even faster.

Regards,
Wes Williams

http://gemfire81.docs.pivotal.io/docs-gemfire/developing/query_additional/using_query_bind_parameters.html#concept_173E775FE46B47DF9D7D1E40680D34DF



Re: Best Practices for Calling Server Side Functions

Posted by Barry Oglesby <bo...@pivotal.io>.

Wes,

Because your regions are colocated, your example actually works. I'm not
sure why you'd do this, and I'm not sure I'd recommend it.

Under the covers, the query determines it is on the Trade region. Then it
gets the buckets (set of integers) from the RegionFunctionContext. Then,
the query, parameters and buckets are passed to the region to be executed.

So, the only thing the RFC is used for is to get the appropriate buckets,
and they should be the same in either case.

You might see some issues with this idea when buckets are moving around
during a rebalance. You'd have to test in that scenario to verify.
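
To make the bucket point concrete, here is a minimal sketch. It is a toy model, not the product's actual bucket assignment: the class name, the hash-modulo mapping and the bucket count of 113 are all assumptions chosen for illustration. It only shows that when two colocated regions share the same routing rule and bucket count, the same filter resolves to the same set of bucket ids whichever region the function was invoked on.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: real bucket assignment is internal to Geode/GemFire.
class ColocatedBucketsSketch {

    // Hypothetical stand-in for mapping a routing object to a bucket id.
    static int bucketFor(Object routingObject, int totalBuckets) {
        return Math.floorMod(routingObject.hashCode(), totalBuckets);
    }

    // Bucket ids that a filter (a set of routing objects) resolves to in a
    // region with the given bucket count.
    static Set<Integer> bucketsFor(Set<String> filter, int totalBuckets) {
        Set<Integer> buckets = new TreeSet<>();
        for (String routingObject : filter) {
            buckets.add(bucketFor(routingObject, totalBuckets));
        }
        return buckets;
    }

    public static void main(String[] args) {
        int totalBuckets = 113; // assumed: colocated regions share one bucket count
        Set<String> filter = Collections.singleton("123");

        // The same rule is applied whether we start from Orders or from Trades.
        Set<Integer> viaOrders = bucketsFor(filter, totalBuckets);
        Set<Integer> viaTrades = bucketsFor(filter, totalBuckets);

        System.out.println(viaOrders.equals(viaTrades)); // prints "true"
    }
}
```

The model deliberately ignores bucket movement, which is exactly the rebalance case that would need real testing.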


Thanks,
Barry Oglesby



Re: Best Practices for Calling Server Side Functions

Posted by Real Wes Williams <Th...@outlook.com>.

Barry,

Would passing the RegionFunctionContext to the query execution work the same way whether the original function was executed on the Orders region or the Trades region in your example?  If they are colocated I would intuitively think that it _may_ not matter, but if it does, the side effects would probably be subtle.

To be specific by way of modifying your example, are the following equivalent given that Orders and Trades are colocated?

Example 1 - Executing on the Orders region:
**********************************************
Execution execution = FunctionService.onRegion(orderRegion).withFilter(Collections.singleton(cusip));
ResultCollector collector = execution.execute("TradeQueryFunction");

In the function:
Query query = queryService.newQuery("select * from /Trade where cusip = $1");
SelectResults results = (SelectResults) query.execute(rfc, new String[] {cusip});

Example 2 - Executing on the Trades region:
**********************************************
Execution execution = FunctionService.onRegion(tradeRegion).withFilter(Collections.singleton(cusip));
ResultCollector collector = execution.execute("TradeQueryFunction");

In the function:
Query query = queryService.newQuery("select * from /Trade where cusip = $1");
SelectResults results = (SelectResults) query.execute(rfc, new String[] {cusip});

> On Apr 15, 2016, at 7:53 PM, Barry Oglesby <bo...@pivotal.io> wrote:
> 
> Executing queries in functions can be tricky.
> 
> For executing queries in a function, do something like:
> 
> - invoke the function with onRegion
> - have the function return true from optimizeForWrite so that it is executed only on primary buckets
> - use the Query execute API with a RegionFunctionContext in the function. Otherwise, you could easily end up executing the same query on more than one member.
> 
> If you set a filter, the function (and query) will execute on only the member containing the primary or primaries for that filter.
> 
> Here is an example with trades.
> 
> If you route all trades on a specific cusip to the same bucket using a PartitionResolver, then querying for all trades for a specific cusip can be done efficiently using a Function. The trades could be stored with a simple String key like cusip-id or a complex key containing both the cusip and id. Either way, the PartitionResolver will need to be able to return the cusip for the routing object.
> 
> Invoke the function like:
> 
> Execution execution = FunctionService.onRegion(this.region).withFilter(Collections.singleton(cusip));
> ResultCollector collector = execution.execute("TradeQueryFunction");
> Object result = collector.getResult();
> 
> In the TradeQueryFunction, execute the query like:
> 
> RegionFunctionContext rfc = (RegionFunctionContext) context;
> String cusip = (String) rfc.getFilter().iterator().next();
> SelectResults results = (SelectResults) this.query.execute(rfc, new String[] {cusip});
> 
> Where the query is:
> 
> select * from /Trade where cusip = $1
> 
> This will route the function request to the member whose primary bucket contains the cusip filter. Then it will execute the query on the RegionFunctionContext which will just be the data for that bucket. Note: the PartitionResolver will also need to be able to return the cusip for that filter (which is just the input string itself).
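
A rough sketch of the routing idea described above, in plain Java. This is not a real PartitionResolver implementation: the class, the method names and the hash-modulo bucket mapping are assumptions for illustration, and only the "cusip-id" key format comes from the text. The point is that if the routing object is the cusip extracted from the key, all trades for one cusip land in the same bucket, and a filter that is just the cusip string resolves to that same bucket.

```java
// Toy model of cusip-based routing; a real resolver would implement Geode's
// PartitionResolver interface and return the cusip as the routing object.
class CusipRoutingSketch {

    // Extract the cusip from a key of the form "cusip-id". A bare cusip
    // (as used in a function filter) is returned unchanged.
    static String routingObject(String key) {
        int dash = key.indexOf('-');
        return dash < 0 ? key : key.substring(0, dash);
    }

    // Hypothetical bucket mapping: hash of the routing object modulo bucket count.
    static int bucketFor(String key, int totalBuckets) {
        return Math.floorMod(routingObject(key).hashCode(), totalBuckets);
    }

    public static void main(String[] args) {
        int totalBuckets = 113; // assumed bucket count
        int first = bucketFor("123-1", totalBuckets);  // trade 1 for cusip 123
        int second = bucketFor("123-2", totalBuckets); // trade 2 for cusip 123
        int filter = bucketFor("123", totalBuckets);   // filter is the cusip itself

        System.out.println(first == second && second == filter); // prints "true"
    }
}
```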
> 
> Here is a some more general info on functions.
> 
> If you're executing a function onRegion with a replicated region, then the function is executed on any member defining that region. Since the region is replicated, every server has the same data.
> 
> If you're executing a function onRegion with a partitioned region, then where the function is invoked depends on the result of optimizeForWrite. If optimizeForWrite returns true, the function is invoked on all the members containing primary buckets for that region. If optimizeForWrite returns false, the function is invoked on as few members as it can that encompass all the buckets (so it mixes primary and secondary buckets). For example if you have 2 members, and the primaries are split between them, then optimizeForWrite returning true means that the function will be invoked on both members. Returning false will cause the function to be invoked on only one member since each member has all the buckets. I almost always have optimizeForWrite return true.
> 
> The onServer/onServers API is used for data-unaware calls (meaning no specific region involved). In the past, I've used it mainly for admin-type behavior like:
> 
> - start/stop gateway senders
> - create regions
> - rebalance
> - assign buckets
> 
> Now, gfsh does a lot of this behavior (maybe all of it), so I don't necessarily need functions to do it anymore.
> 
> One of my favorite onServer use cases is the command pattern using a Request/Response API like:
> 
> - define a Request (like RebalanceCache)-
> - pass it as an argument to a CommandFunction from the client to a server using onServer
> - execute it on the server 
> - return a Response
> 
> One use case for invoking a function from another function is member notification. This can be done with a CacheListener on a replicated region too, but the basic idea is:
> 
> - invoke a function
> - in the function, invoke another function on all the members notifying them something is about to happen
> - do the thing
> - invoke another function on all the members notifying them something has happened
> 
> You need to be careful when invoking one function from another. Depending on what you're doing in the second function, you could get yourself into a distributed deadlock situation.
> 
> I'm not sure this answers all the issues you were seeing, but hopefully it helps.
> 
> Thanks,
> Barry Oglesby
> 
> 
> On Fri, Apr 15, 2016 at 1:36 PM, Matt Ross <mross@pivotal.io <ma...@pivotal.io>> wrote:
> Hi all,
> 
> I'm involved in a sizable GemFire Project right now that is requiring me to execute Functions in a number of ways, and I wanted to poll the community for some best practices.  So initially I would execute all functions like this. 
> 
> ResultCollector<?, ?> rc = FunctionService.onRegion(region)
>     .withArgs(arguments).execute("my-awesome-function");
> And this worked reliably for quite some time, until I started mixing up functions that were executing on partition redundant data and replicated data.  I initially started having problems with this method when I had this setup.  
> 1 locator, 2 servers,  and executing functions that would run queries on partition redundant and replicated regions.  I started getting this problem where the function would execute on both servers, and the result collector would indeterminately chose a server to return results from.  According to logging statements placed within my function I was able to confirm that the function was being executed twice, on both servers.  We were able to fix this problem by switching from executing on region, to executing on Pool.  The initial logic being since there was replicated data on both servers, the function would execute on both servers(Hyptothesis).  
> Another issue was executing functions from within a function without a function context.  Let's say I have one function that I execute with on Pool, there for it is passed a Function Context.  But when I'm actually in the function I need to execute other functions, some needing a RegionFunctionContext and some just needing a FunctionContext.  Initially I was able to just use a Result Collector and FunctionService.onRegion to get a region context, and then pass my current function context to an instance of a new function
> MyAwesomeFunction myAwesomeFunction= MyAwesomeFunction();
> myAweSomeFunction.execute(functionContext);
> This worked for a time but complexity started rising and more problems came up.  
> So in short I wanted to throw out the blanket question of best practices on using (onRegion/onPool/onServer), calling other functions from within functions, what type of functions should be used on what type of regions, and general design patterns when executing functions.  Thanks!
> Matthew Ross | Data Engineer | Pivotal
> 625 Avenue of the Americas NY, NY 10011
> 516-941-7535 <tel:516-941-7535> | mross@pivotal.io <ma...@pivotal.io> 
> 
>

Re: Best Practices for Calling Server Side Functions

Posted by Gregory Chase <gc...@pivotal.io>.

On Fri, Apr 15, 2016 at 5:07 PM, Matt Ross <mr...@pivotal.io> wrote:

> Just wow, thank you all for the detailed and well thought out responses.
> I'm going to try and compile these responses into a document for future
> reference and will share back with the community.  Thanks again.

Matt, thanks for the good question! Please do post to the Geode Wiki when
you get a chance. Karma awaits your future contribution!

-- 
Greg Chase

Global Head, Big Data Communities
http://www.pivotal.io/big-data

Pivotal Software
http://www.pivotal.io/

650-215-0477
@GregChase
Blog: http://geekmarketing.biz/

Re: Best Practices for Calling Server Side Functions

Posted by Matt Ross <mr...@pivotal.io>.

Just wow, thank you all for the detailed and well thought out responses.
I'm going to try and compile these responses into a document for future
reference and will share back with the community.  Thanks again.


-- 
*Matthew Ross | Data Engineer | Pivotal*
*625 Avenue of the Americas NY, NY 10011*
*516-941-7535 <516-941-7535> | mross@pivotal.io <mr...@pivotal.io> *

Re: Best Practices for Calling Server Side Functions

Posted by Barry Oglesby <bo...@pivotal.io>.

Executing queries in functions can be tricky.

For executing queries in a function, do something like:

- invoke the function with onRegion
- have the function return true from optimizeForWrite so that it is
executed only on primary buckets
- use the Query execute API with a RegionFunctionContext in the function.
Otherwise, you could easily end up executing the same query on more than
one member.

If you set a filter, the function (and query) will execute on only the
member containing the primary or primaries for that filter.

Here is an example with trades.

If you route all trades on a specific cusip to the same bucket using a
PartitionResolver, then querying for all trades for a specific cusip can be
done efficiently using a Function. The trades could be stored with a simple
String key like cusip-id or a complex key containing both the cusip and id.
Either way, the PartitionResolver will need to be able to return the cusip
for the routing object.
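
The routing idea can be sketched in plain Java without any Geode classes. This is only a model: the class name CusipRouting, the "cusip-id" key format with a single hyphen separator, and the hash-modulo bucket choice are assumptions for illustration. A real PartitionResolver would return the cusip as its routing object, and Geode's own bucket assignment is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of routing trades by cusip; not Geode code.
final class CusipRouting {
    // Assumed key format: "<cusip>-<id>", e.g. "037833100-42".
    // A key with no hyphen is treated as a bare cusip.
    static String routingObject(String key) {
        int dash = key.indexOf('-');
        return dash < 0 ? key : key.substring(0, dash);
    }

    // Illustrative bucket choice: hash of the routing object modulo the bucket count.
    static int bucketFor(String key, int totalBuckets) {
        return Math.floorMod(routingObject(key).hashCode(), totalBuckets);
    }

    // Groups keys by bucket so that co-location can be inspected.
    static Map<Integer, List<String>> group(List<String> keys, int totalBuckets) {
        Map<Integer, List<String>> byBucket = new TreeMap<>();
        for (String key : keys) {
            byBucket.computeIfAbsent(bucketFor(key, totalBuckets), b -> new ArrayList<>()).add(key);
        }
        return byBucket;
    }
}
```

In this model every trade key for one cusip lands in the same bucket, and so does the bare cusip string, which is why a filter containing just the cusip can be routed to the bucket holding those trades.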

Invoke the function like:

Execution execution = FunctionService.onRegion(this.region)
    .withFilter(Collections.singleton(cusip));
ResultCollector collector = execution.execute("TradeQueryFunction");
Object result = collector.getResult();

In the TradeQueryFunction, execute the query like:

RegionFunctionContext rfc = (RegionFunctionContext) context;
String cusip = (String) rfc.getFilter().iterator().next();
SelectResults results = (SelectResults) this.query.execute(rfc, new String[] {cusip});

Where the query is:

select * from /Trade where cusip = $1

This will route the function request to the member whose primary bucket
contains the cusip filter. Then it will execute the query on the
RegionFunctionContext which will just be the data for that bucket. Note:
the PartitionResolver will also need to be able to return the cusip for
that filter (which is just the input string itself).

Here is some more general info on functions.

If you're executing a function onRegion with a replicated region, then the
function is executed on any member defining that region. Since the region
is replicated, every server has the same data.

If you're executing a function onRegion with a partitioned region, then
where the function is invoked depends on the result of optimizeForWrite. If
optimizeForWrite returns true, the function is invoked on all the members
containing primary buckets for that region. If optimizeForWrite returns
false, the function is invoked on as few members as possible that together
encompass all the buckets (so it mixes primary and secondary buckets). For
example, if you have 2 members with one redundant copy, and the primaries
are split between them, then optimizeForWrite returning true means that the
function will be invoked on both members. Returning false will cause the
function to be invoked on only one member, since each member has all the
buckets. I almost always have optimizeForWrite return true.
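
The two-member example can be modeled in plain Java. This is a simplified sketch, not Geode's actual selection logic: the class TargetSelection and both method names are hypothetical, and the "as few members as possible" case is approximated with a greedy cover.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified model of which members a function would target; not Geode code.
final class TargetSelection {
    // optimizeForWrite == true: every member that holds at least one primary bucket.
    // primaries maps bucket id -> member holding the primary copy.
    static Set<String> forWrite(Map<Integer, String> primaries) {
        return new TreeSet<>(primaries.values());
    }

    // optimizeForWrite == false: a small set of members that together host all buckets.
    // hosts maps member -> bucket ids it hosts (primary or secondary).
    // Greedy approximation: repeatedly pick the member covering the most uncovered buckets.
    static Set<String> forRead(Map<String, Set<Integer>> hosts) {
        Set<Integer> uncovered = new HashSet<>();
        for (Set<Integer> buckets : hosts.values()) {
            uncovered.addAll(buckets);
        }
        Set<String> chosen = new LinkedHashSet<>();
        while (!uncovered.isEmpty()) {
            String best = null;
            int bestGain = 0;
            for (Map.Entry<String, Set<Integer>> e : hosts.entrySet()) {
                Set<Integer> gain = new HashSet<>(e.getValue());
                gain.retainAll(uncovered);
                if (gain.size() > bestGain) {
                    best = e.getKey();
                    bestGain = gain.size();
                }
            }
            chosen.add(best);
            uncovered.removeAll(hosts.get(best));
        }
        return chosen;
    }
}
```

With two members that each host every bucket and primaries split between them, forWrite returns both members while forRead returns only one, which matches the behavior described above.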

The onServer/onServers API is used for data-unaware calls (meaning no
specific region involved). In the past, I've used it mainly for admin-type
behavior like:

- start/stop gateway senders
- create regions
- rebalance
- assign buckets

Now, gfsh does a lot of this behavior (maybe all of it), so I don't
necessarily need functions to do it anymore.

One of my favorite onServer use cases is the command pattern using a
Request/Response API like:

- define a Request (like RebalanceCache)
- pass it as an argument to a CommandFunction from the client to a server
using onServer
- execute it on the server
- return a Response
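
A minimal sketch of that Request/Response shape, in plain Java so it stands alone. All of the type names here (Request, Response, SimpleResponse, CommandDispatcher) are hypothetical, and RebalanceCache is a stub that only reports; none of them come from Geode.

```java
import java.io.Serializable;

// Hypothetical types for illustration; none of these come from Geode.
interface Response extends Serializable {
    String message();
}

interface Request extends Serializable {
    // Runs on the server and produces the Response to send back.
    Response execute();
}

final class SimpleResponse implements Response {
    private static final long serialVersionUID = 1L;
    private final String message;

    SimpleResponse(String message) {
        this.message = message;
    }

    @Override
    public String message() {
        return message;
    }
}

final class RebalanceCache implements Request {
    private static final long serialVersionUID = 1L;

    @Override
    public Response execute() {
        // A real implementation would start a rebalance here; this stub only reports.
        return new SimpleResponse("rebalance requested");
    }
}

// Stand-in for the body of a CommandFunction: take the argument, execute it, return the Response.
final class CommandDispatcher {
    static Response handle(Object argument) {
        if (!(argument instanceof Request)) {
            throw new IllegalArgumentException("expected a Request");
        }
        return ((Request) argument).execute();
    }
}
```

In an actual function, the argument would presumably come from context.getArguments() and the Response would go back through context.getResultSender().lastResult(...). The point of the pattern is that one generic function can serve many commands.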

One use case for invoking a function from another function is member
notification. This can be done with a CacheListener on a replicated region
too, but the basic idea is:

- invoke a function
- in the function, invoke another function on all the members notifying
them something is about to happen
- do the thing
- invoke another function on all the members notifying them something has
happened
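
The ordering of those steps can be sketched in plain Java. This is a model only: Member stands in for "a function invoked on that member", and the names Member, NotifyingTask and the event strings are made up for illustration.

```java
import java.util.List;

// Plain-Java model of the notification sequence; not Geode code.
interface Member {
    // Stands in for executing a notification function on one member.
    void onEvent(String event);
}

final class NotifyingTask {
    // Notifies every member, performs the action, then notifies every member again.
    static void run(List<Member> members, Runnable action) {
        for (Member m : members) {
            m.onEvent("about-to-happen");
        }
        action.run();
        for (Member m : members) {
            m.onEvent("happened");
        }
    }
}
```

The outer function waits on each inner notification before moving on, which is where a deadlock could creep in if the inner function in turn waits on the outer one.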

You need to be careful when invoking one function from another. Depending
on what you're doing in the second function, you could get yourself into a
distributed deadlock situation.

I'm not sure this answers all the issues you were seeing, but hopefully it
helps.

Thanks,
Barry Oglesby


Re: Best Practices for Calling Server Side Functions

Posted by Michael Stolz <ms...@pivotal.io>.

Well, onRegion calls won't be limited to a single member, or to a subset of
the members hosting the region, unless you supply a withFilter(filter)
clause. With a filter, they run only on the members hosting the data that
the filter identifies.

I don't think there is a best-practices document on this subject.

I'd be happy to discuss your usage pattern any time next week. I see you're
New York based, as am I.




--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: 631-835-4771


Re: Best Practices for Calling Server Side Functions

Posted by Dan Smith <ds...@pivotal.io>.

FunctionService.onRegion is preferable to onPool or onServer if you
actually want to work with data in a region. The key is that within the
function, you want to make sure you actually use the data set for that
function context like this:

Region localData = PartitionRegionHelper.getLocalDataForContext(context);

Do not use PartitionRegionHelper.getLocalData(Region). The function service
divides up the buckets between different members and that information is
contained in the context. If you don't use the context like above, you may
end up covering the same buckets multiple times.

Another good option to consider is Spring Data GemFire's function support.
SDG lets you annotate plain Java methods and use them as functions. That
would help if you have a method that you want to execute both as a Geode
function and directly from some other Java code, because you don't need to
worry about constructing a function context.
http://docs.spring.io/spring-data-gemfire/docs/current/reference/html/#function-annotations

-Dan
