Posted to dev@knox.apache.org by Kristopher Kane <kr...@gmail.com> on 2015/05/28 15:34:56 UTC

New service definitions with non-standard topology role hostnames.

All,

I'm implementing a Solr Cloud proxy in Knox and need help understanding how
to implement topology roles that use a late-binding 'discovery' service to
derive the destination host.

The implementation uses gateway-service-definitions and a custom dispatch
in a new module called gateway-service-solrcloud.

Here are the two rewrite rules as a reference - which are just copies of
WebHCat's:

<rules>

  <rule dir="IN" name="SOLRCLOUD/solrcloud/root/inbound"
        pattern="*://*:*/**/solr/?{**}">
    <rewrite template="{$serviceUrl[SOLRCLOUD]}/?{**}"/>
  </rule>

  <rule dir="IN" name="SOLRCLOUD/solrcloud/path/inbound"
        pattern="*://*:*/**/solr/{path=**}?{**}">
    <rewrite template="{$serviceUrl[SOLRCLOUD]}/{path=**}?{**}"/>
  </rule>

</rules>

Right now my topology file includes a single Solr server hostname for the
SOLRCLOUD role that acts as a dummy placeholder.  The custom dispatch
queries Zookeeper for Solr hosts, rewrites the outboundRequest and sends it
off to an active Solr host.  The ZK host is hard-coded in the dispatch at
this point, and it is all working fine.

The next step is to put a comma-separated list of zk_hostname:port entries
in the topology file for the SOLRCLOUD role, in place of any known Solr
hosts.
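
For illustration, here is a minimal Java sketch of how a dispatch might
parse such a comma-separated zk_hostname:port value once it has been read
from the topology.  The class name ZkHostList and its methods are
hypothetical and are not part of Knox; how the value actually reaches the
dispatch is exactly the open question of this thread and is not shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Knox: parses "host:port,host:port". */
final class ZkHostList {
    private ZkHostList() {}

    /** Splits on commas, trims whitespace, and validates each host:port entry. */
    static List<String> parse(String value) {
        List<String> hosts = new ArrayList<>();
        if (value == null) {
            return hosts;
        }
        for (String entry : value.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            // Simplification: IPv6 literals are not handled here.
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got: " + trimmed);
            }
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            if (port < 1 || port > 65535) {
                throw new IllegalArgumentException("Port out of range in: " + trimmed);
            }
            hosts.add(trimmed);
        }
        return hosts;
    }

    /** Joins the entries back into a single comma-separated string. */
    static String toConnectString(List<String> hosts) {
        return String.join(",", hosts);
    }

    public static void main(String[] args) {
        // Example host names are made up.
        List<String> hosts = parse("zk1.example.com:2181, zk2.example.com:2181");
        System.out.println(toConnectString(hosts));
    }
}
```

Validating the list up front means a typo in the topology surfaces as a
clear error rather than as a connection failure on the first request.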

I'm not sure what my rewrite rules should look like for this.  I assume
that I will need a new provider that understands that it isn't simply
rewriting a URI, but rather triggering the query to ZK. Perhaps I can
inject that process first and then tell Knox to rewrite with the URI I
get back from ZK. From the WebHDFS HA dispatch code, I learned that you can
trigger the rewrite rules to run again with something like:

// null out the target URL so that the rewriters run again
inboundRequest.setAttribute(
    AbstractGatewayFilter.TARGET_REQUEST_URL_ATTRIBUTE_NAME, null);

It looks like I have access to everything I need in
DefaultDispatch.executeRequest(...) to make the complete query myself:

//Knox path info to omit
2015-05-28 01:52:52,180 DEBUG hadoop.gateway
(SolrCloudDispatch.java:executeRequest(51)) - inboundRequest Attribute
getContextPath() is /gateway/sandbox
//starts the important service info here
2015-05-28 01:52:52,180 DEBUG hadoop.gateway
(SolrCloudDispatch.java:executeRequest(52)) - inboundRequest Attribute
getPathInfo() is /solr/gettingstarted/select

If a query exists then append the query: '?'+getQueryString()
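
A minimal Java sketch of that composition, assuming the Solr base URL has
already been obtained from ZK.  The class name SolrTargetUrl and the
example host are hypothetical; the pathInfo and queryString parameters
stand for the values of getPathInfo() and getQueryString() on the inbound
request (getQueryString() is null when there is no query).

```java
/** Hypothetical helper, not part of Knox: builds the outbound Solr URL. */
final class SolrTargetUrl {
    private SolrTargetUrl() {}

    /**
     * Appends the service path and the optional query string to the base URL.
     * The gateway context path (e.g. /gateway/sandbox) is deliberately left out.
     */
    static String build(String solrBaseUrl, String pathInfo, String queryString) {
        StringBuilder url = new StringBuilder(solrBaseUrl);
        // Drop trailing slashes so the result does not contain "//".
        while (url.length() > 0 && url.charAt(url.length() - 1) == '/') {
            url.setLength(url.length() - 1);
        }
        if (pathInfo != null) {
            url.append(pathInfo); // e.g. /solr/gettingstarted/select
        }
        if (queryString != null && !queryString.isEmpty()) {
            url.append('?').append(queryString);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // The host name and query are made up for the example.
        System.out.println(build("http://solr1.example.com:8983",
                "/solr/gettingstarted/select", "q=title:knox"));
    }
}
```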


So...my questions are: How do I put my Zookeeper server list in the
topology file and tell the whole process to forget about rewriting and let
me do it in the dispatch?  Or is that the wrong approach?


Thanks,

Kris

Re: New service definitions with non-standard topology role hostnames.

Posted by Kristopher Kane <kr...@gmail.com>.
Thanks Sumit.  I will take the HAProvider as a template and create a new
one.

Kris

On Tue, Jun 16, 2015 at 2:49 PM, Sumit Gupta <su...@hortonworks.com>
wrote:

> I believe it is these two lines:
>
> inboundRequest.setAttribute(
>     AbstractGatewayFilter.TARGET_REQUEST_URL_ATTRIBUTE_NAME, null);
> URI uri = getDispatchUrl(inboundRequest);
>
>
> But it has been a while and I remember spending some time in the debugger
> to figure out how to connect the dots.
>
> Sumit
>

Re: New service definitions with non-standard topology role hostnames.

Posted by Sumit Gupta <su...@hortonworks.com>.
I believe it is these two lines:

inboundRequest.setAttribute(
    AbstractGatewayFilter.TARGET_REQUEST_URL_ATTRIBUTE_NAME, null);
URI uri = getDispatchUrl(inboundRequest);


But it has been a while and I remember spending some time in the debugger
to figure out how to connect the dots.

Sumit

On 6/16/15, 12:51 PM, "Kristopher Kane" <kk...@hortonworks.com> wrote:

>Sumit,
>
>Haven't determined if the HA provider will do what I need yet but it
>isn't looking that way.
>
>I am unable to figure out what in failoverRequest links back to the
>service registry's lookupServiceUrl().  Could you point it out to me?
>
>Thanks,
>
>Kris


Re: New service definitions with non-standard topology role hostnames.

Posted by Kristopher Kane <kk...@hortonworks.com>.
Sumit,

Haven't determined if the HA provider will do what I need yet but it isn't looking that way.

I am unable to figure out what in failoverRequest links back to the service registry's lookupServiceUrl().  Could you point it out to me?

Thanks,

Kris

From: Sumit Gupta <su...@hortonworks.com>
Reply-To: "dev@knox.apache.org" <de...@knox.apache.org>
Date: Tuesday, June 16, 2015 at 10:43 AM
To: "dev@knox.apache.org" <de...@knox.apache.org>
Subject: Re: New service definitions with non-standard topology role hostnames.

Hey Kris,

The failoverRequest method in the WebHdfsHaDispatch class has some code
that causes the method lookupServiceUrl() to be called again.

The HA Provider mainly manages the services that have HA capabilities
using the HA configuration and the service registry's URL information.
I have some thoughts around enhancing the service registry to take on
more of the multiple-URL management capabilities, but for now hopefully
the HA provider can do what you need. Are you looking to add a special
dispatch and use the HA provider, or to add a provider for Solr as well?

Sumit.


Re: New service definitions with non-standard topology role hostnames.

Posted by Sumit Gupta <su...@hortonworks.com>.
Hey Kris,

The failoverRequest method in the WebHdfsHaDispatch class has some code
that causes the method lookupServiceUrl() to be called again.

The HA Provider mainly manages the services that have HA capabilities
using the HA configuration and the service registry's URL information.
I have some thoughts around enhancing the service registry to take on
more of the multiple-URL management capabilities, but for now hopefully
the HA provider can do what you need. Are you looking to add a special
dispatch and use the HA provider, or to add a provider for Solr as well?

Sumit.

On 6/16/15, 11:04 AM, "Kristopher Kane" <kr...@gmail.com> wrote:

>Kevin,
>
>ServiceRegistryFunctionProcessorBase.lookupServiceUrl() and subclasses:
>just to verify, these are added at Knox startup to register the endpoint
>URL with the role name.  Is lookupServiceUrl() only called at startup, and
>is the HA Provider there simply to provide a starting point URL for HA
>services?
>
>I was thinking of adding the Solr provider there but want to ensure that
>these checks are not happening with each call.  I don't believe they are,
>as the WebHDFS HA provider is doing work elsewhere.
>
>Kris


Re: New service definitions with non-standard topology role hostnames.

Posted by Kristopher Kane <kr...@gmail.com>.
Kevin,

ServiceRegistryFunctionProcessorBase.lookupServiceUrl() and subclasses:
just to verify, these are added at Knox startup to register the endpoint
URL with the role name.  Is lookupServiceUrl() only called at startup, and
is the HA Provider there simply to provide a starting point URL for HA
services?

I was thinking of adding the Solr provider there but want to ensure that
these checks are not happening with each call.  I don't believe they are,
as the WebHDFS HA provider is doing work elsewhere.

Kris

On Thu, May 28, 2015 at 10:21 PM, Kristopher Kane <kristopher.kane@gmail.com>
wrote:

> Thanks for the detailed response Kevin. I will give it a shot tonight.
>
>> Now... The outstanding question for me is the list of Zookeeper URLs.  We
>> may want to treat them like we do the NAMENODE and JOBTRACKER services
>> today which are in the topology (and therefore in the service registry)
>> but not really exposed.
>
> That was my intention.
>
> Thanks,
> Kris

Re: New service definitions with non-standard topology role hostnames.

Posted by Kristopher Kane <kr...@gmail.com>.
Thanks for the detailed response Kevin. I will give it a shot tonight.

> Now... The outstanding question for me is the list of Zookeeper URLs.  We
> may want to treat them like we do the NAMENODE and JOBTRACKER services
> today which are in the topology (and therefore in the service registry)
> but not really exposed.

That was my intention.

Thanks,
Kris

Re: New service definitions with non-standard topology role hostnames.

Posted by Kevin Minder <ke...@hortonworks.com>.
Hey Kris,

Well I can try and relay the "vision".  The ideal "vision" would have been
to have the dispatch be able to communicate with the framework so that
$serviceUrl[SOLRCLOUD] in your rewrite rules would do the right thing.  We
certainly aren't there yet.  So the next best thing from my perspective
would be to implement a custom solrcloud rewrite function.  Now some
pointers that will hopefully show where we are and lay out a pattern.  Keep
in mind that these would all be implemented in your
gateway-service-solrcloud module.

gateway-provider-rewrite-func-service-registry/src/main/java/org/apache/hadoop/gateway/svcregfunc/api/ServiceUrlFunctionDescriptor.java
Here you will see how to "declare" a rewrite function.

gateway-provider-rewrite-func-service-registry/src/main/java/org/apache/hadoop/gateway/svcregfunc/impl/ServiceUrlFunctionProcessor.java
Here you will see the implementation of the current "serviceUrl" rewrite
function.  The most important part is the call to lookupServiceUrl.  If
you dig through enough you will see how this is hooked up to the current
HA stuff.  In particular note how WebHdfsHaDispatch uses HaProvider to
ultimately interact with ServiceRegistryFunctionProcessorBase.  You may
actually be able to use HaProvider.


gateway-provider-rewrite-func-service-registry/src/main/java/org/apache/hadoop/gateway/svcregfunc/impl/ServiceRegistryFunctionProcessorBase.java
This is where lookupServiceUrl is implemented.  Note the environment that
is provided to the initialize method.  The environment.getAttribute call
basically boils down to a ServletContext.getAttribute call, so you can use
this to share state with your dispatch impl.
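
As an illustration of sharing state through a context attribute, here is
a minimal Java sketch.  Everything in it is an assumption made for the
example: the class LiveSolrHosts, the attribute name, and the host URLs
are hypothetical, and a plain Map stands in for the ServletContext
attributes so the sketch runs without the servlet API or Knox on the
classpath.  The dispatch side would publish the hosts it found in ZK, and
the rewrite function side would read the next host to use.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical shared holder, not part of Knox. */
final class LiveSolrHosts {
    /** Hypothetical attribute name under which the holder is stored. */
    static final String ATTRIBUTE_NAME = "solrcloud.live.hosts";

    private final AtomicReference<List<String>> hosts = new AtomicReference<>(List.of());
    private final AtomicInteger next = new AtomicInteger();

    /** Called by the discovery side whenever the set of live hosts changes. */
    void update(List<String> liveHosts) {
        hosts.set(List.copyOf(liveHosts));
    }

    /** Called by the reading side; hands out hosts in round-robin order. */
    String nextHost() {
        List<String> current = hosts.get();
        if (current.isEmpty()) {
            throw new IllegalStateException("No live Solr hosts known");
        }
        return current.get(Math.floorMod(next.getAndIncrement(), current.size()));
    }

    public static void main(String[] args) {
        // Stand-in for ServletContext attributes (an assumption for this sketch).
        Map<String, Object> attributes = new ConcurrentHashMap<>();
        attributes.put(ATTRIBUTE_NAME, new LiveSolrHosts());

        // Dispatch side: publish what discovery returned (made-up hosts).
        LiveSolrHosts published = (LiveSolrHosts) attributes.get(ATTRIBUTE_NAME);
        published.update(List.of("http://solr1.example.com:8983",
                                 "http://solr2.example.com:8983"));

        // Rewrite function side: read the same object back.
        LiveSolrHosts seen = (LiveSolrHosts) attributes.get(ATTRIBUTE_NAME);
        System.out.println(seen.nextHost());
        System.out.println(seen.nextHost());
    }
}
```

Keeping the host list behind an AtomicReference lets one side swap in a
fresh list while requests on other threads keep reading a consistent
snapshot.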

gateway-provider-rewrite-func-service-registry/src/main/resources/META-INF/services/org.apache.hadoop.gateway.filter.rewrite.api.UrlRewriteFunctionDescriptor
gateway-provider-rewrite-func-service-registry/src/main/resources/META-INF/services/org.apache.hadoop.gateway.filter.rewrite.spi.UrlRewriteFunctionProcessor
Like everything else in Knox, we use service loaders to find things, so
your module will need files like these for the rewrite system to find your
rewrite function.

Now... The outstanding question for me is the list of Zookeeper URLs.  We
may want to treat them like we do the NAMENODE and JOBTRACKER services
today which are in the topology (and therefore in the service registry)
but not really exposed.

Kevin.
