Posted to dev@slider.apache.org by Jean-Baptiste Note <jb...@gmail.com> on 2015/06/01 14:47:07 UTC

registry / export question

Hi there,

I've successfully exported some dynamic host/port combinations in Slider for
Kafka on YARN; they are made available under
publisher/exports/servers on the AppMaster (see
https://github.com/jbnote/koya/).

I'm now trying to access this information (really, service location) in two
different ways:

* From within Slider. Is there a public API that I could use directly in
Python from other Slider instances to get at this information? This is
necessary for spawning Kafka mirroring from Slider, for instance. From what
I can see in storm-slider, the slider binary is invoked directly.

* From the rest of the world. I was thinking of exporting the data to DNS,
and hoped to do this with a ZooKeeper-monitoring daemon, which is already
partially implemented. However, none of my exported data seems to be
present in ZK, which I was naively hoping for. Is there something I'm
missing? I find the ZK way ideal, compared with the REST API, which as far
as I can see will require polling. In Python, monitoring ZK is a breeze.

Can someone familiar with the design intent shed some light on how I should
carry this out?

Kind regards,
JB

Re: registry / export question

Posted by "hsy541@gmail.com" <hs...@gmail.com>.
Same as the koya project
The result of http://node1:1025/ws/v1/slider/publisher/exports
is
{
   "exports": {
      "servers": {
         "description": "servers",
         "updated": 1433177762880,
         "updatedTime": "Mon Jun 01 09:56:02 PDT 2015",
         "entries": { },
         "empty": true
      },
      "container_log_dirs": {
         "description": "container_log_dirs",
         "updated": 1433177762881,
         "updatedTime": "Mon Jun 01 09:56:02 PDT 2015",
         "entries": { },
         "empty": true
      },
      "container_work_dirs": {
         "description": "container_work_dirs",
         "updated": 1433177762881,
         "updatedTime": "Mon Jun 01 09:56:02 PDT 2015",
         "entries": { },
         "empty": true
      }
   }
}

As you can see, all the entries are empty, and you have to append the group
name manually to get the values.
For example, you have to call
http://node1:1025/ws/v1/slider/publisher/exports/servers to get
{
   "description": "servers",
   "updated": 1433177762880,
   "updatedTime": "Mon Jun 01 09:56:02 PDT 2015",
   "entries": {
      "org.apache.kafka.broker": [
         {
            "value": "node23:56579",
            "containerId": "container_1432861905779_0057_01_000004",
            "tag": "2",
            "level": "component",
            "updatedTime": "Mon Jun 01 09:56:02 PDT 2015"
         },
         {
            "value": "node20:44118",
            "containerId": "container_1432861905779_0057_01_000002",
            "tag": "1",
            "level": "component",
            "updatedTime": "Mon Jun 01 09:56:02 PDT 2015"
         },
         {
            "value": "node24:34387",
            "containerId": "container_1432861905779_0057_01_000003",
            "tag": "3",
            "level": "component",
            "updatedTime": "Mon Jun 01 09:56:02 PDT 2015"
         }
      ]
   },
   "empty": false
}

It's not a bug, but it would be nice to have everything in
ws/v1/slider/publisher/exports
so that users can view the values directly from the AM UI.
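
Until the listing embeds the entries, a client can follow each group itself.
The following is only a minimal Python sketch of that two-step lookup, based
on the two responses shown above. The function names are made up for
illustration, and the base URL is whatever .../ws/v1/slider/publisher/exports
address your AM reports.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def fetch_all_exports(base_url, fetch=fetch_json):
    """Return {group_name: entries} by requesting each export group in turn.

    base_url is the .../ws/v1/slider/publisher/exports URL of the AM.
    `fetch` is injectable so the logic can be exercised without a live AM.
    """
    listing = fetch(base_url)
    result = {}
    for group in listing.get("exports", {}):
        # The listing leaves "entries" empty, so ask for the group itself.
        detail = fetch(base_url.rstrip("/") + "/" + group)
        result[group] = detail.get("entries", {})
    return result


def endpoints(entries, name):
    """Extract the "value" strings (host:port) published under one entry name."""
    return [item["value"] for item in entries.get(name, [])]
```

For the response above, endpoints(entries, "org.apache.kafka.broker") would
give the three broker host:port strings.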

On Tue, Jun 2, 2015 at 9:40 AM, Gour Saha <gs...@hortonworks.com> wrote:

> It is REST style uri, so if you append the uri path with the export group
> name you will get the info you are looking for.
>
> If that does not answer your question, can you give an example response
> that you are expecting to see?
>
> -Gour
>
> > On Jun 2, 2015, at 8:33 AM, "hsy541@gmail.com" <hs...@gmail.com> wrote:
> >
> > I've noticed the http://hostname/ws/v1/slider/publisher/exports/
> > <http://c6401.ambari.apache.org:1025/ws/v1/slider/publisher/exports/> only
> > gives you the list of export values, but within each one the entries block
> > are empty. Is it ok have them all embedded in one response so that you can
> > get all information directly from slider AM UI.
> >
> > On Mon, Jun 1, 2015 at 12:35 PM, Jean-Baptiste Note <jb...@gmail.com>
> > wrote:
> >
> >> Thanks Gour,
> >>
> >> Indeed it does help; because I can see a way to combine these to avoid
> >> polling.
> >> By monitoring the ZK registry and doing CURL whenever there's a child
> >> change in the registry it looks I can reliably track changes in the export
> >> group, so this is perfect.
> >> I'll let you know how implementation goes :)
> >>
> >> Kind regards,
> >> JB
> >>
>

Re: registry / export question

Posted by Gour Saha <gs...@hortonworks.com>.
It is a REST-style URI, so if you append the export group name to the URI path you will get the info you are looking for.

If that does not answer your question, can you give an example response that you are expecting to see?

-Gour

> On Jun 2, 2015, at 8:33 AM, "hsy541@gmail.com" <hs...@gmail.com> wrote:
> 
> I've noticed the http://hostname/ws/v1/slider/publisher/exports/
> <http://c6401.ambari.apache.org:1025/ws/v1/slider/publisher/exports/>  only
> gives you the list of export values, but within each one the entries block
> are empty. Is it ok have them all embedded in one response so that you can
> get all information directly from slider AM UI.
> 
> On Mon, Jun 1, 2015 at 12:35 PM, Jean-Baptiste Note <jb...@gmail.com>
> wrote:
> 
>> Thanks Gour,
>> 
>> Indeed it does help; because I can see a way to combine these to avoid
>> polling.
>> By monitoring the ZK registry and doing CURL whenever there's a child
>> change in the registry it looks I can reliably track changes in the export
>> group, so this is perfect.
>> I'll let you know how implementation goes :)
>> 
>> Kind regards,
>> JB
>> 

Re: registry / export question

Posted by "hsy541@gmail.com" <hs...@gmail.com>.
I've noticed that http://hostname/ws/v1/slider/publisher/exports/
<http://c6401.ambari.apache.org:1025/ws/v1/slider/publisher/exports/> only
gives you the list of export values, and within each one the entries block
is empty. Would it be OK to have them all embedded in one response, so that
you can get all the information directly from the Slider AM UI?

On Mon, Jun 1, 2015 at 12:35 PM, Jean-Baptiste Note <jb...@gmail.com>
wrote:

> Thanks Gour,
>
> Indeed it does help; because I can see a way to combine these to avoid
> polling.
> By monitoring the ZK registry and doing CURL whenever there's a child
> change in the registry it looks I can reliably track changes in the export
> group, so this is perfect.
> I'll let you know how implementation goes :)
>
> Kind regards,
> JB
>

Re: registry / export question

Posted by Jean-Baptiste Note <jb...@gmail.com>.
Thanks Gour,

Indeed it does help, because I can see a way to combine these to avoid
polling.
By monitoring the ZK registry and running curl whenever there's a child
change in the registry, it looks like I can reliably track changes in the
export group, so this is perfect.
I'll let you know how the implementation goes :)
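
As a rough illustration of the watch-then-fetch idea, here is a minimal
Python sketch. It assumes the third-party kazoo library for the ZooKeeper
side; the registry path to watch, the function names, and the on_update
callback are all hypothetical and would need adapting to a real cluster.

```python
import json
from urllib.request import urlopen


def fetch_entries(exports_url, group):
    """GET <exports_url>/<group> and return its "entries" mapping."""
    with urlopen(exports_url.rstrip("/") + "/" + group) as resp:
        return json.load(resp).get("entries", {})


def make_handler(exports_url, group, on_update, fetch=fetch_entries):
    """Build a children-watch callback that reports only real changes."""
    state = {"last": None}

    def handler(children):
        # A child change is only a hint; the data itself comes from REST.
        entries = fetch(exports_url, group)
        if entries != state["last"]:
            state["last"] = entries
            on_update(entries)
        return True  # kazoo stops a ChildrenWatch if the callback returns False

    return handler


def watch(zk_hosts, registry_path, exports_url, group, on_update):
    """Start a kazoo ChildrenWatch on registry_path (needs kazoo installed)."""
    from kazoo.client import KazooClient  # third-party, imported lazily

    zk = KazooClient(hosts=zk_hosts)
    zk.start()
    zk.ChildrenWatch(registry_path, make_handler(exports_url, group, on_update))
    return zk
```

Comparing against the last fetched entries keeps the DNS updater from being
triggered by registry changes that do not affect the export group.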

Kind regards,
JB

Re: registry / export question

Posted by Gour Saha <gs...@hortonworks.com>.
You can find the info you are looking for by combining ZK and REST.

Use a ZK client and run:

get /registry/users/<user_id>/services/org-apache-slider/<app_id>

(with appropriate <user_id> and <app_id> of the koya cluster)

From the JSON dump, look for the element with api = "classpath:org.apache.slider.publisher.exports" under the "external" element. Get the value of "addresses" -> "uri", e.g. http://c6401.ambari.apache.org:1025/ws/v1/slider/publisher/exports

Then you can run:
curl "http://c6401.ambari.apache.org:1025/ws/v1/slider/publisher/exports/<export_group_name>"

e.g.
curl "http://c6401.ambari.apache.org:1025/ws/v1/slider/publisher/exports/servers"
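
The lookup in the JSON dump can also be scripted. This is only a sketch in
Python: the function name is made up, and the record shape (an "external"
list whose elements carry "api" and an "addresses" list of {"uri": ...}
maps) is assumed from the description above, so check it against the record
your cluster actually returns.

```python
import json


def exports_uri(record_bytes, api_suffix="org.apache.slider.publisher.exports"):
    """Return the first URI of the publisher-exports endpoint, or None.

    record_bytes is the raw data read from the service record znode.
    """
    record = json.loads(record_bytes.decode("utf-8"))
    for endpoint in record.get("external", []):
        # Match on the suffix so the exact "classpath:" prefix does not matter.
        if endpoint.get("api", "").endswith(api_suffix):
            for address in endpoint.get("addresses", []):
                if isinstance(address, dict) and "uri" in address:
                    return address["uri"]
    return None
```

The returned URI is the base to which the export group name is appended.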

Does this help?

Check https://issues.apache.org/jira/browse/SLIDER-151 and https://issues.apache.org/jira/browse/YARN-913 for a few things to look out for in the future.

-Gour

Re: registry / export question

Posted by Steve Loughran <st...@hortonworks.com>.
On 1 Jun 2015, at 13:47, Jean-Baptiste Note <jb...@gmail.com> wrote:

> Hi there,
>
> I've successfully exported some dynamic host/port combinations in Slider for
> Kafka on YARN; they are made available under
> publisher/exports/servers on the AppMaster (see
> https://github.com/jbnote/koya/).
>
> I'm now trying to access this information (really, service location) in two
> different ways:
>
> * From within Slider. Is there a public API that I could use directly in
> Python from other Slider instances to get at this information? This is
> necessary for spawning Kafka mirroring from Slider, for instance. From what
> I can see in storm-slider, the slider binary is invoked directly.


The code to look up entries is in the hadoop-yarn-registry API, shipping in Hadoop 2.6.


> * From the rest of the world. I was thinking of exporting the data to DNS,
> and hoped to do this with a ZooKeeper-monitoring daemon, which is already
> partially implemented. However, none of my exported data seems to be
> present in ZK, which I was naively hoping for. Is there something I'm
> missing? I find the ZK way ideal, compared with the REST API, which as far
> as I can see will require polling. In Python, monitoring ZK is a breeze.
>
> Can someone familiar with the design intent shed some light on how I should
> carry this out?

YARN-913 is the registry design;
it's documented at http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/registry/index.html

  1.  Everything is (publicly) published to ZK.
  2.  There's an API ( http://hadoop.apache.org/docs/current/api/index.html ) in Java.
  3.  Slider has a .py client too.

Slider deliberately doesn't publish the full set of documents to the registry; too much data and too high a rate of change are what hit ZK scalability and performance.

Instead we have a Slider-specific API for publishing sets of configurations, each configuration being served up as JSON, XML, or properties.

Look at org.apache.slider.server.appmaster.web.rest.publisher.PublisherResource for the specifics, but it essentially comes down to:


GET configuration sets (JSON)
ws/v1/exports/

GET configuration files of a configuration set
ws/v1/exports/${configset}

GET a configuration
ws/v1/exports/${configset}/${configuration}.${suffix}

suffix = [xml|json|properties]

GET a specific property
ws/v1/exports/${configset}/${configuration}/${property}
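
The URL patterns above can be assembled with a small helper. This is only a
Python sketch: the function names and the AM base URL are hypothetical, and
the ws/v1/exports prefix is copied as written above, so it may need
adjusting (other messages in this thread show
ws/v1/slider/publisher/exports on a live AM).

```python
from urllib.parse import quote

PREFIX = "ws/v1/exports"  # as written above; adjust to match your AM
SUFFIXES = ("xml", "json", "properties")


def _join(am_base, *parts):
    """Join the AM base URL, the prefix, and URL-quoted path parts."""
    path = "/".join(quote(p, safe="") for p in parts)
    return am_base.rstrip("/") + "/" + PREFIX + "/" + path


def config_sets_url(am_base):
    """URL listing the configuration sets."""
    return _join(am_base)


def config_set_url(am_base, configset):
    """URL listing the configurations of one configuration set."""
    return _join(am_base, configset)


def config_url(am_base, configset, configuration, suffix="json"):
    """URL of one configuration in the requested format."""
    if suffix not in SUFFIXES:
        raise ValueError("suffix must be one of %s" % (SUFFIXES,))
    return _join(am_base, configset, configuration) + "." + suffix


def property_url(am_base, configset, configuration, prop):
    """URL of a single property of one configuration."""
    return _join(am_base, configset, configuration, prop)
```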


Regarding Python monitoring, our code is in the slider-agent module. Bear in mind that ZK listening isn't that resilient to failures of ZK nodes. Our agent only checks ZK at startup, and then starts polling after the AM fails.

The Hive LLAP team are using the YARN registry now and want to add a TTL field to each entry; this would let the client know when to recheck.