Posted to dev@gobblin.apache.org by Abhishek Tiwari <ab...@apache.org> on 2018/04/02 23:45:53 UTC

Re: GAAS feedback.

Hi Vicky,

I followed up with Sudarshan (cc'd); he will document his multi-hop
design and thoughts on the open source wiki after a bit of cleanup.

Regards
Abhishek

On Mon, Mar 26, 2018 at 11:47 PM, Vicky Kak <vi...@gmail.com> wrote:

> Hi Abhishek,
>
> I did not get a chance to follow up on this, hence the delayed response;
> sorry about that. In our last meeting there was an explanation of
> multi-hop. Would it be possible to describe it here?
>
> Thanks,
> Vicky
>
> On Tue, Jan 23, 2018 at 5:14 AM, Abhishek Tiwari <ab...@apache.org> wrote:
>
>> Hi Vicky,
>>
>> Responses inline. Great suggestions and questions; I agree with each one,
>> so please feel free to create Jiras for each.
>>
>> Also, apologies for the late reply.
>>
>> Regards,
>> Abhishek
>>
>> On Tue, Jan 9, 2018 at 3:24 AM, Vicky Kak <vi...@gmail.com> wrote:
>>
>> > Hi Guys,
>> >
>> > I have finally managed to install the GAAS with Standalone Cluster.
>> >
>> > Here are some of the observations to share
>> >
>> > 1) I have been running GAAS and the Standalone cluster on the same
>> > machine and from the same distribution; this will typically be needed
>> > for a quick setup. Since I start both GAAS and the Standalone master
>> > from the same distribution, they both direct their logs to the same
>> > master.out file, so the logging output of GAAS and the standalone
>> > master overlaps. I changed the log file from master.out to
>> > clustermaster.out in my local setup by changing
>> > $GOBBLIN_HOME/bin/gobblin-cluster-master.sh as follows:
>> >
>> >
>> >    nohup $COMMAND >clustermaster.out 2>&1 & echo $! > $PID
>> >
>> >    We should make this change in the distribution.
>> >
>> I generally run two distributions at different locations to keep the
>> workspace / installation clean for each. But I see the advantage of using
>> one (attaching a debugger with a single IDE instance, etc.), so it would be
>> a good idea to segregate the logging for both. We should create a Jira for
>> this.
>>
>> >
>> >
>> > 2) The log4j configuration is controlled dynamically in the
>> > standalone/worker implementation, and it does not work by default. I
>> > looked at how the log4j configuration is controlled in other modes; it
>> > is done via the bootstrap scripts, e.g. gobblin-aws.sh:
>> >   LOG4J_PATH=file://${FWDIR_CONF}/log4j-aws.properties
>> >   COMMAND="$JAVA_HOME/bin/java -cp $CLASSPATH $JVM_FLAGS gobblin.aws.GobblinAWSClusterLauncher -D log4j.configuration=$LOG4J_PATH"
>> >
>> > I see the log4j configuration being set similarly in
>> > gobblin-standalone.sh too:
>> >   COMMAND+="-Dlog4j.configuration=file://$FWDIR_CONF/log4j-standalone.xml "
>> >
>> > I made similar changes to gobblin-service.sh:
>> >
>> >   LOG4J_PATH=file://${FWDIR_CONF}/log4j-cluster.properties
>> >   COMMAND="$JAVA_HOME/bin/java -Dlog4j.debug -Dlog4j.configuration=$LOG4J_PATH -cp $CLASSPATH $JVM_FLAGS gobblin.service.modules.core.GobblinServiceManager --service_name $SERVICE_NAME $LOG_ARGS"
>> >
>> > This was done because the log4j configuration for GAAS, which should
>> > have been taken from $GOBBLIN_HOME/conf/service/log4j-cluster.properties,
>> > was instead being picked up from
>> > $GOBBLIN_HOME/lib/generator-2.6.0.jar.
>> >
>> > We should keep a consistent model for loading log4j. For the standalone
>> > cluster the log4j configuration is loaded via code, and for the other
>> > Gobblin components (modes) it is loaded via the configuration in the
>> > bootstrap scripts. I think setting it in the bootstrap scripts via
>> > -Dlog4j.configuration is a good option.
>> >
>> > I had to copy log4j-cluster.properties into GOBBLIN_HOME/bin to run the
>> > Standalone cluster master/worker node.
>> > We need to fix these log4j configuration issues.
>> >
>> Thanks, yes this should be made consistent.
>>
>> >
>> > 3) The Gobblin service should have its REST port configurable via a
>> > properties file; currently we get it from the property in the
>> > master.out log file. I have to check how to get it using the d2 client
>> > as per the Rest.li framework.
>> >
>> Yes, this is pending. Internally, we run within a wrapper Jetty container
>> that has a fixed port, so this has slipped in priority and no one has
>> addressed it so far. Good reminder.
>>
>>
>> > 4) We need a SQL-based TopologyStore, i.e. implement a pluggable
>> > MySQL-based TopologyStore.
>> >
>> +1
>>
>> >
>> > 5) Capabilities are hardcoded in the configuration files. It would be
>> > good to have the capabilities configured in the corresponding job pull
>> > file and propagated to GAAS when required.
>> >
>> Yes, that's where we intend to move. We started with static
>> configuration as v0, but should add a zk-based registration or other
>> dynamic ways to announce and discover capabilities. I believe Sudarshan is
>> looking into multi-hop with a somewhat broader vision and might touch upon
>> this too.
>>
>> >
>> > 6) The Standalone master does not start unless this property is
>> > configured:
>> >
>> > gobblin.cluster.jobconf.fullyQualifiedPath
>> >
>> > Here is the exception that I see when it is not configured
>> >
>> > 2018-01-09 13:25:11 IST DEBUG [main] org.apache.hadoop.security.UserGroupInformation - UGI loginUser:vicky (auth:SIMPLE)
>> >
>> > Exception in thread "main" java.lang.NullPointerException: at index 2
>> > at com.google.common.collect.ObjectArrays.checkElementNotNull(ObjectArrays.java:240)
>> > at com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:231)
>> > at com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:226)
>> > at com.google.common.collect.ImmutableList.construct(ImmutableList.java:303)
>> > at com.google.common.collect.ImmutableList.of(ImmutableList.java:107)
>> > at gobblin.cluster.GobblinClusterManager.create(GobblinClusterManager.java:408)
>> > at gobblin.cluster.GobblinClusterManager.buildJobConfigurationManager(GobblinClusterManager.java:400)
>> > at gobblin.cluster.GobblinClusterManager.initializeAppLauncherAndServices(GobblinClusterManager.java:198)
>> > at gobblin.cluster.GobblinClusterManager.<init>(GobblinClusterManager.java:164)
>> > at gobblin.cluster.GobblinClusterManager.main(GobblinClusterManager.java:743)
>> >
>> >
>> > Since the configuration reads the job data from a Kafka queue, the
>> > following configuration should not be required:
>> >
>> > gobblin.cluster.jobconf.fullyQualifiedPath=/home/vicky/development/gobblin/gobblin-dist-0.10.0/cluster-job-config-bpu1
>> >
>> > I am going to look into this again, not sure if I am missing anything.
>> >
>> Seems redundant, and like a bug.
>>
>> >
>> > Thanks,
>> > Vicky
>> >
>> >
>> >
>>
>
>
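
A minimal sketch of the logging points raised in items 1 and 2 of the quoted message: one output file per component, and a log4j flag built from the conf directory. The function names, the component names, the file names other than clustermaster.out, and the default FWDIR_CONF value are assumptions for illustration and are not taken from the Gobblin scripts. The sketch only prints what a launcher might do; it does not start a JVM.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the Gobblin distribution.
# FWDIR_CONF mirrors the variable name used in the quoted scripts; the
# default path below is an assumption.
FWDIR_CONF="${FWDIR_CONF:-/opt/gobblin/conf}"

# Map a component name to its own output file so that GAAS and the
# standalone master no longer share master.out.
log_file_for() {
  case "$1" in
    service)        echo "service.out" ;;
    cluster-master) echo "clustermaster.out" ;;
    cluster-worker) echo "clusterworker.out" ;;
    *)              echo "unknown component: $1" >&2; return 1 ;;
  esac
}

# Build the log4j system property. JVM options such as -D have to come
# before the main class on the java command line to be seen by the JVM.
log4j_flag_for() {
  echo "-Dlog4j.configuration=file://${FWDIR_CONF}/log4j-cluster.properties"
}

# Dry run: print the choices instead of launching anything.
for component in service cluster-master; do
  echo "$component -> $(log_file_for "$component") $(log4j_flag_for)"
done
```

A real launcher would substitute these values into its own nohup line; whether separate file names or separate directories are preferable is left open here.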

Re: GAAS feedback.

Posted by Vicky Kak <vi...@gmail.com>.
Hi Abhishek,

Yes, I plan to write a wiki page about it. Please try it and let me know if
you get it working with the Wikipedia example.

Thanks,
Vicky


On Wed, Apr 25, 2018 at 4:08 PM, Abhishek Tiwari <ab...@apache.org> wrote:

> Hi Vicky,
>
> This is quite awesome work! I will try to get my hands wet with it over
> the weekend or so. Do you want to create a wiki page for this?
> Briefly looked at the commit diff and that looked fine too. I am looking
> forward to the PR.
>
> Sudarshan is working on updating his design doc to put it out for
> multi-hop.
>
> Abhishek
>
>
> On Tue, Apr 24, 2018 at 6:40 AM, Vicky Kak <vi...@gmail.com> wrote:
>
>> Hi Guys,
>>
>> I have created multiple JIRAs based on the discussion we had in this
>> thread:
>> https://issues.apache.org/jira/browse/GOBBLIN-452
>> https://issues.apache.org/jira/browse/GOBBLIN-453
>>
>> The fixes for these JIRAs are in this patch:
>> https://github.com/dallaybatta/incubator-gobblin/commit/89a16cc1f99497525eca2889f25c60cefed06d76
>>
>> I have created a binary distribution with these changes here:
>> https://github.com/dallaybatta/uploads/releases/download/v0.13.0/gobblin-distribution-0.13.0.tar.gz
>>
>> I also encountered the following issue while dockerizing GAAS (
>> https://github.com/dallaybatta/gaas-docker):
>> https://issues.apache.org/jira/browse/GOBBLIN-472
>>
>> The dockerization of GAAS does the following:
>> 1) It creates Docker containers for GAAS + Standalone Cluster (single
>> master + single worker).
>> 2) It uses the ZooKeeper-based state store; I started with the Hadoop one
>> but had issues with it.
>> 3) The Docker container has the wikipedia.template deployed.
>> 4) It has a script that creates the GAAS container group based on a
>> unique id; this unique id (group id) can be associated with a unique user
>> and hence provide multi-tenancy.
>>
>> Here is a sample of 3 GAAS container groups created on the same
>> machine, demonstrating the multi-tenancy:
>> ************************************************************
>> CONTAINER ID   IMAGE                             COMMAND                  CREATED          STATUS          PORTS                              NAMES
>> c0be6bea1451   dallaybatta/gaas-cluster-worker   "/bin/sh -c ./entry.s"   4 seconds ago    Up 3 seconds                                       docker-gobblin-cluster-worker-3
>> c10644e81d5e   dallaybatta/gaas-cluster-master   "/bin/sh -c ./entry.s"   10 seconds ago   Up 9 seconds                                       docker-gobblin-cluster-master-3
>> 34fa6dd683c1   dallaybatta/gaas-service          "/bin/sh -c ./entry.s"   10 seconds ago   Up 9 seconds    0.0.0.0:3->9099/tcp                docker-gobblin-service-3
>> 66298f5b552e   ches/kafka                        "/start.sh"              10 seconds ago   Up 9 seconds    7203/tcp, 0.0.0.0:6503->9092/tcp   docker-dip-kafka-3
>> 4feb9200800a   jplock/zookeeper                  "/opt/zookeeper/bin/z"   11 seconds ago   Up 10 seconds   2181/tcp, 2888/tcp, 3888/tcp       docker-dip-zookeeper-3
>>
>> 65ffcbeda3f8   dallaybatta/gaas-cluster-worker   "/bin/sh -c ./entry.s"   16 seconds ago   Up 16 seconds                                      docker-gobblin-cluster-worker-2
>> 8702911890db   dallaybatta/gaas-cluster-master   "/bin/sh -c ./entry.s"   22 seconds ago   Up 21 seconds                                      docker-gobblin-cluster-master-2
>> 74e2f2040c68   dallaybatta/gaas-service          "/bin/sh -c ./entry.s"   22 seconds ago   Up 21 seconds   0.0.0.0:2->9099/tcp                docker-gobblin-service-2
>> 55ef0b4e7366   ches/kafka                        "/start.sh"              22 seconds ago   Up 22 seconds   7203/tcp, 0.0.0.0:6502->9092/tcp   docker-dip-kafka-2
>> 123ce726fc5a   jplock/zookeeper                  "/opt/zookeeper/bin/z"   23 seconds ago   Up 22 seconds   2181/tcp, 2888/tcp, 3888/tcp       docker-dip-zookeeper-2
>>
>> 1af34cfb5cb0   dallaybatta/gaas-cluster-worker   "/bin/sh -c ./entry.s"   31 seconds ago   Up 30 seconds                                      docker-gobblin-cluster-worker-1
>> bcf010f19338   dallaybatta/gaas-cluster-master   "/bin/sh -c ./entry.s"   36 seconds ago   Up 35 seconds                                      docker-gobblin-cluster-master-1
>> 1e10303e012a   dallaybatta/gaas-service          "/bin/sh -c ./entry.s"   36 seconds ago   Up 36 seconds   0.0.0.0:1->9099/tcp                docker-gobblin-service-1
>> b2a4f6ce6f86   ches/kafka                        "/start.sh"              37 seconds ago   Up 36 seconds   7203/tcp, 0.0.0.0:6501->9092/tcp   docker-dip-kafka-1
>> f6b92760c3a7   jplock/zookeeper                  "/opt/zookeeper/bin/z"   37 seconds ago   Up 36 seconds   2181/tcp, 2888/tcp, 3888/tcp       docker-dip-zookeeper-1
>>
>> ************************************************************
>>
>> Please note that it just uses GAAS's REST endpoint and the Standalone
>> Cluster for scalability.
>>
>> Also, we could deploy the artifacts (templates into the gaas-service
>> container and libraries into the cluster worker/master nodes) using the
>> docker cp command.
>>
>>
>> Thanks,
>> Vicky
>>
>>
>>
>>
>
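
The container listing quoted above suggests a naming pattern per group id (for example docker-gobblin-service-3 and Kafka on host port 6503 for group 3). A minimal sketch of how such names could be derived is given here. It is an illustration under that assumption, not the gaas-docker script; the function names and the 6500 + id port rule are inferred from the three sample groups only.

```shell
#!/bin/sh
# Hypothetical helpers: derive per-group container names and the Kafka host
# port from a numeric group id, following the pattern seen in the listing.

group_names() {
  gid="$1"
  # Reject empty or non-numeric ids so names stay predictable.
  case "$gid" in
    ''|*[!0-9]*) echo "group id must be a non-negative integer" >&2; return 1 ;;
  esac
  for role in dip-zookeeper dip-kafka gobblin-service \
              gobblin-cluster-master gobblin-cluster-worker; do
    echo "docker-${role}-${gid}"
  done
}

kafka_host_port() {
  # 6501, 6502, 6503 appear in the listing; assumed here to be 6500 + id.
  echo $((6500 + $1))
}

# Print the names and Kafka port for group 3.
group_names 3
kafka_host_port 3
```

Tying each group id to one user, as described in the quoted message, would then give every tenant its own set of five containers.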

Re: GAAS feedback.

Posted by Vicky Kak <vi...@gmail.com>.
Hi Abhishek,

Yes I have plans to write the wiki about it. Please try it and let know if
you get it working with the wikipedia example.

Thanks,
Vicky


On Wed, Apr 25, 2018 at 4:08 PM, Abhishek Tiwari <ab...@apache.org> wrote:

> Hi Vicky,
>
> This is quite awesome work! I will try to get my hands wet with it over
> the weekend or so. Do you want to create a wiki page for this?
> Briefly looked at the commit diff and that looked fine too. I am looking
> forward to the PR.
>
> Sudarshan is working on updating his design doc to put it out for
> multi-hop.
>
> Abhishek
>
>
> On Tue, Apr 24, 2018 at 6:40 AM, Vicky Kak <vi...@gmail.com> wrote:
>
>> Hi Guys,
>>
>> I have created multiple JIRA's based on the discussion we had in this
>> thread, these are
>> https://issues.apache.org/jira/browse/GOBBLIN-452
>> https://issues.apache.org/jira/browse/GOBBLIN-453
>>
>> The fix for these JIRA's are done in the patch here
>> https://github.com/dallaybatta/incubator-gobblin/commit/89a1
>> 6cc1f99497525eca2889f25c60cefed06d76
>>
>> I have created the binary distribution after making these changes here
>> https://github.com/dallaybatta/uploads/releases/download/v0.
>> 13.0/gobblin-distribution-0.13.0.tar.gz
>>
>> I also encountered the following issue while docerizing the GAAS (
>> https://github.com/dallaybatta/gaas-docker)
>> https://issues.apache.org/jira/browse/GOBBLIN-472
>>
>> The dockerization of GAAS does
>> 1) Create a docker containers for GAAS+Standard Cluster(single
>> master+single worker)
>> 2) It used the Zookeeper based statestore, I started with the hadoop but
>> had issues with it.
>> 3) The docker contains contains the wikipedia.template deployed.
>> 4) It has got a script that will create the gaas container group based on
>> unique id, this unique id  ( group id) can be associated with a unique user
>> and hence provide multi tenancy.
>>
>> Here is the sample of 3 GAAS container groups being created on the same
>> machine, demonstrating the multi tenancy
>> ************************************************************
>> ************************************************************
>> ************************************************************
>> *******************
>> CONTAINER ID        IMAGE                             COMMAND
>>       CREATED             STATUS              PORTS
>>       NAMES
>> c0be6bea1451        dallaybatta/gaas-cluster-worker   "/bin/sh -c
>> ./entry.s"   4 seconds ago       Up 3 seconds
>>              docker-gobblin-cluster-worker-3
>> c10644e81d5e        dallaybatta/gaas-cluster-master   "/bin/sh -c
>> ./entry.s"   10 seconds ago      Up 9 seconds
>>              docker-gobblin-cluster-master-3
>> 34fa6dd683c1        dallaybatta/gaas-service          "/bin/sh -c
>> ./entry.s"   10 seconds ago      Up 9 seconds        0.0.0.0:3->9099/tcp
>>               docker-gobblin-service-3
>> 66298f5b552e        ches/kafka                        "/start.sh"
>>       10 seconds ago      Up 9 seconds        7203/tcp, 0.0.0.0:6503->9092/tcp
>>  docker-dip-kafka-3
>> 4feb9200800a        jplock/zookeeper
>> "/opt/zookeeper/bin/z"   11 seconds ago      Up 10 seconds       2181/tcp,
>> 2888/tcp, 3888/tcp       docker-dip-zookeeper-3
>>
>> 65ffcbeda3f8        dallaybatta/gaas-cluster-worker   "/bin/sh -c
>> ./entry.s"   16 seconds ago      Up 16 seconds
>>             docker-gobblin-cluster-worker-2
>> 8702911890db        dallaybatta/gaas-cluster-master   "/bin/sh -c
>> ./entry.s"   22 seconds ago      Up 21 seconds
>>             docker-gobblin-cluster-master-2
>> 74e2f2040c68        dallaybatta/gaas-service          "/bin/sh -c
>> ./entry.s"   22 seconds ago      Up 21 seconds       0.0.0.0:2->9099/tcp
>>               docker-gobblin-service-2
>> 55ef0b4e7366        ches/kafka                        "/start.sh"
>>       22 seconds ago      Up 22 seconds       7203/tcp, 0.0.0.0:6502->9092/tcp
>>  docker-dip-kafka-2
>> 123ce726fc5a        jplock/zookeeper
>> "/opt/zookeeper/bin/z"   23 seconds ago      Up 22 seconds       2181/tcp,
>> 2888/tcp, 3888/tcp       docker-dip-zookeeper-2
>>
>> 1af34cfb5cb0        dallaybatta/gaas-cluster-worker   "/bin/sh -c
>> ./entry.s"   31 seconds ago      Up 30 seconds
>>             docker-gobblin-cluster-worker-1
>> bcf010f19338        dallaybatta/gaas-cluster-master   "/bin/sh -c
>> ./entry.s"   36 seconds ago      Up 35 seconds
>>             docker-gobblin-cluster-master-1
>> 1e10303e012a        dallaybatta/gaas-service          "/bin/sh -c
>> ./entry.s"   36 seconds ago      Up 36 seconds       0.0.0.0:1->9099/tcp
>>               docker-gobblin-service-1
>> b2a4f6ce6f86        ches/kafka                        "/start.sh"
>>       37 seconds ago      Up 36 seconds       7203/tcp, 0.0.0.0:6501->9092/tcp
>>  docker-dip-kafka-1
>> f6b92760c3a7        jplock/zookeeper
>> "/opt/zookeeper/bin/z"   37 seconds ago      Up 36 seconds       2181/tcp,
>> 2888/tcp, 3888/tcp       docker-dip-zookeeper-1
>>
>> ************************************************************
>> ************************************************************
>> ************************************************************
>> *******************
>>
>> Please note that it just uses the GAAS's rest endpoint and the Standalone
>> Cluster for scalibility
>>
>> Also we could deploy the artifacts ( tempaltes into the gaas-service
>> container and libraries in the cluster worker/master nodes) using the
>> docker cp command.
>>
>>
>> Thanks,
>> Vicky
>>
>>
>> On Tue, Apr 3, 2018 at 5:15 AM, Abhishek Tiwari <ab...@apache.org> wrote:
>>
>>> Hi Vicky,
>>>
>>> I had a follow-up with Sudarshan (cc'd), he will document his multi-hop
>>> design and thoughts on the open source wiki after a bit of clean up.
>>>
>>> Regards
>>> Abhishek
>>>
>>> On Mon, Mar 26, 2018 at 11:47 PM, Vicky Kak <vi...@gmail.com> wrote:
>>>
>>> > Hi Abhishek,
>>> >
>>> > I did not get a change to followup on this and hence a delayed
>>> response,
>>> > sorry about it. In our last meet there was a explanation about the
>>> Multi
>>> > Hop, would it be possible to describe it here.
>>> >
>>> > Thanks,
>>> > Vicky
>>> >
>>> > On Tue, Jan 23, 2018 at 5:14 AM, Abhishek Tiwari <ab...@apache.org>
>>> wrote:
>>> >
>>> >> Hi Vicky,
>>> >>
>>> >> Response inline but great suggestions and questions. Agree with each
>>> one,
>>> >> please feel free to create Jiras for each.
>>> >>
>>> >> Also, apologies for the late reply.
>>> >>
>>> >> Regards,
>>> >> Abhishek
>>> >>
>>> >> On Tue, Jan 9, 2018 at 3:24 AM, Vicky Kak <vi...@gmail.com>
>>> wrote:
>>> >>
>>> >> > Hi Guys,
>>> >> >
>>> >> > I have finally managed to install the GAAS with Standalone Cluster.
>>> >> >
>>> >> > Here are some of the observations to share
>>> >> >
>>> >> > 1) I have running the GAAS and Standalone cluster on the same
>>> machine
>>> >> and
>>> >> > from the same distribution, this will be typically needed for quick
>>> >> setup.
>>> >> > Since I have been starting the GAAS and Standalone master on same
>>> >> > distribution,
>>> >> > they both are directing the logs to the same master.out file
>>> leading to
>>> >> > overlap of the logging details from the GAAS and standalone master.
>>> I
>>> >> have
>>> >> > changed the logging file from master.out to clustermaster.out on my
>>> >> local
>>> >> > set up by changing the $GOBBLIN_HOME/bin/gobblin-cluster-master.sh
>>> as
>>> >> >
>>> >> >
>>> >> >    nohup $COMMAND >clustermaster.out 2>&1 & echo $! > $PID
>>> >> >
>>> >> >    We better make the changes in the distribution.
>>> >> >
>>> >> > I generally run two distributions at different locations to keep
>>> >> workspace
>>> >> / installation clean for each. But I see the advantage of using one
>>> >> (attaching debugger with single IDE instance, etc), so it would be a
>>> good
>>> >> idea to segregate the logging for both. We should create a Jira for
>>> this.
>>> >>
>>> >> >
>>> >> >
>>> >> > 2) The log4j logging configuration is dynamically controlled in the
>>> >> > standalone/worker implementation, it does not work by default.I
>>> looked
>>> >> at
>>> >> > how the log4j configurations are being controlled in other modes,
>>> it is
>>> >> > done via the bootstrap scripts e.g gobblin-aws.sh as
>>> >> >   LOG4J_PATH=file://${FWDIR_CONF}/log4j-aws.properties
>>> >> >   COMMAND="$JAVA_HOME/bin/java -cp $CLASSPATH $JVM_FLAGS
>>> >> gobblin.aws.GobblinAWSClusterLauncher
>>> >> > -D log4j.configuration=$LOG4J_PATH"
>>> >> >
>>> >> > I see the log4j configurations similarly being configured in
>>> >> > gobblin-standalone.sh too
>>> >> >   COMMAND+="-Dlog4j.configuration=file://$FWDIR_CONF/log4j-
>>> >> standalone.xml
>>> >> > "
>>> >> >
>>> >> > I did made the similar changes for the gobblin-service.sh as
>>> >> >
>>> >> > LOG4J_PATH=file://${FWDIR_CONF}/log4j-cluster.properties
>>> >> >   COMMAND="$JAVA_HOME/bin/java -Dlog4j.debug
>>> >> -Dlog4j.configuration=$LOG4J_PATH
>>> >> > -cp $CLASSPATH $JVM_FLAGS gobblin.service.modules.core.G
>>> >> obblinServiceManager
>>> >> > --service_name $SERVICE_NAME $LOG_ARGS"
>>> >> >
>>> >> > This was done because the log4j configuration for the GAAS which
>>> should
>>> >> > have been taken from $GOBBLIN_HOME/conf/service/log
>>> >> 4j-cluster.properties
>>> >> > was not being taken from there, it was taken from the
>>> >> > $GOBBLIN_HOME/lib/generator-2.6.0.jar.
>>> >> >
>>> >> > We should keep the consistent model of loading the log4j, for the
>>> >> > standalone cluster the log4j configurations are being loaded via
>>> code
>>> >> and
>>> >> > for the other gobblin components(modes) it is via the configuration
>>> in
>>> >> the
>>> >> > bootstrap scripts. We should have it consistent and I think having
>>> it in
>>> >> > the bootstrap scripts via -Dlog4j.configuration is good option.
>>> >> >
>>> >> >  I have to copy the log4j-cluster.properties into the
>>> GOBBLIN_HOME/bin
>>> >> for
>>> >> > running the Standalone cluster master/worker node.
>>> >> > We need to fix these log4j configrations issues.
>>> >> >
>>> >> > Thanks, yes this should be made consistent.
>>> >>
>>> >> >
>>> >> > 3) The Gobblin service should have rest port configurable via
>>> properties
>>> >> > file, currently we get it from the property in the master.out log
>>> file.
>>> >> I
>>> >> > have to check how to get it using the d2 client as per the restli
>>> >> > framework.
>>> >> >
>>> >> Yes, this is pending. Internally, we run within a wrapper jetty
>>> container
>>> >> that has fixed port. So, this has slipped priority for anyone to
>>> address
>>> >> so
>>> >> far. Good reminder.
>>> >>
>>> >>
>>> >> > 4) We need to have SQL based TopologyStore, i.e Implement pluggable
>>> >> MySql
>>> >> > based TopologyStore.
>>> >> >
>>> >> +1
>>> >>
>>> >> >
>>> >> > 5) Capabilities are hardcoded into the configurations files. It
>>> would be
>>> >> > good to have the capabilities configured in the corresponding job
>>> pull
>>> >> file
>>> >> > and it should propagate to the GASS when required.
>>> >> >
>>> >> Yes, thats where we intend to move towards. We started with static
>>> >> configuration as v0, but should add a zk based registration or other
>>> >> dynamic ways to announce and discover capabilities. I believe
>>> Sudarshan is
>>> >> looking into multi-hop with a bit broader vision and might touch upon
>>> this
>>> >> too.
>>> >>
>>> >> >
>>> >> > 6) The Standalone master is not starting without configuring this
>>> >> property
>>> >> >
>>> >> > gobblin.cluster.jobconf.fullyQualifiedPath
>>> >> >
>>> >> > Here is the exception that I see when it is not configured
>>> >> >
>>> >> > 2018-01-09 13:25:11 IST DEBUG [main] org.apache.hadoop.security.UserGroupInformation - UGI loginUser:vicky (auth:SIMPLE)
>>> >> >
>>> >> > Exception in thread "main" java.lang.NullPointerException: at index 2
>>> >> >     at com.google.common.collect.ObjectArrays.checkElementNotNull(ObjectArrays.java:240)
>>> >> >     at com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:231)
>>> >> >     at com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:226)
>>> >> >     at com.google.common.collect.ImmutableList.construct(ImmutableList.java:303)
>>> >> >     at com.google.common.collect.ImmutableList.of(ImmutableList.java:107)
>>> >> >     at gobblin.cluster.GobblinClusterManager.create(GobblinClusterManager.java:408)
>>> >> >     at gobblin.cluster.GobblinClusterManager.buildJobConfigurationManager(GobblinClusterManager.java:400)
>>> >> >     at gobblin.cluster.GobblinClusterManager.initializeAppLauncherAndServices(GobblinClusterManager.java:198)
>>> >> >     at gobblin.cluster.GobblinClusterManager.<init>(GobblinClusterManager.java:164)
>>> >> >     at gobblin.cluster.GobblinClusterManager.main(GobblinClusterManager.java:743)
>>> >> >
>>> >> >
>>> >> > Since the configuration looks for the job data from kafka queue,
>>> this
>>> >> > following configurations need not to be done.
>>> >> >
>>> >> > gobblin.cluster.jobconf.fullyQualifiedPath=/home/vicky/development/gobblin/gobblin-dist-0.10.0/cluster-job-config-bpu1
>>> >> >
>>> >> > I am going to look into this again, not sure if I am missing
>>> anything.
>>> >> >
>>> >> Seems redundant, and like a bug.
>>> >>
>>> >> >
>>> >> > Thanks,
>>> >> > Vicky
>>> >> >
>>> >> >
>>> >> >
>>> >>
>>> >
>>> >
>>>
>>
>>
>

Re: GAAS feedback.

Posted by Abhishek Tiwari <ab...@apache.org>.
Hi Vicky,

This is awesome work! I will try to get hands-on with it over the
weekend or so. Do you want to create a wiki page for this?
I briefly looked at the commit diff, and it looked fine too. I am looking
forward to the PR.

Sudarshan is updating his multi-hop design doc so that it can be
published.

Abhishek

On Tue, Apr 24, 2018 at 6:40 AM, Vicky Kak <vi...@gmail.com> wrote:

> Hi Guys,
>
> I have created multiple JIRA's based on the discussion we had in this
> thread, these are
> https://issues.apache.org/jira/browse/GOBBLIN-452
> https://issues.apache.org/jira/browse/GOBBLIN-453
>
> The fix for these JIRA's are done in the patch here
> https://github.com/dallaybatta/incubator-gobblin/commit/89a16cc1f99497525eca2889f25c60cefed06d76
>
> I have created the binary distribution after making these changes here
> https://github.com/dallaybatta/uploads/releases/download/v0.13.0/gobblin-distribution-0.13.0.tar.gz
>
> I also encountered the following issue while docerizing the GAAS (
> https://github.com/dallaybatta/gaas-docker)
> https://issues.apache.org/jira/browse/GOBBLIN-472
>
> The dockerization of GAAS does
> 1) Create a docker containers for GAAS+Standard Cluster(single
> master+single worker)
> 2) It used the Zookeeper based statestore, I started with the hadoop but
> had issues with it.
> 3) The docker contains contains the wikipedia.template deployed.
> 4) It has got a script that will create the gaas container group based on
> unique id, this unique id  ( group id) can be associated with a unique user
> and hence provide multi tenancy.
>
> Here is the sample of 3 GAAS container groups being created on the same
> machine, demonstrating the multi tenancy
> ************************************************************
> ************************************************************
> ************************************************************
> *******************
> CONTAINER ID        IMAGE                             COMMAND                  CREATED             STATUS              PORTS                              NAMES
> c0be6bea1451        dallaybatta/gaas-cluster-worker   "/bin/sh -c ./entry.s"   4 seconds ago       Up 3 seconds                                           docker-gobblin-cluster-worker-3
> c10644e81d5e        dallaybatta/gaas-cluster-master   "/bin/sh -c ./entry.s"   10 seconds ago      Up 9 seconds                                           docker-gobblin-cluster-master-3
> 34fa6dd683c1        dallaybatta/gaas-service          "/bin/sh -c ./entry.s"   10 seconds ago      Up 9 seconds        0.0.0.0:3->9099/tcp                docker-gobblin-service-3
> 66298f5b552e        ches/kafka                        "/start.sh"              10 seconds ago      Up 9 seconds        7203/tcp, 0.0.0.0:6503->9092/tcp   docker-dip-kafka-3
> 4feb9200800a        jplock/zookeeper                  "/opt/zookeeper/bin/z"   11 seconds ago      Up 10 seconds       2181/tcp, 2888/tcp, 3888/tcp       docker-dip-zookeeper-3
>
> 65ffcbeda3f8        dallaybatta/gaas-cluster-worker   "/bin/sh -c ./entry.s"   16 seconds ago      Up 16 seconds                                          docker-gobblin-cluster-worker-2
> 8702911890db        dallaybatta/gaas-cluster-master   "/bin/sh -c ./entry.s"   22 seconds ago      Up 21 seconds                                          docker-gobblin-cluster-master-2
> 74e2f2040c68        dallaybatta/gaas-service          "/bin/sh -c ./entry.s"   22 seconds ago      Up 21 seconds       0.0.0.0:2->9099/tcp                docker-gobblin-service-2
> 55ef0b4e7366        ches/kafka                        "/start.sh"              22 seconds ago      Up 22 seconds       7203/tcp, 0.0.0.0:6502->9092/tcp   docker-dip-kafka-2
> 123ce726fc5a        jplock/zookeeper                  "/opt/zookeeper/bin/z"   23 seconds ago      Up 22 seconds       2181/tcp, 2888/tcp, 3888/tcp       docker-dip-zookeeper-2
>
> 1af34cfb5cb0        dallaybatta/gaas-cluster-worker   "/bin/sh -c ./entry.s"   31 seconds ago      Up 30 seconds                                          docker-gobblin-cluster-worker-1
> bcf010f19338        dallaybatta/gaas-cluster-master   "/bin/sh -c ./entry.s"   36 seconds ago      Up 35 seconds                                          docker-gobblin-cluster-master-1
> 1e10303e012a        dallaybatta/gaas-service          "/bin/sh -c ./entry.s"   36 seconds ago      Up 36 seconds       0.0.0.0:1->9099/tcp                docker-gobblin-service-1
> b2a4f6ce6f86        ches/kafka                        "/start.sh"              37 seconds ago      Up 36 seconds       7203/tcp, 0.0.0.0:6501->9092/tcp   docker-dip-kafka-1
> f6b92760c3a7        jplock/zookeeper                  "/opt/zookeeper/bin/z"   37 seconds ago      Up 36 seconds       2181/tcp, 2888/tcp, 3888/tcp       docker-dip-zookeeper-1
>
> ************************************************************
> ************************************************************
> ************************************************************
> *******************
>
> Please note that it just uses the GAAS's rest endpoint and the Standalone
> Cluster for scalibility
>
> Also we could deploy the artifacts ( tempaltes into the gaas-service
> container and libraries in the cluster worker/master nodes) using the
> docker cp command.
>
>
> Thanks,
> Vicky
>
>
> On Tue, Apr 3, 2018 at 5:15 AM, Abhishek Tiwari <ab...@apache.org> wrote:
>
>> Hi Vicky,
>>
>> I had a follow-up with Sudarshan (cc'd), he will document his multi-hop
>> design and thoughts on the open source wiki after a bit of clean up.
>>
>> Regards
>> Abhishek
>>
>> On Mon, Mar 26, 2018 at 11:47 PM, Vicky Kak <vi...@gmail.com> wrote:
>>
>> > Hi Abhishek,
>> >
>> > I did not get a change to followup on this and hence a delayed response,
>> > sorry about it. In our last meet there was a explanation about the Multi
>> > Hop, would it be possible to describe it here.
>> >
>> > Thanks,
>> > Vicky
>> >
>> > On Tue, Jan 23, 2018 at 5:14 AM, Abhishek Tiwari <ab...@apache.org>
>> wrote:
>> >
>> >> Hi Vicky,
>> >>
>> >> Response inline but great suggestions and questions. Agree with each
>> one,
>> >> please feel free to create Jiras for each.
>> >>
>> >> Also, apologies for the late reply.
>> >>
>> >> Regards,
>> >> Abhishek
>> >>
>> >> On Tue, Jan 9, 2018 at 3:24 AM, Vicky Kak <vi...@gmail.com> wrote:
>> >>
>> >> > Hi Guys,
>> >> >
>> >> > I have finally managed to install the GAAS with Standalone Cluster.
>> >> >
>> >> > Here are some of the observations to share
>> >> >
>> >> > 1) I have running the GAAS and Standalone cluster on the same machine
>> >> and
>> >> > from the same distribution, this will be typically needed for quick
>> >> setup.
>> >> > Since I have been starting the GAAS and Standalone master on same
>> >> > distribution,
>> >> > they both are directing the logs to the same master.out file leading
>> to
>> >> > overlap of the logging details from the GAAS and standalone master. I
>> >> have
>> >> > changed the logging file from master.out to clustermaster.out on my
>> >> local
>> >> > set up by changing the $GOBBLIN_HOME/bin/gobblin-cluster-master.sh
>> as
>> >> >
>> >> >
>> >> >    nohup $COMMAND >clustermaster.out 2>&1 & echo $! > $PID
>> >> >
>> >> >    We better make the changes in the distribution.
>> >> >
>> >> > I generally run two distributions at different locations to keep
>> >> workspace
>> >> / installation clean for each. But I see the advantage of using one
>> >> (attaching debugger with single IDE instance, etc), so it would be a
>> good
>> >> idea to segregate the logging for both. We should create a Jira for
>> this.
>> >>
>> >> >
>> >> >
>> >> > 2) The log4j logging configuration is dynamically controlled in the
>> >> > standalone/worker implementation, it does not work by default.I
>> looked
>> >> at
>> >> > how the log4j configurations are being controlled in other modes, it
>> is
>> >> > done via the bootstrap scripts e.g gobblin-aws.sh as
>> >> >   LOG4J_PATH=file://${FWDIR_CONF}/log4j-aws.properties
>> >> >   COMMAND="$JAVA_HOME/bin/java -cp $CLASSPATH $JVM_FLAGS
>> >> gobblin.aws.GobblinAWSClusterLauncher
>> >> > -D log4j.configuration=$LOG4J_PATH"
>> >> >
>> >> > I see the log4j configurations similarly being configured in
>> >> > gobblin-standalone.sh too
>> >> >   COMMAND+="-Dlog4j.configuration=file://$FWDIR_CONF/log4j-
>> >> standalone.xml
>> >> > "
>> >> >
>> >> > I did made the similar changes for the gobblin-service.sh as
>> >> >
>> >> > LOG4J_PATH=file://${FWDIR_CONF}/log4j-cluster.properties
>> >> >   COMMAND="$JAVA_HOME/bin/java -Dlog4j.debug
>> >> -Dlog4j.configuration=$LOG4J_PATH
>> >> > -cp $CLASSPATH $JVM_FLAGS gobblin.service.modules.core.G
>> >> obblinServiceManager
>> >> > --service_name $SERVICE_NAME $LOG_ARGS"
>> >> >
>> >> > This was done because the log4j configuration for the GAAS which
>> should
>> >> > have been taken from $GOBBLIN_HOME/conf/service/log
>> >> 4j-cluster.properties
>> >> > was not being taken from there, it was taken from the
>> >> > $GOBBLIN_HOME/lib/generator-2.6.0.jar.
>> >> >
>> >> > We should keep the consistent model of loading the log4j, for the
>> >> > standalone cluster the log4j configurations are being loaded via code
>> >> and
>> >> > for the other gobblin components(modes) it is via the configuration
>> in
>> >> the
>> >> > bootstrap scripts. We should have it consistent and I think having
>> it in
>> >> > the bootstrap scripts via -Dlog4j.configuration is good option.
>> >> >
>> >> >  I have to copy the log4j-cluster.properties into the
>> GOBBLIN_HOME/bin
>> >> for
>> >> > running the Standalone cluster master/worker node.
>> >> > We need to fix these log4j configrations issues.
>> >> >
>> >> > Thanks, yes this should be made consistent.
>> >>
>> >> >
>> >> > 3) The Gobblin service should have rest port configurable via
>> properties
>> >> > file, currently we get it from the property in the master.out log
>> file.
>> >> I
>> >> > have to check how to get it using the d2 client as per the restli
>> >> > framework.
>> >> >
>> >> Yes, this is pending. Internally, we run within a wrapper jetty
>> container
>> >> that has fixed port. So, this has slipped priority for anyone to
>> address
>> >> so
>> >> far. Good reminder.
>> >>
>> >>
>> >> > 4) We need to have SQL based TopologyStore, i.e Implement pluggable
>> >> MySql
>> >> > based TopologyStore.
>> >> >
>> >> +1
>> >>
>> >> >
>> >> > 5) Capabilities are hardcoded into the configurations files. It
>> would be
>> >> > good to have the capabilities configured in the corresponding job
>> pull
>> >> file
>> >> > and it should propagate to the GASS when required.
>> >> >
>> >> Yes, thats where we intend to move towards. We started with static
>> >> configuration as v0, but should add a zk based registration or other
>> >> dynamic ways to announce and discover capabilities. I believe
>> Sudarshan is
>> >> looking into multi-hop with a bit broader vision and might touch upon
>> this
>> >> too.
>> >>
>> >> >
>> >> > 6) The Standalone master is not starting without configuring this
>> >> property
>> >> >
>> >> > gobblin.cluster.jobconf.fullyQualifiedPath
>> >> >
>> >> > Here is the exception that I see when it is not configured
>> >> >
>> >> > 2018-01-09 13:25:11 IST DEBUG [main] org.apache.hadoop.security.UserGroupInformation - UGI loginUser:vicky (auth:SIMPLE)
>> >> >
>> >> > Exception in thread "main" java.lang.NullPointerException: at index 2
>> >> >     at com.google.common.collect.ObjectArrays.checkElementNotNull(ObjectArrays.java:240)
>> >> >     at com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:231)
>> >> >     at com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:226)
>> >> >     at com.google.common.collect.ImmutableList.construct(ImmutableList.java:303)
>> >> >     at com.google.common.collect.ImmutableList.of(ImmutableList.java:107)
>> >> >     at gobblin.cluster.GobblinClusterManager.create(GobblinClusterManager.java:408)
>> >> >     at gobblin.cluster.GobblinClusterManager.buildJobConfigurationManager(GobblinClusterManager.java:400)
>> >> >     at gobblin.cluster.GobblinClusterManager.initializeAppLauncherAndServices(GobblinClusterManager.java:198)
>> >> >     at gobblin.cluster.GobblinClusterManager.<init>(GobblinClusterManager.java:164)
>> >> >     at gobblin.cluster.GobblinClusterManager.main(GobblinClusterManager.java:743)
>> >> >
>> >> >
>> >> > Since the configuration looks for the job data from kafka queue, this
>> >> > following configurations need not to be done.
>> >> >
>> >> > gobblin.cluster.jobconf.fullyQualifiedPath=/home/vicky/development/gobblin/gobblin-dist-0.10.0/cluster-job-config-bpu1
>> >> >
>> >> > I am going to look into this again, not sure if I am missing
>> anything.
>> >> >
>> >> Seems redundant, and like a bug.
>> >>
>> >> >
>> >> > Thanks,
>> >> > Vicky
>> >> >
>> >> >
>> >> >
>> >>
>> >
>> >
>>
>
>

Re: GAAS feedback.

Posted by Vicky Kak <vi...@gmail.com>.
Hi Guys,

I have created multiple JIRAs based on the discussion we had in this
thread:
https://issues.apache.org/jira/browse/GOBBLIN-452
https://issues.apache.org/jira/browse/GOBBLIN-453

The fixes for these JIRAs are in this patch:
https://github.com/dallaybatta/incubator-gobblin/commit/89a16cc1f99497525eca2889f25c60cefed06d76

I have built a binary distribution that includes these changes:
https://github.com/dallaybatta/uploads/releases/download/v0.13.0/gobblin-distribution-0.13.0.tar.gz

I also encountered the following issue while dockerizing GAAS
(https://github.com/dallaybatta/gaas-docker):
https://issues.apache.org/jira/browse/GOBBLIN-472

The dockerization of GAAS does the following:
1) It creates Docker containers for GAAS plus a Standalone Cluster (single
master + single worker).
2) It uses the ZooKeeper-based state store; I started with the Hadoop one but
had issues with it.
3) The Docker container has the wikipedia.template deployed.
4) It has a script that creates a GAAS container group based on a unique id;
this unique id (group id) can be associated with a unique user and hence
provides multi-tenancy.
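
A rough sketch of what such a group-creation script might look like is
given here. It is not the script from the gaas-docker repository: the
function name, the dry-run approach (commands are printed, not run), and
the port scheme (host port equal to the group id for the service,
6500 + id for Kafka) are assumptions read off the container listing in
this mail. Links and environment settings between containers are left out.

```shell
#!/bin/sh
# Hypothetical sketch: print the docker commands that would start one
# GAAS container group for a numeric group id. Nothing is executed.
gaas_group_commands() {
  id="$1"
  # Reject empty or non-numeric ids.
  case "$id" in
    ''|*[!0-9]*) echo "usage: gaas_group_commands <numeric-id>" >&2; return 1 ;;
  esac
  echo "docker run -d --name docker-dip-zookeeper-${id} jplock/zookeeper"
  echo "docker run -d --name docker-dip-kafka-${id} -p $((6500 + id)):9092 ches/kafka"
  echo "docker run -d --name docker-gobblin-service-${id} -p ${id}:9099 dallaybatta/gaas-service"
  echo "docker run -d --name docker-gobblin-cluster-master-${id} dallaybatta/gaas-cluster-master"
  echo "docker run -d --name docker-gobblin-cluster-worker-${id} dallaybatta/gaas-cluster-worker"
}

# Print the commands for group 3.
gaas_group_commands 3
```

Because every container name and host port is derived from the group id,
groups for different users do not collide on the same machine.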

Here is a sample of 3 GAAS container groups created on the same machine,
demonstrating the multi-tenancy:
*******************************************************************************************************************************************************************************************************
CONTAINER ID        IMAGE                             COMMAND                  CREATED             STATUS              PORTS                              NAMES
c0be6bea1451        dallaybatta/gaas-cluster-worker   "/bin/sh -c ./entry.s"   4 seconds ago       Up 3 seconds                                           docker-gobblin-cluster-worker-3
c10644e81d5e        dallaybatta/gaas-cluster-master   "/bin/sh -c ./entry.s"   10 seconds ago      Up 9 seconds                                           docker-gobblin-cluster-master-3
34fa6dd683c1        dallaybatta/gaas-service          "/bin/sh -c ./entry.s"   10 seconds ago      Up 9 seconds        0.0.0.0:3->9099/tcp                docker-gobblin-service-3
66298f5b552e        ches/kafka                        "/start.sh"              10 seconds ago      Up 9 seconds        7203/tcp, 0.0.0.0:6503->9092/tcp   docker-dip-kafka-3
4feb9200800a        jplock/zookeeper                  "/opt/zookeeper/bin/z"   11 seconds ago      Up 10 seconds       2181/tcp, 2888/tcp, 3888/tcp       docker-dip-zookeeper-3

65ffcbeda3f8        dallaybatta/gaas-cluster-worker   "/bin/sh -c ./entry.s"   16 seconds ago      Up 16 seconds                                          docker-gobblin-cluster-worker-2
8702911890db        dallaybatta/gaas-cluster-master   "/bin/sh -c ./entry.s"   22 seconds ago      Up 21 seconds                                          docker-gobblin-cluster-master-2
74e2f2040c68        dallaybatta/gaas-service          "/bin/sh -c ./entry.s"   22 seconds ago      Up 21 seconds       0.0.0.0:2->9099/tcp                docker-gobblin-service-2
55ef0b4e7366        ches/kafka                        "/start.sh"              22 seconds ago      Up 22 seconds       7203/tcp, 0.0.0.0:6502->9092/tcp   docker-dip-kafka-2
123ce726fc5a        jplock/zookeeper                  "/opt/zookeeper/bin/z"   23 seconds ago      Up 22 seconds       2181/tcp, 2888/tcp, 3888/tcp       docker-dip-zookeeper-2

1af34cfb5cb0        dallaybatta/gaas-cluster-worker   "/bin/sh -c ./entry.s"   31 seconds ago      Up 30 seconds                                          docker-gobblin-cluster-worker-1
bcf010f19338        dallaybatta/gaas-cluster-master   "/bin/sh -c ./entry.s"   36 seconds ago      Up 35 seconds                                          docker-gobblin-cluster-master-1
1e10303e012a        dallaybatta/gaas-service          "/bin/sh -c ./entry.s"   36 seconds ago      Up 36 seconds       0.0.0.0:1->9099/tcp                docker-gobblin-service-1
b2a4f6ce6f86        ches/kafka                        "/start.sh"              37 seconds ago      Up 36 seconds       7203/tcp, 0.0.0.0:6501->9092/tcp   docker-dip-kafka-1
f6b92760c3a7        jplock/zookeeper                  "/opt/zookeeper/bin/z"   37 seconds ago      Up 36 seconds       2181/tcp, 2888/tcp, 3888/tcp       docker-dip-zookeeper-1

*******************************************************************************************************************************************************************************************************
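
The listing above suggests a per-group convention: every container name
ends in the group id, the GAAS service publishes container port 9099 on a
host port equal to the group id, and Kafka publishes 9092 on 6500 plus the
group id. A minimal sketch of that convention follows. The function name
group_plan is hypothetical and is not taken from the gaas-docker scripts,
which may derive these values differently.

```shell
#!/bin/sh
# Hypothetical helper (not from the gaas-docker repository): prints the
# container names and published ports for one GAAS group, following the
# pattern visible in the docker ps listing.

group_plan() {
  gid="$1"
  # Accept only a positive integer without leading zeros.
  case "$gid" in
    ''|*[!0-9]*|0*)
      echo "usage: group_plan <numeric-group-id>" >&2
      return 1
      ;;
  esac
  echo "docker-dip-zookeeper-$gid"
  echo "docker-dip-kafka-$gid host-port=$((6500 + gid)) container-port=9092"
  echo "docker-gobblin-service-$gid host-port=$gid container-port=9099"
  echo "docker-gobblin-cluster-master-$gid"
  echo "docker-gobblin-cluster-worker-$gid"
}

group_plan 3
```

For group 3 this reports Kafka on host port 6503 and the service on host
port 3, matching the first group in the listing.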

Please note that this setup uses only the GAAS REST endpoint, with the
Standalone Cluster providing scalability.

We could also deploy artifacts (templates into the gaas-service container
and libraries into the cluster worker/master nodes) using the docker cp
command.
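
As a rough sketch of the docker cp approach: the container names below
follow the docker ps listing, while the local file names and the
destination paths inside the containers are placeholders and need to be
replaced with the real locations used by the images. The script only
prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Illustrative only: file names and in-container paths are placeholders,
# not locations confirmed anywhere in this thread.
TEMPLATE="${TEMPLATE:-./my-flow.template}"
JOB_LIB="${JOB_LIB:-./my-job.jar}"
TEMPLATE_DEST="${TEMPLATE_DEST:-/path/to/templates/}"
LIB_DEST="${LIB_DEST:-/path/to/gobblin/lib/}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy the template into the service container and the job library into
# the master and worker containers of the given group.
deploy_artifacts() {
  gid="$1"
  run docker cp "$TEMPLATE" "docker-gobblin-service-$gid:$TEMPLATE_DEST"
  for role in master worker; do
    run docker cp "$JOB_LIB" "docker-gobblin-cluster-$role-$gid:$LIB_DEST"
  done
}

deploy_artifacts "${GROUP_ID:-1}"
```

Whether the running services pick up the copied files without a restart
depends on how each component loads templates and libraries.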


Thanks,
Vicky


On Tue, Apr 3, 2018 at 5:15 AM, Abhishek Tiwari <ab...@apache.org> wrote:

> Hi Vicky,
>
> I had a follow-up with Sudarshan (cc'd), he will document his multi-hop
> design and thoughts on the open source wiki after a bit of clean up.
>
> Regards
> Abhishek
