Posted to dev@storm.apache.org by Jungtaek Lim <ka...@gmail.com> on 2016/08/02 01:55:08 UTC

[PROPOSAL] submitting topology with adding jars and maven artifacts

Hi dev,

This is a proposal review thread for submitting a topology with additional jars
and Maven artifacts. It also follows up on the discussion thread "[DISCUSSION]
Policy of resolving dependencies for non storm-core modules".[1]

I've written a design doc, which also describes the motivation for this.
https://cwiki.apache.org/confluence/display/STORM/A.+Design+doc%3A+adding+jars+and+maven+artifacts+at+submission

Please review it and comment on this thread instead of the wiki page, so that
all devs are notified of updates.

Thanks,
Jungtaek Lim (HeartSaVioR)

[1]
http://mail-archives.apache.org/mod_mbox/storm-dev/201607.mbox/%3CCAF5108jByyJLTKrV_P4fS=dj8rsr_o5oubzQBviscGgSC1cjug@mail.gmail.com%3E

Re: [PROPOSAL] submitting topology with adding jars and maven artifacts

Posted by Jungtaek Lim <ka...@gmail.com>.
Hi devs,

The pull request for this proposal is now available:
https://github.com/apache/storm/pull/1608

It currently targets only 1.x-branch, and I still need to update the docs.
I'll take care of both accordingly.

Please take a look at the pull request and leave comments.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Thu, Aug 4, 2016 at 12:20 AM, Jungtaek Lim <ka...@gmail.com> wrote:

> One thing I've found while working on this is that we may want to support
> adding packages with exclusions.

Re: [PROPOSAL] submitting topology with adding jars and maven artifacts

Posted by Jungtaek Lim <ka...@gmail.com>.
One thing I've found while working on this is that we may want to support
adding packages with exclusions.

Launching the child class with --packages set to kafka_2.10 simply fails,
because its transitive dependencies include conflicting libraries. I'm not
sure how to express exclusions on the command line, but technically Aether
seems to support them.

SLF4J: Detected both log4j-over-slf4j.jar AND slf4j-log4j12.jar on the
class path, preempting StackOverflowError.
SLF4J: See also http://www.slf4j.org/codes.html#log4jDelegationLoop for
more details.

These two jars are transitive dependencies of org.apache.kafka:kafka_2.10:0.9.0
(and also 0.8.2.1). So if we want to add the Kafka library at submission time,
exclusions need to be supported.
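The exclusion idea above could be expressed as a per-artifact suffix on the package option. The sketch below parses a hypothetical syntax, `group:artifact:version^exclGroup:exclArtifact`, where each `^` entry names a transitive dependency to drop (here the conflicting slf4j-log4j12). The `^` separator, the comma-separated list, and the `ArtifactSpec` type are assumptions for illustration only; passing the exclusions to Aether during resolution is not shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArtifactSpec:
    """A Maven coordinate plus the transitive dependencies to exclude from it."""
    group_id: str
    artifact_id: str
    version: str
    exclusions: List[str] = field(default_factory=list)  # "groupId:artifactId" pairs


def parse_artifact_spec(spec: str) -> ArtifactSpec:
    """Parse "group:artifact:version^exclGroup:exclArtifact^..." (hypothetical syntax)."""
    coordinate, *raw_exclusions = [part.strip() for part in spec.strip().split("^")]
    parts = coordinate.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected group:artifact:version, got {coordinate!r}")
    for exclusion in raw_exclusions:
        excl_parts = exclusion.split(":")
        if len(excl_parts) != 2 or not all(excl_parts):
            raise ValueError(f"expected group:artifact exclusion, got {exclusion!r}")
    return ArtifactSpec(parts[0], parts[1], parts[2], list(raw_exclusions))


def parse_artifact_list(value: str) -> List[ArtifactSpec]:
    """Split a comma-separated option value into individual artifact specs."""
    return [parse_artifact_spec(item) for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    for spec in parse_artifact_list("org.apache.kafka:kafka_2.10:0.9.0^org.slf4j:slf4j-log4j12"):
        print(spec)
```

Keeping exclusions attached to the artifact that drags them in (rather than as a global list) mirrors how Maven scopes exclusions per dependency.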

On Thu, Aug 4, 2016 at 12:03 AM, Jungtaek Lim <ka...@gmail.com> wrote:

> FYI: This proposal has been filed as STORM-2016
> <https://issues.apache.org/jira/browse/STORM-2016>, and I've been working
> on it.

Re: [PROPOSAL] submitting topology with adding jars and maven artifacts

Posted by Jungtaek Lim <ka...@gmail.com>.
FYI: This proposal has been filed as STORM-2016
<https://issues.apache.org/jira/browse/STORM-2016>, and I've been working on
it.

I'd like to explain the details of the topology submitter, since I wasn't
clear on that.

I've been experimenting with several ways of submitting a topology, but each
has pros and cons.

1. Introduce a Submitter class that resolves dependencies, uploads them to the
blobstore, loads the topology code and dependencies into a custom mutable
classloader, and finally runs the child class's main method via reflection.
This is what SparkSubmit does, though SparkSubmit is more complicated because
it supports various options.

Pros:
- No need to handle communication between processes. That class bootstraps
and handles everything.
Cons:
- We would have to pass the custom classloader to every usage of
Class.forName to prevent ClassNotFoundExceptions (CNFs).
- Spark uses checkstyle to catch usages of Class.forName, but we don't apply
such a check, so we could miss some.

2. Introduce a Helper class that resolves transitive dependencies (fetching
them as needed), uploads them to the blobstore, and returns a map of (blob
key, file) pairs. storm.py reads the Helper class's output, adds the files to
the classpath, and runs the child class's main.

Pros:
- We don't need the classloader hack (?).
- If we move the Helper class into a separate module, we can even place that
module outside of lib and avoid adding the Aether libraries to the lib
directory.
Cons:
- Getting and parsing the Helper's output from stdout is annoying and error
prone.
- storm.py also needs to run two classes, but that's not a big deal since we
already do that (confvalue and ClientJarTransformerRunner).
- It's not easy to remove the dependencies from the blobstore if topology
submission from the child class fails.

3. Let the Helper class only resolve transitive dependencies and return a file
list. storm.py reads the Helper class's output, adds the files to the
classpath, and runs the child class's main. StormSubmitter then uploads them
to the blobstore.

Pros:
- Same as 2.
- Easy to remove the dependencies from the blobstore if submission fails.
- The Helper class no longer depends on storm-core, so it's easier to place
the module outside of lib.
Cons:
- StormSubmitter has to handle dependencies when submitting the topology.
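Option 2's main drawback is parsing the Helper's output from stdout. One way to make that less fragile, sketched here in Python (the language of storm.py), is to have the Helper print its result as JSON on a single marked line and ignore everything else. The `RESULT_PREFIX` marker and both JSON shapes are assumptions, not part of the proposal; the same parser would cover option 3 if the Helper printed only a list of files.

```python
import json
import os
from typing import Dict, Optional

# Hypothetical marker line; the Helper's real output format is not defined in this thread.
RESULT_PREFIX = "RESOLVED_DEPENDENCIES="


def parse_helper_output(stdout: str) -> Dict[str, Optional[str]]:
    """Return {local jar path: blob key or None} parsed from the Helper's stdout.

    Only the line starting with RESULT_PREFIX is parsed, so log lines the
    Helper happens to print (e.g. SLF4J warnings) cannot corrupt the result.
    """
    lines = [line for line in stdout.splitlines() if line.startswith(RESULT_PREFIX)]
    if len(lines) != 1:
        raise RuntimeError(f"expected exactly one result line, found {len(lines)}")
    payload = json.loads(lines[0][len(RESULT_PREFIX):])
    if isinstance(payload, list):  # option 3: files only, no blob keys yet
        return {path: None for path in payload}
    if isinstance(payload, dict):  # option 2: files already uploaded under these keys
        return payload
    raise RuntimeError("result payload must be a JSON list or object")


def build_classpath(topology_jar: str, resolved: Dict[str, Optional[str]]) -> str:
    """Topology jar first, then dependency jars in the order the Helper reported them."""
    return os.pathsep.join([topology_jar, *resolved])
```

Preserving the reported order (instead of sorting) matters because classpath order decides which copy of a duplicated class wins.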

I've succeeded with 2, and will try 3 to see whether it helps.
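Option 3's advantage is that the code doing the upload also learns whether submission succeeded, so it can clean up after a failure. A minimal sketch of that upload-then-rollback pattern follows. `InMemoryBlobStore` and `submit_with_dependencies` are hypothetical stand-ins written in Python for brevity; they are not StormSubmitter's actual Java API or the real blobstore client.

```python
import uuid
from typing import Callable, Dict, List


class InMemoryBlobStore:
    """Hypothetical stand-in for a blobstore client; not Storm's real API."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def create(self, key: str, data: bytes) -> None:
        if key in self.blobs:
            raise KeyError(f"blob {key} already exists")
        self.blobs[key] = data

    def delete(self, key: str) -> None:
        self.blobs.pop(key, None)


def submit_with_dependencies(store: InMemoryBlobStore,
                             jars: Dict[str, bytes],
                             submit: Callable[[List[str]], None]) -> List[str]:
    """Upload dependency jars, submit, and roll back the uploads if anything fails."""
    uploaded: List[str] = []
    try:
        for name, data in jars.items():
            key = f"dep-{uuid.uuid4()}-{name}"  # isolated, per-submission key
            store.create(key, data)
            uploaded.append(key)
        submit(uploaded)  # e.g. hand the keys to Nimbus along with the topology
    except Exception:
        for key in uploaded:  # undo partial work so no orphaned blobs remain
            store.delete(key)
        raise
    return uploaded
```

Because upload and submission happen in the same process, the rollback can be a plain `try`/`except`, which is exactly what option 2 lacks.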

Any other suggestions, or opinions on the existing options, are much appreciated!

Thanks,
Jungtaek Lim (HeartSaVioR)

On Wed, Aug 3, 2016 at 8:01 AM, Jungtaek Lim <ka...@gmail.com> wrote:

> Hi Priyank,
>
> First of all, this feature is similar (close) to what Spark provides.

Re: [PROPOSAL] submitting topology with adding jars and maven artifacts

Posted by Jungtaek Lim <ka...@gmail.com>.
Hi Priyank,

First of all, this feature is similar (close) to what Spark provides:
https://spark.apache.org/docs/2.0.0/submitting-applications.html#advanced-dependency-management

If you have additional jars that are not packed into the uber topology jar,
you can use the --jars option to include them without repackaging the
topology jar.
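To illustrate what --jars means for the launcher: the listed jars are added to the classpath used to run the topology's main class, next to the topology jar, so nothing has to be repackaged. This is a sketch under stated assumptions (comma-separated paths, extra jars placed after the topology jar); `jars_option_to_classpath` is a hypothetical helper, not existing storm.py code.

```python
import os
from typing import List


def jars_option_to_classpath(topology_jar: str, jars_option: str,
                             check_exists: bool = True) -> str:
    """Build the client-side classpath for running the topology's main class.

    jars_option is a comma-separated list of extra jar paths (as with Spark's
    --jars). The extra jars are appended after the topology jar, so the uber
    jar does not have to be rebuilt when a dependency is added.
    """
    extra: List[str] = [p.strip() for p in jars_option.split(",") if p.strip()]
    if check_exists:
        missing = [p for p in extra if not os.path.isfile(p)]
        if missing:
            raise FileNotFoundError(f"--jars entries not found: {missing}")
    # Drop duplicates while keeping the order given on the command line.
    ordered = list(dict.fromkeys([topology_jar, *extra]))
    return os.pathsep.join(ordered)
```

Failing fast on a missing path gives the user a clear error at submission time instead of a ClassNotFoundException later in the worker.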

As for the submitter, I think I wasn't clear. I'm still designing that part
in detail: resolving dependencies requires the Eclipse Aether libraries, and
I'm trying to avoid adding them as a dependency of storm-core. That doesn't
seem easy or clear-cut, though. I'll update once I'm clear on this.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Wed, Aug 3, 2016 at 7:43 AM, Priyank Shah <ps...@hortonworks.com> wrote:

> Hi Jungtaek,
>
> For adding jars and Maven artifacts at submission, you used the word
> Submitter. Is the Submitter the person running the storm jar command, or
> the Java code that actually submits the topology to Nimbus?
> Also, I didn't quite understand the --jars option. Could you please
> elaborate a little on that?

Re: [PROPOSAL] submitting topology with adding jars and maven artifacts

Posted by Priyank Shah <ps...@hortonworks.com>.
Hi Jungtaek,

For adding jars and Maven artifacts at submission, you used the word Submitter. Is the Submitter the person running the storm jar command, or the Java code that actually submits the topology to Nimbus?
Also, I didn't quite understand the --jars option. Could you please elaborate a little on that?

Thanks
Priyank

On 8/2/16, 7:05 AM, "Jungtaek Lim" <ka...@gmail.com> wrote:

>Ah, Satish, you got the point. I meant the copies of the files on the
>supervisor, and those copies themselves can be isolated.
>I hadn't thought about removing blobs, and it doesn't seem easy to do.
>
>Jungtaek Lim (HeartSaVioR)
>
>
>On Tue, Aug 2, 2016 at 7:35 PM, Satish Duggana <sa...@gmail.com> wrote:
>
>> Hi Jungtaek,
>> With the current proposal, are we removing the blob store files referenced
>> by a topology when it is killed?
>>
>> Thanks,
>> Satish.
>>
>> On Tue, Aug 2, 2016 at 3:50 PM, Jungtaek Lim <ka...@gmail.com> wrote:
>>
>> > Hi Satish,
>> >
>> > Thanks for reviewing and sharing your idea.
>> >
>> > Yes, this is shared dependencies vs. isolated dependencies.
>> > If we name each dependency file so that it contains the group name,
>> > artifact name, and version, it can be shared.
>> > One downside of this approach is storage space, since we don't know when
>> > it's safe to delete a file without additional care, but I doubt the disk
>> > would fill up with dependency blob jar files in normal situations.
>> > So I think we're OK to do this, but I'd like to hear others' opinions.
>> >
>> > Btw, I'm designing the details based on the proposal. I'll update this
>> > thread if there's anything the initial design doesn't cover.
>> >
>> > Thanks,
>> > Jungtaek Lim (HeartSaVioR)
>> >
>> > 2016년 8월 2일 (화) 오후 6:58, Satish Duggana <sa...@gmail.com>님이 작성:
>> >
>> > > Hi Jungtaek,
>> > > Proposal looks good to me. Good that we are not going with other
>> > > alternative using mutable classloader etc.
>> > >
>> > > Good to have the mentioned config in proposal to add those jars before
>> or
>> > > after storm core/libs. There is a property Config.
>> > > TOPOLOGY_CLASSPATH_BEGINNING which is to have that value as initial
>> > > classpath and that should continue to be working as expected even with
>> > the
>> > > new configuration.
>> > >
>> > > One enhancement which we may want to add to the existing proposal.
>> > > When --packages are used, storm submitter can upload those dependencies
>> > in
>> > > blob store with a defined naming convention so that same set of
>> packages
>> > > are not uploaded again and they can be used again for other topologies
>> if
>> > > they use same package.
>> > >
>> > > Thanks,
>> > > Satish.
>> > >
>> > >
>> > > On Tue, Aug 2, 2016 at 7:25 AM, Jungtaek Lim <ka...@gmail.com>
>> wrote:
>> > >
>> > > > Hi dev,
>> > > >
>> > > > This is proposal review thread for submitting topology with adding
>> jars
>> > > and
>> > > > maven artifacts. This is also following up discussion thread for
>> > > > [DISCUSSION]
>> > > > Policy of resolving dependencies for non storm-core modules.[1]
>> > > >
>> > > > I've written design doc which also describes motivation on this.
>> > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/STORM/A.+Design+doc%3A+adding+jars+and+maven+artifacts+at+submission
>> > > >
>> > > > Please review this and comment to "this thread" instead of wiki page
>> so
>> > > > that all devs can be notified for the update.
>> > > >
>> > > > Thanks,
>> > > > Jungtaek Lim (HeartSaVioR)
>> > > >
>> > > > [1]
>> > > >
>> > > >
>> > >
>> >
>> http://mail-archives.apache.org/mod_mbox/storm-dev/201607.mbox/%3CCAF5108jByyJLTKrV_P4fS=dj8rsr_o5oubzQBviscGgSC1cjug@mail.gmail.com%3E
>> > > >
>> > >
>> >
>>

Re: [PROPOSAL] submitting topology with adding jars and maven artifacts

Posted by Jungtaek Lim <ka...@gmail.com>.
Ah, Satish, you've got the point. I meant the copies of the files on the
supervisor, but those copies themselves can be isolated.
I hadn't thought about removing blobs, and it doesn't seem easy to do.

Jungtaek Lim (HeartSaVioR)


On Tue, Aug 2, 2016 at 7:35 PM, Satish Duggana <sa...@gmail.com> wrote:


Re: [PROPOSAL] submitting topology with adding jars and maven artifacts

Posted by Satish Duggana <sa...@gmail.com>.
Hi Jungtaek,
With the current proposal, do we remove the blob store files referenced by a
topology when it is killed?

Thanks,
Satish.

On Tue, Aug 2, 2016 at 3:50 PM, Jungtaek Lim <ka...@gmail.com> wrote:


Re: [PROPOSAL] submitting topology with adding jars and maven artifacts

Posted by Jungtaek Lim <ka...@gmail.com>.
Hi Satish,

Thanks for reviewing and sharing your idea.

Yes, this is the trade-off between shared and isolated dependencies.
If we name each dependency file so that it contains the group name, artifact
name, and version, the file can be shared.
One downside of this approach is storage space, since we don't know when
it's safe to delete a file without additional care, but I'm not sure the disk
would actually fill up with dependency blob jar files in normal situations.
So I think we're OK to do this, but I would like to hear others' opinions.
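To make the "additional care" for deleting shared files more concrete, one possible approach (not part of the proposal) is to track which topologies reference each shared dependency blob and only treat a blob as deletable once no running topology uses it. The sketch below is a minimal in-memory illustration under that assumption; the class name SharedDependencyTracker, the "group:artifact:version" key format, and the idea of calling it at submit/kill time are all hypothetical, not existing Storm APIs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical tracker, NOT a Storm API: records which topologies use which
// shared dependency blobs so a blob is only reported as deletable once unused.
class SharedDependencyTracker {
    // blob key -> ids of topologies that currently reference it
    private final Map<String, Set<String>> users = new HashMap<>();

    // Would be called (hypothetically) when a topology is submitted.
    void register(String topologyId, List<String> blobKeys) {
        for (String key : blobKeys) {
            users.computeIfAbsent(key, k -> new HashSet<>()).add(topologyId);
        }
    }

    // Would be called (hypothetically) when a topology is killed.
    // Returns the blob keys that no remaining topology references.
    Set<String> unregister(String topologyId, List<String> blobKeys) {
        Set<String> deletable = new HashSet<>();
        for (String key : blobKeys) {
            Set<String> ids = users.get(key);
            if (ids == null) {
                continue; // unknown key: nothing to release
            }
            ids.remove(topologyId);
            if (ids.isEmpty()) {
                users.remove(key);
                deletable.add(key);
            }
        }
        return deletable;
    }
}
```

For example, if two topologies both register "org.apache.kafka:kafka_2.10:0.9.0", killing the first returns an empty set and killing the second returns that key. In a real cluster this bookkeeping would presumably have to be persisted and shared between Nimbus instances, which is part of why safe deletion is not trivial.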

Btw, I'm designing the details based on the proposal. I will update this
thread if there is anything the initial design doesn't cover.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Tue, Aug 2, 2016 at 6:58 PM, Satish Duggana <sa...@gmail.com> wrote:


Re: [PROPOSAL] submitting topology with adding jars and maven artifacts

Posted by Satish Duggana <sa...@gmail.com>.
Hi Jungtaek,
The proposal looks good to me. It's good that we are not going with the other
alternative of using a mutable classloader, etc.

It would be good to have the config mentioned in the proposal for adding those
jars before or after storm core/libs. There is a property,
Config.TOPOLOGY_CLASSPATH_BEGINNING, whose value is placed at the beginning of
the classpath, and it should continue to work as expected even with the new
configuration.
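A rough sketch of the ordering being discussed, under stated assumptions: the entries from Config.TOPOLOGY_CLASSPATH_BEGINNING always stay first, and a hypothetical boolean setting decides whether the dependency jars added at submission go before or after the Storm core/libs jars. WorkerClasspath and dependenciesFirst are illustrative names only, not Storm code or config keys.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of worker classpath ordering; this is not Storm's own code.
class WorkerClasspath {
    static String build(List<String> classpathBeginning, // entries from TOPOLOGY_CLASSPATH_BEGINNING
                        List<String> stormLibs,          // storm core and lib/ jars
                        List<String> dependencyJars,     // jars added at submission
                        boolean dependenciesFirst) {     // hypothetical new setting
        // The user-specified beginning keeps its place at the front, as today.
        List<String> entries = new ArrayList<>(classpathBeginning);
        if (dependenciesFirst) {
            entries.addAll(dependencyJars);
            entries.addAll(stormLibs);
        } else {
            entries.addAll(stormLibs);
            entries.addAll(dependencyJars);
        }
        // File.pathSeparator is ':' on Unix-like systems and ';' on Windows.
        return String.join(File.pathSeparator, entries);
    }
}
```

Since the first classpath entry containing a class generally wins, putting dependencies first would let a user's library version shadow the one shipped with Storm, while putting them after keeps Storm's versions; in both cases the TOPOLOGY_CLASSPATH_BEGINNING entries keep priority.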

One enhancement we may want to add to the existing proposal: when --packages
is used, the Storm submitter can upload those dependencies to the blob store
with a defined naming convention, so that the same set of packages is not
uploaded again and can be reused by other topologies that use the same
packages.
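A minimal sketch of that enhancement, assuming the blob key is derived deterministically from the Maven coordinates (group, artifact, version) and assuming a simple exists/put store. DependencyBlobStore, InMemoryBlobStore, and PackageUploader are hypothetical stand-ins, not Storm's blob store client API; the point is only that a deterministic key lets the submitter skip uploads that already exist.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal store interface; NOT Storm's blob store API.
interface DependencyBlobStore {
    boolean exists(String key);
    void put(String key, byte[] content);
}

// In-memory stand-in that counts real uploads, for illustration only.
class InMemoryBlobStore implements DependencyBlobStore {
    private final Map<String, byte[]> blobs = new HashMap<>();
    int uploads = 0;

    public boolean exists(String key) {
        return blobs.containsKey(key);
    }

    public void put(String key, byte[] content) {
        blobs.put(key, content);
        uploads++;
    }
}

class PackageUploader {
    // Hypothetical naming convention: Maven coordinates do not normally contain
    // ':', so joining them with ':' keeps distinct coordinates from colliding.
    static String blobKey(String groupId, String artifactId, String version) {
        return groupId + ":" + artifactId + ":" + version;
    }

    // Uploads the jar only if no blob with the same key exists yet, and
    // returns the key so the topology can refer to the (possibly shared) blob.
    static String uploadIfAbsent(DependencyBlobStore store, String groupId,
                                 String artifactId, String version, byte[] jar) {
        String key = blobKey(groupId, artifactId, version);
        if (!store.exists(key)) {
            store.put(key, jar);
        }
        return key;
    }
}
```

Submitting two topologies that both need org.apache.kafka:kafka_2.10:0.9.0 would then upload the jar once and return the same key both times. The exists-then-put check is not atomic, so a real implementation would also need to tolerate two submitters racing on the same key; that is harmless here only because the content for a given key is assumed to be identical.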

Thanks,
Satish.


On Tue, Aug 2, 2016 at 7:25 AM, Jungtaek Lim <ka...@gmail.com> wrote:
