You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@apex.apache.org by Pramod Immaneni <pr...@datatorrent.com> on 2015/11/02 19:30:51 UTC

Modules

I have some comments and suggestions on the module design. I think these
need to be taken into account before we can merge the implementation
provided below into the mainline code. I apologize if these should have
been brought up earlier as for some reason or the other I was out of loop
on this one

    https://github.com/apache/incubator-apex-core/pull/148
<https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963>

    1. DAG scoping currently in the implementation is global for modules,
each module's populateDAG sees the entire DAG. It should be locally scoped
as one module does not and should not know about another.
    2. The module has a populateDAG method with exact same syntax as in
StreamingApplication. Is StreamingApplication also a module, should it
extend that interface.
    3. Setting properties for modules is too verbose. Module developer
needs to repeat every property they want exposed with a setter and getter
in JAVA. I don't disagree that module developer should be able to choose
which properties from which operators need to be exposed but the current
way seems to duplicate code. Here is a suggestion.
         a. Allow modules to specify which operators and properties can be
accessible from outside. One way is in the "populateDAG" method of the
module when adding the operator have the ability to specify if this
operator can be accessible from outside and which or all properties can be
accessible.
         b. Provide methods in ModuleMeta or elsewhere to set property
values by specifying the operator name (friendly name) inside the module
and property name. If this is allowed by a. above it is successful else it
should fail.
         c. Allow a syntax in property files to specify the property in b.
Example syntax dt.module.<modulename>.operator.<operatorname>.prop.<
propname>
    4. For attributes same mechanism as in 3 should apply for the operators
that are exposed by the module.  For property file, example syntax
dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
    5. Module developers in addition to 3. and 4. above may choose to
support module level properties and attributes. These should not be the
default when 3. and 4. are possible but complementary, in addition to them.
In this case for properties they can implement setters and getters in the
module. For attributes the user should still be able to set the attributes
using the dag setAttribute method. You could introduce a method in the
module to process attributes that can get called by the engine once
everything is set.
    6. For 5. above setting global properties and attributes for module is
akin to ideas that have been proposed for the application as well. A
consistent way must be possible for applications as well even if it is not
implemented now.
    7. For 5. or 6. above there should be a property file way of specifying
the global module properties and attributes. Example syntax
dt.module.<modulename>.prop.<propname>, dt.module.<modulename>.attr.<attrname>.
Notice the difference with 3. c. and 4 above that there is no operator
keyword here.
    8. Partitioning needs to be consistent with what the user will expect
when they see module as an entity. I will send an image of possible
examples of how the user will expect the physical plan to look in certain
cases.

Thanks

Re: Modules

Posted by Priyanka Gugale <pr...@datatorrent.com>.

For module packaging:
Agree first approach i.e. we load operators based on classpath version.
1. This way we get flexibility to use different versions of operators with
module without actually downloading different versions of modules.
2. We can share dependencies when application fat jar is created, if each
module bundles it's own dependencies and application modules happen to have
common set of dependencies we will package same data multiple times blowing
the size of application jar.
3. When application package is created it will be a fat-fat jar if we
choose to embed dependencies in modules :)

-Priyanka

On Fri, Nov 6, 2015 at 12:49 PM, Chinmay Kolhatkar <ch...@datatorrent.com>
wrote:

> Hi All,
>
> As requested, I've updated the pull request for locally scoping the DAG
> object. i.e. DAG object passed to populateDAG of module will not be the top
> level application DAG object. It will be specific to the module.
> Basically, the encapsulation of the modules is maintained now.
>
> Here is the design with respect to that:
> 1) ModuleMeta object is added with a new variable for LogicalPlan. This DAG
> (LogicalPlan) object is supposed to show the logical plan of the specific
> module.
> 2) ModuleMeta object is also added with a new method called as flattenDAG.
> This method does following:
>    a. Call populateDAG of current module. Here the DAG object passed is the
> one present in ModuleMeta.
>    b. Calling populateDAG can possibly add more modules in this module's
> DAG. Hence iteratively the flattenDAG will be called on submodules.
>    c. Then a method called "applyStreamLinks" will be called on current
> module to resolve the module ports to operator ports.
>    d. Finally, the current module will be added to its parent (can be
> another module OR application)
>
> Pseudo Code looks like following:
> LogicalPlan::ModuleMeta {
>   private Module module;
>   private LogicalPlan dag;
>
>   flattenModule(parentDAG, conf)
>   {
>     module.populateDAG(dag, conf);
>     forall (dag.getAllModules())
>     {
>       subModuleMeta.flattenDAG(dag, conf);
>     }
>     dag.applyStreamLinks();
>     parentDAG.addDAGToCurrentDAG(dag);
>   }
> }
>
>
> LogicalPlanConfiguration::prepareDAG()
> {
>   app.populateDAG();
>   forall(allDag.getALlModules())
>   {
>     moduleMeta.flattenDAG();
>   }
>   appDAG.applyStreamLinks();
> }
>
> Please review the pull request:
> https://github.com/apache/incubator-apex-core/pull/148
>
> Thanks,
> Chinmay.
>
>
> ~ Chinmay.
>
> On Wed, Nov 4, 2015 at 2:24 AM, Vlad Rozov <v....@datatorrent.com>
> wrote:
>
> > It is static vs dynamic binding. With static binding it is much easier to
> > certify module behavior. With dynamic binding at run-time module behavior
> > depends on *correct* implementation of semantic versioning by embedded
> > operators. An operator may introduce new mandatory property and such
> change
> > will not be catch during semantic version verification check, but will
> > break module behavior.
> >
> > Who will be responsible for shading the dependencies? Is it module
> > designer (not clear why) or application designer? Should the
> functionality
> > be provided by the Apache Apex platform, not by module/application
> > designers?
> >
> > I suggest that we support the first option only for now and see if we
> need
> > to support the second option in the future.
> >
> > Thank you,
> >
> > Vlad
> >
> >
> > On 11/3/15 10:49, Thomas Weise wrote:
> >
> >> Second option can be done by shading the dependencies. More reliable and
> >> simpler than class loader tricks.
> >>
> >> But why would you need the second option to start with? Why would an
> >> application define conflicting dependencies? I don't see that happen
> often
> >> when dependencies follow semantic versioning.
> >>
> >>
> >> On Tue, Nov 3, 2015 at 10:26 AM, Vlad Rozov <v....@datatorrent.com>
> >> wrote:
> >>
> >> I think that just build tool will not provide necessary functionality
> for
> >>> the second option. Module will have to implement it's own class loader
> >>> otherwise behavior will be undefined in case where application embeds
> >>> module and independently different version of an operator included into
> >>> the
> >>> module.
> >>>
> >>> Thank you,
> >>>
> >>> Vlad
> >>>
> >>>
> >>> On 11/3/15 10:10, Thomas Weise wrote:
> >>>
> >>> Both are valid options and to be solved through the build tool. In one
> >>>> case
> >>>> the dependency is declared, in the latter embedded.
> >>>>
> >>>> Thomas
> >>>>
> >>>> On Tue, Nov 3, 2015 at 10:07 AM, Vlad Rozov <v....@datatorrent.com>
> >>>> wrote:
> >>>>
> >>>> This brings one more issue that we did not cover in the module design
> >>>>
> >>>>> discussions.  What happens when new versions of operators embedded
> >>>>> into a
> >>>>> module become available? I believe we all go with the assumption that
> >>>>> module will pickup version available on the classpath at run-time and
> >>>>> it
> >>>>> is
> >>>>> operator developer responsibility to provide full binary
> compatibility.
> >>>>> Another possible behavior is to consider module as completely
> >>>>> independent
> >>>>> unit and package all necessary libraries along with the module.
> >>>>>
> >>>>> Thank you,
> >>>>>
> >>>>> Vlad
> >>>>>
> >>>>>
> >>>>> On 11/3/15 09:40, Pramod Immaneni wrote:
> >>>>>
> >>>>> I am not suggesting we leak the internals or compromise access
> >>>>> modifiers.
> >>>>>
> >>>>>> I
> >>>>>> want the module developer to have the ability (not mandatory) to
> make
> >>>>>> available all or a subset of the properties of an operator easily if
> >>>>>> they
> >>>>>> desire without having to create setter/getter for each of them. You
> >>>>>> don't
> >>>>>> have to expose the operator they belong to. My preference would also
> >>>>>> be
> >>>>>> to
> >>>>>> preserve the namespace of the properties in some way for example by
> >>>>>> grouping them by operator name. Think about scenario where people
> have
> >>>>>> built modules using kafka input operator and there is a new kafka
> >>>>>> connection property. Without having this ability the modules have to
> >>>>>> be
> >>>>>> changed to support this property. With this feature the module
> >>>>>> developers
> >>>>>> have a choice whether to keep the list of kafka properties fixed in
> >>>>>> the
> >>>>>> module or allow new properties.
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>> On Tue, Nov 3, 2015 at 9:31 AM, Thomas Weise <
> thomas@datatorrent.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>> There is also the option to inherit common module properties
> through a
> >>>>>>
> >>>>>> base
> >>>>>>> class.
> >>>>>>>
> >>>>>>> I don't see how this is any different from an operator. The
> developer
> >>>>>>> decides what gets exposed and has the same options to control it.
> >>>>>>>
> >>>>>>> Encapsulation is good practice, by leaking the module internals the
> >>>>>>> using
> >>>>>>> code becomes brittle.
> >>>>>>>
> >>>>>>> Thomas
> >>>>>>>
> >>>>>>> On Tue, Nov 3, 2015 at 9:22 AM, Amol Kekre <am...@datatorrent.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> The same goes for a Java or C++ class that changes its api. In
> >>>>>>> general
> >>>>>>> this
> >>>>>>>
> >>>>>>> is left to the developer, and these languages have internals as
> >>>>>>> private
> >>>>>>>
> >>>>>>>> by
> >>>>>>>>
> >>>>>>> default for precisely the same purpose. The module developer must
> >>>>>>> have
> >>>>>>>
> >>>>>>>> the
> >>>>>>>>
> >>>>>>> right to change internals, keep api clean/constant and expect user
> >>>>>>> code
> >>>>>>>
> >>>>>>>> to
> >>>>>>>>
> >>>>>>> not break.
> >>>>>>>
> >>>>>>>> Thks,
> >>>>>>>> Amol
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, Nov 3, 2015 at 9:19 AM, Pramod Immaneni <
> >>>>>>>> pramod@datatorrent.com
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> For 3 and 4 can't we strike a balance between not having to expose
> >>>>>>>> the
> >>>>>>>>
> >>>>>>>> operators underneath and at the same time not having to write
> >>>>>>>>>
> >>>>>>>>> boilerplate
> >>>>>>>>>
> >>>>>>>> code for all the properties that the module wants to make
> available
> >>>>>>>>
> >>>>>>>> outside. It can quickly become unmanageable. For example, an input
> >>>>>>>>>
> >>>>>>>>> operator
> >>>>>>>>>
> >>>>>>>> has a new connection property which can be used outside and now
> all
> >>>>>>>>
> >>>>>>>>> the
> >>>>>>>>> modules using that operator, their code has to be modified to
> just
> >>>>>>>>> add
> >>>>>>>>>
> >>>>>>>>> a
> >>>>>>>>>
> >>>>>>>> pass through setter/getter. How about treating the operator name
> as
> >>>>>>>> a
> >>>>>>>> group
> >>>>>>>>
> >>>>>>>> name and ability for module developers to easily make
> >>>>>>>>
> >>>>>>>>> available/specify
> >>>>>>>>>
> >>>>>>>>> all
> >>>>>>>>>
> >>>>>>>> or a subset of the properties of an operator to the user without
> >>>>>>>>
> >>>>>>>>> having
> >>>>>>>>>
> >>>>>>>>> to
> >>>>>>>>>
> >>>>>>>> explicitly make each of them a module property.
> >>>>>>>>
> >>>>>>>>> On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <amol@datatorrent.com
> >
> >>>>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>> 3,4 should follow conventions where the creator decides the api
> >>>>>>>>
> >>>>>>>> (including
> >>>>>>>>>
> >>>>>>>>> accessibility). In general only those properties exposed by
> module
> >>>>>>>>>
> >>>>>>>>>> creator
> >>>>>>>>>>
> >>>>>>>>> should be settable. What the module internally does with them is
> >>>>>>>>>
> >>>>>>>>>> module
> >>>>>>>>>>
> >>>>>>>>> designer's call. Accessing internals of module from outside is
> >>>>>>>>
> >>>>>>>> uncommon.
> >>>>>>>>> For exampe in Java (or C++) private fields/members are not to be
> >>>>>>>>> accessed.
> >>>>>>>>>
> >>>>>>>>> Properties (setter and getter) are the api that module designer
> >>>>>>>>> gives
> >>>>>>>>>
> >>>>>>>>>> to
> >>>>>>>>>>
> >>>>>>>>> the module user. It is dangerous and has unintended consequences
> if
> >>>>>>>>> module
> >>>>>>>>>
> >>>>>>>>> user starts to access internals outside the api.
> >>>>>>>>>
> >>>>>>>>>> Partitioning should be next phase. As long as current design
> does
> >>>>>>>>>> not
> >>>>>>>>>>
> >>>>>>>>>> halt
> >>>>>>>>>>
> >>>>>>>>> partitioning it should be ok (which I believe is true).
> >>>>>>>>>
> >>>>>>>>>> Thks,
> >>>>>>>>>> Amol
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <
> >>>>>>>>>> v.rozov@datatorrent.com
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> 1. +1, though passing original DAG to module's populateDAG() it
> is
> >>>>>>>>>> not
> >>>>>>>>>>
> >>>>>>>>>> by
> >>>>>>>>>
> >>>>>>>>> design and is the current pull request implementation details.
> >>>>>>>>>
> >>>>>>>>>> 2. While I agree that both Module and StreamingApplication let's
> >>>>>>>>>>> module/application designer to expose DAG design reuse pattern
> >>>>>>>>>>> and
> >>>>>>>>>>> StreamingApplication interface may be extending Module, it does
> >>>>>>>>>>> not
> >>>>>>>>>>>
> >>>>>>>>>>> seem
> >>>>>>>>>>>
> >>>>>>>>>> to
> >>>>>>>>>>
> >>>>>>>>>> buy us much. Do we want to allow certain applications to be
> reused
> >>>>>>>>>>
> >>>>>>>>>>> as
> >>>>>>>>>>>
> >>>>>>>>>> Modules in other applications or should application package be
> >>>>>>>>> different
> >>>>>>>>>
> >>>>>>>>>> from Module package? The current approach is to distribute
> Modules
> >>>>>>>>>> as
> >>>>>>>>>>
> >>>>>>>>>> part
> >>>>>>>>> of .jar for example as part of Malhar library without necessarily
> >>>>>>>>>
> >>>>>>>>>> providing
> >>>>>>>>>>>
> >>>>>>>>>> all necessary dependencies. Application package on other side
> must
> >>>>>>>>>>
> >>>>>>>>>>> include
> >>>>>>>>>>>
> >>>>>>>>>> all dependencies not provided by the platform.
> >>>>>>>>>>
> >>>>>>>>>>> 3, 4. While this will help Module designer, it may complicate
> >>>>>>>>>>>
> >>>>>>>>>>> Module
> >>>>>>>>>>>
> >>>>>>>>>> maintenance and how Modules are used. What if Module designer
> >>>>>>>>> wants
> >>>>>>>>> to
> >>>>>>>>> change it's implementation and replace one operator
> implementation
> >>>>>>>>>
> >>>>>>>>> with
> >>>>>>>>>>
> >>>>>>>>>> another operator? Does StreamingApplication designer need to
> know
> >>>>>>>>>
> >>>>>>>>> internal
> >>>>>>>>>>
> >>>>>>>>>> structure of Modules? Should Module be considered as a black box
> >>>>>>>>>>
> >>>>>>>>>>> during
> >>>>>>>>>>>
> >>>>>>>>>> Application design time as it was initially planned?
> >>>>>>>>>
> >>>>>>>>> 5, 6, 7 +1. This is currently proposed behavior of Module
> >>>>>>>>>>
> >>>>>>>>>>> functionality
> >>>>>>>>>>>
> >>>>>>>>>> the way I understand it.
> >>>>>>>>>
> >>>>>>>>> 8. We need to see what Module designer can specify for
> >>>>>>>>>>
> >>>>>>>>>>> partitioning.
> >>>>>>>>>>>
> >>>>>>>>>> One
> >>>>>>>>> of supported cases should be ability to specify cascading
> >>>>>>>>>
> >>>>>>>>>> partitioning
> >>>>>>>>>>
> >>>>>>>>>> scheme.
> >>>>>>>>>
> >>>>>>>>> Thank you,
> >>>>>>>>>>
> >>>>>>>>>>> Vlad
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 11/2/15 10:30, Pramod Immaneni wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> I have some comments and suggestions on the module design. I
> >>>>>>>>>>> think
> >>>>>>>>>>> these
> >>>>>>>>>>>
> >>>>>>>>>>> need to be taken into account before we can merge the
> >>>>>>>>>>
> >>>>>>>>>> implementation
> >>>>>>>>>>>
> >>>>>>>>>>> provided below into the mainline code. I apologize if these
> >>>>>>>>>> should
> >>>>>>>>>>
> >>>>>>>>> have
> >>>>>>>>>
> >>>>>>>>>> been brought up earlier as for some reason or the other I was
> out
> >>>>>>>>>>
> >>>>>>>>>> of
> >>>>>>>>>>>
> >>>>>>>>>>> loop
> >>>>>>>>>>
> >>>>>>>>> on this one
> >>>>>>>>>
> >>>>>>>>>>         https://github.com/apache/incubator-apex-core/pull/148
> >>>>>>>>>>>> <
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>
> https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963
> >>>>>>>
> >>>>>>>         1. DAG scoping currently in the implementation is global
> for
> >>>>>>>
> >>>>>>>> modules,
> >>>>>>>>>
> >>>>>>>>>> each module's populateDAG sees the entire DAG. It should be
> >>>>>>>>>>> locally
> >>>>>>>>>>>
> >>>>>>>>>>> scoped
> >>>>>>>>>>
> >>>>>>>>> as one module does not and should not know about another.
> >>>>>>>>>
> >>>>>>>>>>         2. The module has a populateDAG method with exact same
> >>>>>>>>>>>> syntax
> >>>>>>>>>>>>
> >>>>>>>>>>>> as
> >>>>>>>>>>>>
> >>>>>>>>>>> in
> >>>>>>>>>>
> >>>>>>>>> StreamingApplication. Is StreamingApplication also a module,
> >>>>>>>>>
> >>>>>>>>>> should
> >>>>>>>>>>>
> >>>>>>>>>>> it
> >>>>>>>>>>
> >>>>>>>>> extend that interface.
> >>>>>>>>
> >>>>>>>>>         3. Setting properties for modules is too verbose. Module
> >>>>>>>>>>
> >>>>>>>>>>> developer
> >>>>>>>>>>>>
> >>>>>>>>>>> needs to repeat every property they want exposed with a setter
> >>>>>>>>>> and
> >>>>>>>>>>
> >>>>>>>>>> getter
> >>>>>>>>>>> in JAVA. I don't disagree that module developer should be able
> to
> >>>>>>>>>>> choose
> >>>>>>>>>>>
> >>>>>>>>>>> which properties from which operators need to be exposed but
> the
> >>>>>>>>>>
> >>>>>>>>>> current
> >>>>>>>>>>>
> >>>>>>>>>>> way seems to duplicate code. Here is a suggestion.
> >>>>>>>>>>
> >>>>>>>>>>              a. Allow modules to specify which operators and
> >>>>>>>>>>>
> >>>>>>>>>>>> properties
> >>>>>>>>>>>>
> >>>>>>>>>>> can
> >>>>>>>>>>
> >>>>>>>>> be
> >>>>>>>>>
> >>>>>>>>>> accessible from outside. One way is in the "populateDAG" method
> of
> >>>>>>>>>>>
> >>>>>>>>>>>> the
> >>>>>>>>>>>>
> >>>>>>>>>>> module when adding the operator have the ability to specify if
> >>>>>>>>>> this
> >>>>>>>>>> operator can be accessible from outside and which or all
> >>>>>>>>>>
> >>>>>>>>> properties
> >>>>>>>>>
> >>>>>>>>>> can
> >>>>>>>>>>
> >>>>>>>>> be
> >>>>>>>>>
> >>>>>>>>>> accessible.
> >>>>>>>>>>
> >>>>>>>>>>>              b. Provide methods in ModuleMeta or elsewhere to
> set
> >>>>>>>>>>>>
> >>>>>>>>>>>> property
> >>>>>>>>>>>>
> >>>>>>>>>>> values by specifying the operator name (friendly name) inside
> the
> >>>>>>>>>>
> >>>>>>>>>> module
> >>>>>>>>>>>
> >>>>>>>>>>> and property name. If this is allowed by a. above it is
> >>>>>>>>>> successful
> >>>>>>>>>>
> >>>>>>>>>> else
> >>>>>>>>>>>
> >>>>>>>>>>> it
> >>>>>>>>>>
> >>>>>>>>>> should fail.
> >>>>>>>>>>
> >>>>>>>>>>>              c. Allow a syntax in property files to specify the
> >>>>>>>>>>>>
> >>>>>>>>>>>> property
> >>>>>>>>>>>>
> >>>>>>>>>>> in
> >>>>>>>>>>
> >>>>>>>>> b.
> >>>>>>>>>
> >>>>>>>>>> Example syntax
> >>>>>>>>>>>
> >>>>>>>>>>>> dt.module.<modulename>.operator.<operatorname>.prop.<
> >>>>>>>>>>>>
> >>>>>>>>>>> propname>
> >>>>>>>>>>
> >>>>>>>>>         4. For attributes same mechanism as in 3 should apply for
> >>>>>>>>> the
> >>>>>>>>>
> >>>>>>>>>> operators
> >>>>>>>>>>>> that are exposed by the module.  For property file, example
> >>>>>>>>>>>> syntax
> >>>>>>>>>>>> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
> >>>>>>>>>>>>         5. Module developers in addition to 3. and 4. above
> may
> >>>>>>>>>>>>
> >>>>>>>>>>>> choose
> >>>>>>>>>>>>
> >>>>>>>>>>> to
> >>>>>>>>>>
> >>>>>>>>> support module level properties and attributes. These should not
> >>>>>>>>
> >>>>>>>>> be
> >>>>>>>>>> the
> >>>>>>>>>>
> >>>>>>>>> default when 3. and 4. are possible but complementary, in
> addition
> >>>>>>>>>
> >>>>>>>>>> to
> >>>>>>>>>>>
> >>>>>>>>>>> them.
> >>>>>>>>>> In this case for properties they can implement setters and
> getters
> >>>>>>>>>>
> >>>>>>>>>>> in
> >>>>>>>>>>>>
> >>>>>>>>>>> the
> >>>>>>>>>> module. For attributes the user should still be able to set the
> >>>>>>>>>>
> >>>>>>>>>>> attributes
> >>>>>>>>>>> using the dag setAttribute method. You could introduce a method
> >>>>>>>>>>> in
> >>>>>>>>>>> the
> >>>>>>>>>>>
> >>>>>>>>>>> module to process attributes that can get called by the engine
> >>>>>>>>>> once
> >>>>>>>>>> everything is set.
> >>>>>>>>>>
> >>>>>>>>>         6. For 5. above setting global properties and attributes
> >>>>>>>>> for
> >>>>>>>>>
> >>>>>>>>>> module
> >>>>>>>>>>>>
> >>>>>>>>>>> is
> >>>>>>>>>>
> >>>>>>>>>> akin to ideas that have been proposed for the application as
> >>>>>>>>>>>
> >>>>>>>>>>>> well. A
> >>>>>>>>>>>>
> >>>>>>>>>>> consistent way must be possible for applications as well even
> if
> >>>>>>>>>>
> >>>>>>>>> it
> >>>>>>>>>
> >>>>>>>>>> is
> >>>>>>>>>>
> >>>>>>>>> not
> >>>>>>>>
> >>>>>>>>> implemented now.
> >>>>>>>>>>
> >>>>>>>>>>>         7. For 5. or 6. above there should be a property file
> way
> >>>>>>>>>>>> of
> >>>>>>>>>>>> specifying
> >>>>>>>>>>>> the global module properties and attributes. Example syntax
> >>>>>>>>>>>> dt.module.<modulename>.prop.<propname>,
> >>>>>>>>>>>> dt.module.<modulename>.attr.<attrname>.
> >>>>>>>>>>>> Notice the difference with 3. c. and 4 above that there is no
> >>>>>>>>>>>>
> >>>>>>>>>>>> operator
> >>>>>>>>>>>>
> >>>>>>>>>>> keyword here.
> >>>>>>>>>>         8. Partitioning needs to be consistent with what the
> user
> >>>>>>>>>>
> >>>>>>>>>>> will
> >>>>>>>>>>>>
> >>>>>>>>>>> expect
> >>>>>>>>>>
> >>>>>>>>> when they see module as an entity. I will send an image of
> >>>>>>>>>
> >>>>>>>>>> possible
> >>>>>>>>>>>
> >>>>>>>>>>> examples of how the user will expect the physical plan to look
> in
> >>>>>>>>>>
> >>>>>>>>> certain
> >>>>>>>>>
> >>>>>>>>>> cases.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >
>

Re: Modules

Posted by Chinmay Kolhatkar <ch...@datatorrent.com>.

Hi All,

As requested, I've updated the pull request for locally scoping the DAG
object. i.e. DAG object passed to populateDAG of module will not be the top
level application DAG object. It will be specific to the module.
Basically, the encapsulation of the modules is maintained now.

Here is the design with respect to that:
1) ModuleMeta object is added with a new variable for LogicalPlan. This DAG
(LogicalPlan) object is supposed to show the logical plan of the specific
module.
2) ModuleMeta object is also added with a new method called as flattenDAG.
This method does following:
   a. Call populateDAG of current module. Here the DAG object passed is the
one present in ModuleMeta.
   b. Calling populateDAG can possibly add more modules in this module's
DAG. Hence iteratively the flattenDAG will be called on submodules.
   c. Then a method called "applyStreamLinks" will be called on current
module to resolve the module ports to operator ports.
   d. Finally, the current module will be added to its parent (can be
another module OR application)

Pseudo Code looks like following:
LogicalPlan::ModuleMeta {
  private Module module;
  private LogicalPlan dag;

  flattenModule(parentDAG, conf)
  {
    module.populateDAG(dag, conf);
    forall (dag.getAllModules())
    {
      subModuleMeta.flattenDAG(dag, conf);
    }
    dag.applyStreamLinks();
    parentDAG.addDAGToCurrentDAG(dag);
  }
}


LogicalPlanConfiguration::prepareDAG()
{
  app.populateDAG();
  forall(allDag.getALlModules())
  {
    moduleMeta.flattenDAG();
  }
  appDAG.applyStreamLinks();
}

Please review the pull request:
https://github.com/apache/incubator-apex-core/pull/148

Thanks,
Chinmay.


~ Chinmay.

On Wed, Nov 4, 2015 at 2:24 AM, Vlad Rozov <v....@datatorrent.com> wrote:

> It is static vs dynamic binding. With static binding it is much easier to
> certify module behavior. With dynamic binding at run-time module behavior
> depends on *correct* implementation of semantic versioning by embedded
> operators. An operator may introduce new mandatory property and such change
> will not be catch during semantic version verification check, but will
> break module behavior.
>
> Who will be responsible for shading the dependencies? Is it module
> designer (not clear why) or application designer? Should the functionality
> be provided by the Apache Apex platform, not by module/application
> designers?
>
> I suggest that we support the first option only for now and see if we need
> to support the second option in the future.
>
> Thank you,
>
> Vlad
>
>
> On 11/3/15 10:49, Thomas Weise wrote:
>
>> Second option can be done by shading the dependencies. More reliable and
>> simpler than class loader tricks.
>>
>> But why would you need the second option to start with? Why would an
>> application define conflicting dependencies? I don't see that happen often
>> when dependencies follow semantic versioning.
>>
>>
>> On Tue, Nov 3, 2015 at 10:26 AM, Vlad Rozov <v....@datatorrent.com>
>> wrote:
>>
>> I think that just build tool will not provide necessary functionality for
>>> the second option. Module will have to implement it's own class loader
>>> otherwise behavior will be undefined in case where application embeds
>>> module and independently different version of an operator included into
>>> the
>>> module.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>>
>>> On 11/3/15 10:10, Thomas Weise wrote:
>>>
>>> Both are valid options and to be solved through the build tool. In one
>>>> case
>>>> the dependency is declared, in the latter embedded.
>>>>
>>>> Thomas
>>>>
>>>> On Tue, Nov 3, 2015 at 10:07 AM, Vlad Rozov <v....@datatorrent.com>
>>>> wrote:
>>>>
>>>> This brings one more issue that we did not cover in the module design
>>>>
>>>>> discussions.  What happens when new versions of operators embedded
>>>>> into a
>>>>> module become available? I believe we all go with the assumption that
>>>>> module will pickup version available on the classpath at run-time and
>>>>> it
>>>>> is
>>>>> operator developer responsibility to provide full binary compatibility.
>>>>> Another possible behavior is to consider module as completely
>>>>> independent
>>>>> unit and package all necessary libraries along with the module.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Vlad
>>>>>
>>>>>
>>>>> On 11/3/15 09:40, Pramod Immaneni wrote:
>>>>>
>>>>> I am not suggesting we leak the internals or compromise access
>>>>> modifiers.
>>>>>
>>>>>> I
>>>>>> want the module developer to have the ability (not mandatory) to make
>>>>>> available all or a subset of the properties of an operator easily if
>>>>>> they
>>>>>> desire without having to create setter/getter for each of them. You
>>>>>> don't
>>>>>> have to expose the operator they belong to. My preference would also
>>>>>> be
>>>>>> to
>>>>>> preserve the namespace of the properties in some way for example by
>>>>>> grouping them by operator name. Think about scenario where people have
>>>>>> built modules using kafka input operator and there is a new kafka
>>>>>> connection property. Without having this ability the modules have to
>>>>>> be
>>>>>> changed to support this property. With this feature the module
>>>>>> developers
>>>>>> have a choice whether to keep the list of kafka properties fixed in
>>>>>> the
>>>>>> module or allow new properties.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Tue, Nov 3, 2015 at 9:31 AM, Thomas Weise <th...@datatorrent.com>
>>>>>> wrote:
>>>>>>
>>>>>> There is also the option to inherit common module properties through a
>>>>>>
>>>>>> base
>>>>>>> class.
>>>>>>>
>>>>>>> I don't see how this is any different from an operator. The developer
>>>>>>> decides what gets exposed and has the same options to control it.
>>>>>>>
>>>>>>> Encapsulation is good practice, by leaking the module internals the
>>>>>>> using
>>>>>>> code becomes brittle.
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>> On Tue, Nov 3, 2015 at 9:22 AM, Amol Kekre <am...@datatorrent.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> The same goes for a Java or C++ class that changes its api. In
>>>>>>> general
>>>>>>> this
>>>>>>>
>>>>>>> is left to the developer, and these languages have internals as
>>>>>>> private
>>>>>>>
>>>>>>>> by
>>>>>>>>
>>>>>>> default for precisely the same purpose. The module developer must
>>>>>>> have
>>>>>>>
>>>>>>>> the
>>>>>>>>
>>>>>>> right to change internals, keep api clean/constant and expect user
>>>>>>> code
>>>>>>>
>>>>>>>> to
>>>>>>>>
>>>>>>> not break.
>>>>>>>
>>>>>>>> Thks,
>>>>>>>> Amol
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Nov 3, 2015 at 9:19 AM, Pramod Immaneni <
>>>>>>>> pramod@datatorrent.com
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> For 3 and 4 can't we strike a balance between not having to expose
>>>>>>>> the
>>>>>>>>
>>>>>>>> operators underneath and at the same time not having to write
>>>>>>>>>
>>>>>>>>> boilerplate
>>>>>>>>>
>>>>>>>> code for all the properties that the module wants to make available
>>>>>>>>
>>>>>>>> outside. It can quickly become unmanageable. For example, an input
>>>>>>>>>
>>>>>>>>> operator
>>>>>>>>>
>>>>>>>> has a new connection property which can be used outside and now all
>>>>>>>>
>>>>>>>>> the
>>>>>>>>> modules using that operator, their code has to be modified to just
>>>>>>>>> add
>>>>>>>>>
>>>>>>>>> a
>>>>>>>>>
>>>>>>>> pass through setter/getter. How about treating the operator name as
>>>>>>>> a
>>>>>>>> group
>>>>>>>>
>>>>>>>> name and ability for module developers to easily make
>>>>>>>>
>>>>>>>>> available/specify
>>>>>>>>>
>>>>>>>>> all
>>>>>>>>>
>>>>>>>> or a subset of the properties of an operator to the user without
>>>>>>>>
>>>>>>>>> having
>>>>>>>>>
>>>>>>>>> to
>>>>>>>>>
>>>>>>>> explicitly make each of them a module property.
>>>>>>>>
>>>>>>>>> On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <am...@datatorrent.com>
>>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>> 3,4 should follow conventions where the creator decides the api
>>>>>>>>
>>>>>>>> (including
>>>>>>>>>
>>>>>>>>> accessibility). In general only those properties exposed by module
>>>>>>>>>
>>>>>>>>>> creator
>>>>>>>>>>
>>>>>>>>> should be settable. What the module internally does with them is
>>>>>>>>>
>>>>>>>>>> module
>>>>>>>>>>
>>>>>>>>> designer's call. Accessing internals of module from outside is
>>>>>>>>
>>>>>>>> uncommon.
>>>>>>>>> For exampe in Java (or C++) private fields/members are not to be
>>>>>>>>> accessed.
>>>>>>>>>
>>>>>>>>> Properties (setter and getter) are the api that module designer
>>>>>>>>> gives
>>>>>>>>>
>>>>>>>>>> to
>>>>>>>>>>
>>>>>>>>> the module user. It is dangerous and has unintended consequences if
>>>>>>>>> module
>>>>>>>>>
>>>>>>>>> user starts to access internals outside the api.
>>>>>>>>>
>>>>>>>>>> Partitioning should be next phase. As long as current design does
>>>>>>>>>> not
>>>>>>>>>>
>>>>>>>>>> halt
>>>>>>>>>>
>>>>>>>>> partitioning it should be ok (which I believe is true).
>>>>>>>>>
>>>>>>>>>> Thks,
>>>>>>>>>> Amol
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <
>>>>>>>>>> v.rozov@datatorrent.com
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> 1. +1, though passing original DAG to module's populateDAG() it is
>>>>>>>>>> not
>>>>>>>>>>
>>>>>>>>>> by
>>>>>>>>>
>>>>>>>>> design and is the current pull request implementation details.
>>>>>>>>>
>>>>>>>>>> 2. While I agree that both Module and StreamingApplication let's
>>>>>>>>>>> module/application designer to expose DAG design reuse pattern
>>>>>>>>>>> and
>>>>>>>>>>> StreamingApplication interface may be extending Module, it does
>>>>>>>>>>> not
>>>>>>>>>>>
>>>>>>>>>>> seem
>>>>>>>>>>>
>>>>>>>>>> to
>>>>>>>>>>
>>>>>>>>>> buy us much. Do we want to allow certain applications to be reused
>>>>>>>>>>
>>>>>>>>>>> as
>>>>>>>>>>>
>>>>>>>>>> Modules in other applications or should application package be
>>>>>>>>> different
>>>>>>>>>
>>>>>>>>>> from Module package? The current approach is to distribute Modules
>>>>>>>>>> as
>>>>>>>>>>
>>>>>>>>>> part
>>>>>>>>> of .jar for example as part of Malhar library without necessarily
>>>>>>>>>
>>>>>>>>>> providing
>>>>>>>>>>>
>>>>>>>>>> all necessary dependencies. Application package on other side must
>>>>>>>>>>
>>>>>>>>>>> include
>>>>>>>>>>>
>>>>>>>>>> all dependencies not provided by the platform.
>>>>>>>>>>
>>>>>>>>>>> 3, 4. While this will help Module designer, it may complicate
>>>>>>>>>>>
>>>>>>>>>>> Module
>>>>>>>>>>>
>>>>>>>>>> maintenance and how Modules are used. What if Module designer
>>>>>>>>> wants
>>>>>>>>> to
>>>>>>>>> change it's implementation and replace one operator implementation
>>>>>>>>>
>>>>>>>>> with
>>>>>>>>>>
>>>>>>>>>> another operator? Does StreamingApplication designer need to know
>>>>>>>>>
>>>>>>>>> internal
>>>>>>>>>>
>>>>>>>>>> structure of Modules? Should Module be considered as a black box
>>>>>>>>>>
>>>>>>>>>>> during
>>>>>>>>>>>
>>>>>>>>>> Application design time as it was initially planned?
>>>>>>>>>
>>>>>>>>> 5, 6, 7 +1. This is currently proposed behavior of Module
>>>>>>>>>>
>>>>>>>>>>> functionality
>>>>>>>>>>>
>>>>>>>>>> the way I understand it.
>>>>>>>>>
>>>>>>>>> 8. We need to see what Module designer can specify for
>>>>>>>>>>
>>>>>>>>>>> partitioning.
>>>>>>>>>>>
>>>>>>>>>> One
>>>>>>>>> of supported cases should be ability to specify cascading
>>>>>>>>>
>>>>>>>>>> partitioning
>>>>>>>>>>
>>>>>>>>>> scheme.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>>>
>>>>>>>>>>> Vlad
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 11/2/15 10:30, Pramod Immaneni wrote:
>>>>>>>>>>>
>>>>>>>>>>> I have some comments and suggestions on the module design. I
>>>>>>>>>>> think
>>>>>>>>>>> these
>>>>>>>>>>>
>>>>>>>>>>> need to be taken into account before we can merge the
>>>>>>>>>>
>>>>>>>>>> implementation
>>>>>>>>>>>
>>>>>>>>>>> provided below into the mainline code. I apologize if these
>>>>>>>>>> should
>>>>>>>>>>
>>>>>>>>> have
>>>>>>>>>
>>>>>>>>>> been brought up earlier as for some reason or the other I was out
>>>>>>>>>>
>>>>>>>>>> of
>>>>>>>>>>>
>>>>>>>>>>> loop
>>>>>>>>>>
>>>>>>>>> on this one
>>>>>>>>>
>>>>>>>>>>         https://github.com/apache/incubator-apex-core/pull/148
>>>>>>>>>>>> <
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>> https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963
>>>>>>>
>>>>>>>         1. DAG scoping currently in the implementation is global for
>>>>>>>
>>>>>>>> modules,
>>>>>>>>>
>>>>>>>>>> each module's populateDAG sees the entire DAG. It should be
>>>>>>>>>>> locally
>>>>>>>>>>>
>>>>>>>>>>> scoped
>>>>>>>>>>
>>>>>>>>> as one module does not and should not know about another.
>>>>>>>>>
>>>>>>>>>>         2. The module has a populateDAG method with exact same
>>>>>>>>>>>> syntax
>>>>>>>>>>>>
>>>>>>>>>>>> as
>>>>>>>>>>>>
>>>>>>>>>>> in
>>>>>>>>>>
>>>>>>>>> StreamingApplication. Is StreamingApplication also a module,
>>>>>>>>>
>>>>>>>>>> should
>>>>>>>>>>>
>>>>>>>>>>> it
>>>>>>>>>>
>>>>>>>>> extend that interface.
>>>>>>>>
>>>>>>>>>         3. Setting properties for modules is too verbose. Module
>>>>>>>>>>
>>>>>>>>>>> developer
>>>>>>>>>>>>
>>>>>>>>>>> needs to repeat every property they want exposed with a setter
>>>>>>>>>> and
>>>>>>>>>>
>>>>>>>>>> getter
>>>>>>>>>>> in JAVA. I don't disagree that module developer should be able to
>>>>>>>>>>> choose
>>>>>>>>>>>
>>>>>>>>>>> which properties from which operators need to be exposed but the
>>>>>>>>>>
>>>>>>>>>> current
>>>>>>>>>>>
>>>>>>>>>>> way seems to duplicate code. Here is a suggestion.
>>>>>>>>>>
>>>>>>>>>>              a. Allow modules to specify which operators and
>>>>>>>>>>>
>>>>>>>>>>>> properties
>>>>>>>>>>>>
>>>>>>>>>>> can
>>>>>>>>>>
>>>>>>>>> be
>>>>>>>>>
>>>>>>>>>> accessible from outside. One way is in the "populateDAG" method of
>>>>>>>>>>>
>>>>>>>>>>>> the
>>>>>>>>>>>>
>>>>>>>>>>> module when adding the operator have the ability to specify if
>>>>>>>>>> this
>>>>>>>>>> operator can be accessible from outside and which or all
>>>>>>>>>>
>>>>>>>>> properties
>>>>>>>>>
>>>>>>>>>> can
>>>>>>>>>>
>>>>>>>>> be
>>>>>>>>>
>>>>>>>>>> accessible.
>>>>>>>>>>
>>>>>>>>>>>              b. Provide methods in ModuleMeta or elsewhere to set
>>>>>>>>>>>>
>>>>>>>>>>>> property
>>>>>>>>>>>>
>>>>>>>>>>> values by specifying the operator name (friendly name) inside the
>>>>>>>>>>
>>>>>>>>>> module
>>>>>>>>>>>
>>>>>>>>>>> and property name. If this is allowed by a. above it is
>>>>>>>>>> successful
>>>>>>>>>>
>>>>>>>>>> else
>>>>>>>>>>>
>>>>>>>>>>> it
>>>>>>>>>>
>>>>>>>>>> should fail.
>>>>>>>>>>
>>>>>>>>>>>              c. Allow a syntax in property files to specify the
>>>>>>>>>>>>
>>>>>>>>>>>> property
>>>>>>>>>>>>
>>>>>>>>>>> in
>>>>>>>>>>
>>>>>>>>> b.
>>>>>>>>>
>>>>>>>>>> Example syntax
>>>>>>>>>>>
>>>>>>>>>>>> dt.module.<modulename>.operator.<operatorname>.prop.<
>>>>>>>>>>>>
>>>>>>>>>>> propname>
>>>>>>>>>>
>>>>>>>>>         4. For attributes same mechanism as in 3 should apply for
>>>>>>>>> the
>>>>>>>>>
>>>>>>>>>> operators
>>>>>>>>>>>> that are exposed by the module.  For property file, example
>>>>>>>>>>>> syntax
>>>>>>>>>>>> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
>>>>>>>>>>>>         5. Module developers in addition to 3. and 4. above may
>>>>>>>>>>>>
>>>>>>>>>>>> choose
>>>>>>>>>>>>
>>>>>>>>>>> to
>>>>>>>>>>
>>>>>>>>> support module level properties and attributes. These should not
>>>>>>>>
>>>>>>>>> be
>>>>>>>>>> the
>>>>>>>>>>
>>>>>>>>> default when 3. and 4. are possible but complementary, in addition
>>>>>>>>>
>>>>>>>>>> to
>>>>>>>>>>>
>>>>>>>>>>> them.
>>>>>>>>>> In this case for properties they can implement setters and getters
>>>>>>>>>>
>>>>>>>>>>> in
>>>>>>>>>>>>
>>>>>>>>>>> the
>>>>>>>>>> module. For attributes the user should still be able to set the
>>>>>>>>>>
>>>>>>>>>>> attributes
>>>>>>>>>>> using the dag setAttribute method. You could introduce a method
>>>>>>>>>>> in
>>>>>>>>>>> the
>>>>>>>>>>>
>>>>>>>>>>> module to process attributes that can get called by the engine
>>>>>>>>>> once
>>>>>>>>>> everything is set.
>>>>>>>>>>
>>>>>>>>>         6. For 5. above setting global properties and attributes
>>>>>>>>> for
>>>>>>>>>
>>>>>>>>>> module
>>>>>>>>>>>>
>>>>>>>>>>> is
>>>>>>>>>>
>>>>>>>>>> akin to ideas that have been proposed for the application as
>>>>>>>>>>>
>>>>>>>>>>>> well. A
>>>>>>>>>>>>
>>>>>>>>>>> consistent way must be possible for applications as well even if
>>>>>>>>>>
>>>>>>>>> it
>>>>>>>>>
>>>>>>>>>> is
>>>>>>>>>>
>>>>>>>>> not
>>>>>>>>
>>>>>>>>> implemented now.
>>>>>>>>>>
>>>>>>>>>>>         7. For 5. or 6. above there should be a property file way
>>>>>>>>>>>> of
>>>>>>>>>>>> specifying
>>>>>>>>>>>> the global module properties and attributes. Example syntax
>>>>>>>>>>>> dt.module.<modulename>.prop.<propname>,
>>>>>>>>>>>> dt.module.<modulename>.attr.<attrname>.
>>>>>>>>>>>> Notice the difference with 3. c. and 4 above that there is no
>>>>>>>>>>>>
>>>>>>>>>>>> operator
>>>>>>>>>>>>
>>>>>>>>>>> keyword here.
>>>>>>>>>>         8. Partitioning needs to be consistent with what the user
>>>>>>>>>>
>>>>>>>>>>> will
>>>>>>>>>>>>
>>>>>>>>>>> expect
>>>>>>>>>>
>>>>>>>>> when they see module as an entity. I will send an image of
>>>>>>>>>
>>>>>>>>>> possible
>>>>>>>>>>>
>>>>>>>>>>> examples of how the user will expect the physical plan to look in
>>>>>>>>>>
>>>>>>>>> certain
>>>>>>>>>
>>>>>>>>>> cases.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>

Re: Modules

Posted by Vlad Rozov <v....@datatorrent.com>.

It is static vs dynamic binding. With static binding it is much easier 
to certify module behavior. With dynamic binding at run-time module 
behavior depends on *correct* implementation of semantic versioning by 
embedded operators. An operator may introduce new mandatory property and 
such change will not be catch during semantic version verification 
check, but will break module behavior.

Who will be responsible for shading the dependencies? Is it module 
designer (not clear why) or application designer? Should the 
functionality be provided by the Apache Apex platform, not by 
module/application designers?

I suggest that we support the first option only for now and see if we 
need to support the second option in the future.

Thank you,

Vlad

On 11/3/15 10:49, Thomas Weise wrote:
> Second option can be done by shading the dependencies. More reliable and
> simpler than class loader tricks.
>
> But why would you need the second option to start with? Why would an
> application define conflicting dependencies? I don't see that happen often
> when dependencies follow semantic versioning.
>
>
> On Tue, Nov 3, 2015 at 10:26 AM, Vlad Rozov <v....@datatorrent.com> wrote:
>
>> I think that just build tool will not provide necessary functionality for
>> the second option. Module will have to implement it's own class loader
>> otherwise behavior will be undefined in case where application embeds
>> module and independently different version of an operator included into the
>> module.
>>
>> Thank you,
>>
>> Vlad
>>
>>
>> On 11/3/15 10:10, Thomas Weise wrote:
>>
>>> Both are valid options and to be solved through the build tool. In one
>>> case
>>> the dependency is declared, in the latter embedded.
>>>
>>> Thomas
>>>
>>> On Tue, Nov 3, 2015 at 10:07 AM, Vlad Rozov <v....@datatorrent.com>
>>> wrote:
>>>
>>> This brings one more issue that we did not cover in the module design
>>>> discussions.  What happens when new versions of operators embedded into a
>>>> module become available? I believe we all go with the assumption that
>>>> module will pickup version available on the classpath at run-time and it
>>>> is
>>>> operator developer responsibility to provide full binary compatibility.
>>>> Another possible behavior is to consider module as completely independent
>>>> unit and package all necessary libraries along with the module.
>>>>
>>>> Thank you,
>>>>
>>>> Vlad
>>>>
>>>>
>>>> On 11/3/15 09:40, Pramod Immaneni wrote:
>>>>
>>>> I am not suggesting we leak the internals or compromise access modifiers.
>>>>> I
>>>>> want the module developer to have the ability (not mandatory) to make
>>>>> available all or a subset of the properties of an operator easily if
>>>>> they
>>>>> desire without having to create setter/getter for each of them. You
>>>>> don't
>>>>> have to expose the operator they belong to. My preference would also be
>>>>> to
>>>>> preserve the namespace of the properties in some way for example by
>>>>> grouping them by operator name. Think about scenario where people have
>>>>> built modules using kafka input operator and there is a new kafka
>>>>> connection property. Without having this ability the modules have to be
>>>>> changed to support this property. With this feature the module
>>>>> developers
>>>>> have a choice whether to keep the list of kafka properties fixed in the
>>>>> module or allow new properties.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Tue, Nov 3, 2015 at 9:31 AM, Thomas Weise <th...@datatorrent.com>
>>>>> wrote:
>>>>>
>>>>> There is also the option to inherit common module properties through a
>>>>>
>>>>>> base
>>>>>> class.
>>>>>>
>>>>>> I don't see how this is any different from an operator. The developer
>>>>>> decides what gets exposed and has the same options to control it.
>>>>>>
>>>>>> Encapsulation is good practice, by leaking the module internals the
>>>>>> using
>>>>>> code becomes brittle.
>>>>>>
>>>>>> Thomas
>>>>>>
>>>>>> On Tue, Nov 3, 2015 at 9:22 AM, Amol Kekre <am...@datatorrent.com>
>>>>>> wrote:
>>>>>>
>>>>>> The same goes for a Java or C++ class that changes its api. In general
>>>>>> this
>>>>>>
>>>>>> is left to the developer, and these languages have internals as private
>>>>>>> by
>>>>>> default for precisely the same purpose. The module developer must have
>>>>>>> the
>>>>>> right to change internals, keep api clean/constant and expect user code
>>>>>>> to
>>>>>> not break.
>>>>>>> Thks,
>>>>>>> Amol
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Nov 3, 2015 at 9:19 AM, Pramod Immaneni <
>>>>>>> pramod@datatorrent.com
>>>>>>> wrote:
>>>>>>>
>>>>>>> For 3 and 4 can't we strike a balance between not having to expose the
>>>>>>>
>>>>>>>> operators underneath and at the same time not having to write
>>>>>>>>
>>>>>>>> boilerplate
>>>>>>> code for all the properties that the module wants to make available
>>>>>>>
>>>>>>>> outside. It can quickly become unmanageable. For example, an input
>>>>>>>>
>>>>>>>> operator
>>>>>>> has a new connection property which can be used outside and now all
>>>>>>>> the
>>>>>>>> modules using that operator, their code has to be modified to just
>>>>>>>> add
>>>>>>>>
>>>>>>>> a
>>>>>>> pass through setter/getter. How about treating the operator name as a
>>>>>>> group
>>>>>>>
>>>>>>> name and ability for module developers to easily make
>>>>>>>> available/specify
>>>>>>>>
>>>>>>>> all
>>>>>>> or a subset of the properties of an operator to the user without
>>>>>>>> having
>>>>>>>>
>>>>>>>> to
>>>>>>> explicitly make each of them a module property.
>>>>>>>> On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <am...@datatorrent.com>
>>>>>>>>
>>>>>>>> wrote:
>>>>>>> 3,4 should follow conventions where the creator decides the api
>>>>>>>
>>>>>>>> (including
>>>>>>>>
>>>>>>>> accessibility). In general only those properties exposed by module
>>>>>>>>> creator
>>>>>>>> should be settable. What the module internally does with them is
>>>>>>>>> module
>>>>>>> designer's call. Accessing internals of module from outside is
>>>>>>>
>>>>>>>> uncommon.
>>>>>>>> For exampe in Java (or C++) private fields/members are not to be
>>>>>>>> accessed.
>>>>>>>>
>>>>>>>> Properties (setter and getter) are the api that module designer gives
>>>>>>>>> to
>>>>>>>> the module user. It is dangerous and has unintended consequences if
>>>>>>>> module
>>>>>>>>
>>>>>>>> user starts to access internals outside the api.
>>>>>>>>> Partitioning should be next phase. As long as current design does
>>>>>>>>> not
>>>>>>>>>
>>>>>>>>> halt
>>>>>>>> partitioning it should be ok (which I believe is true).
>>>>>>>>> Thks,
>>>>>>>>> Amol
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <v.rozov@datatorrent.com
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> 1. +1, though passing original DAG to module's populateDAG() it is
>>>>>>>>> not
>>>>>>>>>
>>>>>>>> by
>>>>>>>>
>>>>>>>> design and is the current pull request implementation details.
>>>>>>>>>> 2. While I agree that both Module and StreamingApplication let's
>>>>>>>>>> module/application designer to expose DAG design reuse pattern and
>>>>>>>>>> StreamingApplication interface may be extending Module, it does not
>>>>>>>>>>
>>>>>>>>>> seem
>>>>>>>>> to
>>>>>>>>>
>>>>>>>>> buy us much. Do we want to allow certain applications to be reused
>>>>>>>>>> as
>>>>>>>> Modules in other applications or should application package be
>>>>>>>> different
>>>>>>>>> from Module package? The current approach is to distribute Modules
>>>>>>>>> as
>>>>>>>>>
>>>>>>>> part
>>>>>>>> of .jar for example as part of Malhar library without necessarily
>>>>>>>>>> providing
>>>>>>>>> all necessary dependencies. Application package on other side must
>>>>>>>>>> include
>>>>>>>>> all dependencies not provided by the platform.
>>>>>>>>>> 3, 4. While this will help Module designer, it may complicate
>>>>>>>>>>
>>>>>>>>>> Module
>>>>>>>> maintenance and how Modules are used. What if Module designer wants
>>>>>>>> to
>>>>>>>> change it's implementation and replace one operator implementation
>>>>>>>>
>>>>>>>>> with
>>>>>>>>>
>>>>>>>> another operator? Does StreamingApplication designer need to know
>>>>>>>>
>>>>>>>>> internal
>>>>>>>>>
>>>>>>>>> structure of Modules? Should Module be considered as a black box
>>>>>>>>>> during
>>>>>>>> Application design time as it was initially planned?
>>>>>>>>
>>>>>>>>> 5, 6, 7 +1. This is currently proposed behavior of Module
>>>>>>>>>> functionality
>>>>>>>> the way I understand it.
>>>>>>>>
>>>>>>>>> 8. We need to see what Module designer can specify for
>>>>>>>>>> partitioning.
>>>>>>>> One
>>>>>>>> of supported cases should be ability to specify cascading
>>>>>>>>> partitioning
>>>>>>>>>
>>>>>>>> scheme.
>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>>> Vlad
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/2/15 10:30, Pramod Immaneni wrote:
>>>>>>>>>>
>>>>>>>>>> I have some comments and suggestions on the module design. I think
>>>>>>>>>> these
>>>>>>>>>>
>>>>>>>>> need to be taken into account before we can merge the
>>>>>>>>>
>>>>>>>>>> implementation
>>>>>>>>>>
>>>>>>>>> provided below into the mainline code. I apologize if these should
>>>>>>>> have
>>>>>>>>> been brought up earlier as for some reason or the other I was out
>>>>>>>>>
>>>>>>>>>> of
>>>>>>>>>>
>>>>>>>>> loop
>>>>>>>> on this one
>>>>>>>>>>>         https://github.com/apache/incubator-apex-core/pull/148
>>>>>>>>>>> <
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963
>>>>>>
>>>>>>         1. DAG scoping currently in the implementation is global for
>>>>>>>> modules,
>>>>>>>>>> each module's populateDAG sees the entire DAG. It should be
>>>>>>>>>> locally
>>>>>>>>>>
>>>>>>>>> scoped
>>>>>>>> as one module does not and should not know about another.
>>>>>>>>>>>         2. The module has a populateDAG method with exact same
>>>>>>>>>>> syntax
>>>>>>>>>>>
>>>>>>>>>>> as
>>>>>>>>> in
>>>>>>>> StreamingApplication. Is StreamingApplication also a module,
>>>>>>>>>> should
>>>>>>>>>>
>>>>>>>>> it
>>>>>>> extend that interface.
>>>>>>>>>         3. Setting properties for modules is too verbose. Module
>>>>>>>>>>> developer
>>>>>>>>> needs to repeat every property they want exposed with a setter and
>>>>>>>>>
>>>>>>>>>> getter
>>>>>>>>>> in JAVA. I don't disagree that module developer should be able to
>>>>>>>>>> choose
>>>>>>>>>>
>>>>>>>>> which properties from which operators need to be exposed but the
>>>>>>>>>
>>>>>>>>>> current
>>>>>>>>>>
>>>>>>>>> way seems to duplicate code. Here is a suggestion.
>>>>>>>>>
>>>>>>>>>>              a. Allow modules to specify which operators and
>>>>>>>>>>> properties
>>>>>>>>> can
>>>>>>>> be
>>>>>>>>>> accessible from outside. One way is in the "populateDAG" method of
>>>>>>>>>>> the
>>>>>>>>> module when adding the operator have the ability to specify if
>>>>>>>>> this
>>>>>>>>> operator can be accessible from outside and which or all
>>>>>>>> properties
>>>>>>>>> can
>>>>>>>> be
>>>>>>>>> accessible.
>>>>>>>>>>>              b. Provide methods in ModuleMeta or elsewhere to set
>>>>>>>>>>>
>>>>>>>>>>> property
>>>>>>>>> values by specifying the operator name (friendly name) inside the
>>>>>>>>>
>>>>>>>>>> module
>>>>>>>>>>
>>>>>>>>> and property name. If this is allowed by a. above it is successful
>>>>>>>>>
>>>>>>>>>> else
>>>>>>>>>>
>>>>>>>>> it
>>>>>>>>>
>>>>>>>>> should fail.
>>>>>>>>>>>              c. Allow a syntax in property files to specify the
>>>>>>>>>>>
>>>>>>>>>>> property
>>>>>>>>> in
>>>>>>>> b.
>>>>>>>>>> Example syntax
>>>>>>>>>>> dt.module.<modulename>.operator.<operatorname>.prop.<
>>>>>>>>> propname>
>>>>>>>>         4. For attributes same mechanism as in 3 should apply for the
>>>>>>>>>>> operators
>>>>>>>>>>> that are exposed by the module.  For property file, example syntax
>>>>>>>>>>> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
>>>>>>>>>>>         5. Module developers in addition to 3. and 4. above may
>>>>>>>>>>>
>>>>>>>>>>> choose
>>>>>>>>> to
>>>>>>> support module level properties and attributes. These should not
>>>>>>>>> be
>>>>>>>>> the
>>>>>>>> default when 3. and 4. are possible but complementary, in addition
>>>>>>>>>> to
>>>>>>>>>>
>>>>>>>>> them.
>>>>>>>>> In this case for properties they can implement setters and getters
>>>>>>>>>>> in
>>>>>>>>> the
>>>>>>>>> module. For attributes the user should still be able to set the
>>>>>>>>>> attributes
>>>>>>>>>> using the dag setAttribute method. You could introduce a method in
>>>>>>>>>> the
>>>>>>>>>>
>>>>>>>>> module to process attributes that can get called by the engine
>>>>>>>>> once
>>>>>>>>> everything is set.
>>>>>>>>         6. For 5. above setting global properties and attributes for
>>>>>>>>>>> module
>>>>>>>>> is
>>>>>>>>>
>>>>>>>>>> akin to ideas that have been proposed for the application as
>>>>>>>>>>> well. A
>>>>>>>>> consistent way must be possible for applications as well even if
>>>>>>>> it
>>>>>>>>> is
>>>>>>> not
>>>>>>>>> implemented now.
>>>>>>>>>>>         7. For 5. or 6. above there should be a property file way
>>>>>>>>>>> of
>>>>>>>>>>> specifying
>>>>>>>>>>> the global module properties and attributes. Example syntax
>>>>>>>>>>> dt.module.<modulename>.prop.<propname>,
>>>>>>>>>>> dt.module.<modulename>.attr.<attrname>.
>>>>>>>>>>> Notice the difference with 3. c. and 4 above that there is no
>>>>>>>>>>>
>>>>>>>>>>> operator
>>>>>>>>> keyword here.
>>>>>>>>>         8. Partitioning needs to be consistent with what the user
>>>>>>>>>>> will
>>>>>>>>> expect
>>>>>>>> when they see module as an entity. I will send an image of
>>>>>>>>>> possible
>>>>>>>>>>
>>>>>>>>> examples of how the user will expect the physical plan to look in
>>>>>>>> certain
>>>>>>>>>> cases.
>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>

Re: Modules

Posted by Thomas Weise <th...@datatorrent.com>.

Second option can be done by shading the dependencies. More reliable and
simpler than class loader tricks.

But why would you need the second option to start with? Why would an
application define conflicting dependencies? I don't see that happen often
when dependencies follow semantic versioning.


On Tue, Nov 3, 2015 at 10:26 AM, Vlad Rozov <v....@datatorrent.com> wrote:

> I think that just build tool will not provide necessary functionality for
> the second option. Module will have to implement it's own class loader
> otherwise behavior will be undefined in case where application embeds
> module and independently different version of an operator included into the
> module.
>
> Thank you,
>
> Vlad
>
>
> On 11/3/15 10:10, Thomas Weise wrote:
>
>> Both are valid options and to be solved through the build tool. In one
>> case
>> the dependency is declared, in the latter embedded.
>>
>> Thomas
>>
>> On Tue, Nov 3, 2015 at 10:07 AM, Vlad Rozov <v....@datatorrent.com>
>> wrote:
>>
>> This brings one more issue that we did not cover in the module design
>>> discussions.  What happens when new versions of operators embedded into a
>>> module become available? I believe we all go with the assumption that
>>> module will pickup version available on the classpath at run-time and it
>>> is
>>> operator developer responsibility to provide full binary compatibility.
>>> Another possible behavior is to consider module as completely independent
>>> unit and package all necessary libraries along with the module.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>>
>>> On 11/3/15 09:40, Pramod Immaneni wrote:
>>>
>>> I am not suggesting we leak the internals or compromise access modifiers.
>>>> I
>>>> want the module developer to have the ability (not mandatory) to make
>>>> available all or a subset of the properties of an operator easily if
>>>> they
>>>> desire without having to create setter/getter for each of them. You
>>>> don't
>>>> have to expose the operator they belong to. My preference would also be
>>>> to
>>>> preserve the namespace of the properties in some way for example by
>>>> grouping them by operator name. Think about scenario where people have
>>>> built modules using kafka input operator and there is a new kafka
>>>> connection property. Without having this ability the modules have to be
>>>> changed to support this property. With this feature the module
>>>> developers
>>>> have a choice whether to keep the list of kafka properties fixed in the
>>>> module or allow new properties.
>>>>
>>>> Thanks
>>>>
>>>> On Tue, Nov 3, 2015 at 9:31 AM, Thomas Weise <th...@datatorrent.com>
>>>> wrote:
>>>>
>>>> There is also the option to inherit common module properties through a
>>>>
>>>>> base
>>>>> class.
>>>>>
>>>>> I don't see how this is any different from an operator. The developer
>>>>> decides what gets exposed and has the same options to control it.
>>>>>
>>>>> Encapsulation is good practice, by leaking the module internals the
>>>>> using
>>>>> code becomes brittle.
>>>>>
>>>>> Thomas
>>>>>
>>>>> On Tue, Nov 3, 2015 at 9:22 AM, Amol Kekre <am...@datatorrent.com>
>>>>> wrote:
>>>>>
>>>>> The same goes for a Java or C++ class that changes its api. In general
>>>>> this
>>>>>
>>>>> is left to the developer, and these languages have internals as private
>>>>>>
>>>>>> by
>>>>>
>>>>> default for precisely the same purpose. The module developer must have
>>>>>>
>>>>>> the
>>>>>
>>>>> right to change internals, keep api clean/constant and expect user code
>>>>>>
>>>>>> to
>>>>>
>>>>> not break.
>>>>>>
>>>>>> Thks,
>>>>>> Amol
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 3, 2015 at 9:19 AM, Pramod Immaneni <
>>>>>> pramod@datatorrent.com
>>>>>> wrote:
>>>>>>
>>>>>> For 3 and 4 can't we strike a balance between not having to expose the
>>>>>>
>>>>>>> operators underneath and at the same time not having to write
>>>>>>>
>>>>>>> boilerplate
>>>>>> code for all the properties that the module wants to make available
>>>>>>
>>>>>>> outside. It can quickly become unmanageable. For example, an input
>>>>>>>
>>>>>>> operator
>>>>>>
>>>>>> has a new connection property which can be used outside and now all
>>>>>>> the
>>>>>>> modules using that operator, their code has to be modified to just
>>>>>>> add
>>>>>>>
>>>>>>> a
>>>>>> pass through setter/getter. How about treating the operator name as a
>>>>>> group
>>>>>>
>>>>>> name and ability for module developers to easily make
>>>>>>> available/specify
>>>>>>>
>>>>>>> all
>>>>>>
>>>>>> or a subset of the properties of an operator to the user without
>>>>>>> having
>>>>>>>
>>>>>>> to
>>>>>>
>>>>>> explicitly make each of them a module property.
>>>>>>>
>>>>>>> On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <am...@datatorrent.com>
>>>>>>>
>>>>>>> wrote:
>>>>>> 3,4 should follow conventions where the creator decides the api
>>>>>>
>>>>>>> (including
>>>>>>>
>>>>>>> accessibility). In general only those properties exposed by module
>>>>>>>>
>>>>>>>> creator
>>>>>>>
>>>>>>> should be settable. What the module internally does with them is
>>>>>>>>
>>>>>>>> module
>>>>>>>
>>>>>> designer's call. Accessing internals of module from outside is
>>>>>>
>>>>>>> uncommon.
>>>>>>> For exampe in Java (or C++) private fields/members are not to be
>>>>>>> accessed.
>>>>>>>
>>>>>>> Properties (setter and getter) are the api that module designer gives
>>>>>>>>
>>>>>>>> to
>>>>>>> the module user. It is dangerous and has unintended consequences if
>>>>>>> module
>>>>>>>
>>>>>>> user starts to access internals outside the api.
>>>>>>>>
>>>>>>>> Partitioning should be next phase. As long as current design does
>>>>>>>> not
>>>>>>>>
>>>>>>>> halt
>>>>>>>
>>>>>>> partitioning it should be ok (which I believe is true).
>>>>>>>>
>>>>>>>> Thks,
>>>>>>>> Amol
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <v.rozov@datatorrent.com
>>>>>>>> >
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> 1. +1, though passing original DAG to module's populateDAG() it is
>>>>>>>> not
>>>>>>>>
>>>>>>> by
>>>>>>>
>>>>>>> design and is the current pull request implementation details.
>>>>>>>>
>>>>>>>>> 2. While I agree that both Module and StreamingApplication let's
>>>>>>>>> module/application designer to expose DAG design reuse pattern and
>>>>>>>>> StreamingApplication interface may be extending Module, it does not
>>>>>>>>>
>>>>>>>>> seem
>>>>>>>> to
>>>>>>>>
>>>>>>>> buy us much. Do we want to allow certain applications to be reused
>>>>>>>>>
>>>>>>>>> as
>>>>>>>>
>>>>>>> Modules in other applications or should application package be
>>>>>>
>>>>>>> different
>>>>>>>> from Module package? The current approach is to distribute Modules
>>>>>>>> as
>>>>>>>>
>>>>>>> part
>>>>>>
>>>>>>> of .jar for example as part of Malhar library without necessarily
>>>>>>>>>
>>>>>>>>> providing
>>>>>>>>
>>>>>>>> all necessary dependencies. Application package on other side must
>>>>>>>>>
>>>>>>>>> include
>>>>>>>>
>>>>>>>> all dependencies not provided by the platform.
>>>>>>>>>
>>>>>>>>> 3, 4. While this will help Module designer, it may complicate
>>>>>>>>>
>>>>>>>>> Module
>>>>>>>>
>>>>>>> maintenance and how Modules are used. What if Module designer wants
>>>>>>
>>>>>>> to
>>>>>>>>
>>>>>>> change it's implementation and replace one operator implementation
>>>>>>>
>>>>>>>> with
>>>>>>>>
>>>>>>> another operator? Does StreamingApplication designer need to know
>>>>>>>
>>>>>>>> internal
>>>>>>>>
>>>>>>>> structure of Modules? Should Module be considered as a black box
>>>>>>>>>
>>>>>>>>> during
>>>>>>>>
>>>>>>> Application design time as it was initially planned?
>>>>>>>
>>>>>>>> 5, 6, 7 +1. This is currently proposed behavior of Module
>>>>>>>>>
>>>>>>>>> functionality
>>>>>>>>
>>>>>>> the way I understand it.
>>>>>>>
>>>>>>>> 8. We need to see what Module designer can specify for
>>>>>>>>>
>>>>>>>>> partitioning.
>>>>>>>>
>>>>>>> One
>>>>>>
>>>>>>> of supported cases should be ability to specify cascading
>>>>>>>> partitioning
>>>>>>>>
>>>>>>> scheme.
>>>>>>>
>>>>>>>> Thank you,
>>>>>>>>>
>>>>>>>>> Vlad
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/2/15 10:30, Pramod Immaneni wrote:
>>>>>>>>>
>>>>>>>>> I have some comments and suggestions on the module design. I think
>>>>>>>>> these
>>>>>>>>>
>>>>>>>> need to be taken into account before we can merge the
>>>>>>>>
>>>>>>>>> implementation
>>>>>>>>>
>>>>>>>> provided below into the mainline code. I apologize if these should
>>>>>>
>>>>>>> have
>>>>>>>>>
>>>>>>>> been brought up earlier as for some reason or the other I was out
>>>>>>>>
>>>>>>>>> of
>>>>>>>>>
>>>>>>>> loop
>>>>>>
>>>>>>> on this one
>>>>>>>>>
>>>>>>>>>>        https://github.com/apache/incubator-apex-core/pull/148
>>>>>>>>>> <
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>> https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963
>>>>>
>>>>>        1. DAG scoping currently in the implementation is global for
>>>>>>
>>>>>>> modules,
>>>>>>>>> each module's populateDAG sees the entire DAG. It should be
>>>>>>>>> locally
>>>>>>>>>
>>>>>>>> scoped
>>>>>>
>>>>>>> as one module does not and should not know about another.
>>>>>>>>>
>>>>>>>>>>        2. The module has a populateDAG method with exact same
>>>>>>>>>> syntax
>>>>>>>>>>
>>>>>>>>>> as
>>>>>>>>>
>>>>>>>> in
>>>>>>>
>>>>>>> StreamingApplication. Is StreamingApplication also a module,
>>>>>>>>
>>>>>>>>> should
>>>>>>>>>
>>>>>>>> it
>>>>>>
>>>>>> extend that interface.
>>>>>>>
>>>>>>>>        3. Setting properties for modules is too verbose. Module
>>>>>>>>>>
>>>>>>>>>> developer
>>>>>>>>>
>>>>>>>> needs to repeat every property they want exposed with a setter and
>>>>>>>>
>>>>>>>>> getter
>>>>>>>>> in JAVA. I don't disagree that module developer should be able to
>>>>>>>>> choose
>>>>>>>>>
>>>>>>>> which properties from which operators need to be exposed but the
>>>>>>>>
>>>>>>>>> current
>>>>>>>>>
>>>>>>>> way seems to duplicate code. Here is a suggestion.
>>>>>>>>
>>>>>>>>>             a. Allow modules to specify which operators and
>>>>>>>>>>
>>>>>>>>>> properties
>>>>>>>>>
>>>>>>>> can
>>>>>>
>>>>>>> be
>>>>>>>>
>>>>>>>>> accessible from outside. One way is in the "populateDAG" method of
>>>>>>>>>>
>>>>>>>>>> the
>>>>>>>>>
>>>>>>>> module when adding the operator have the ability to specify if
>>>>>>>
>>>>>>>> this
>>>>>>>>>
>>>>>>>> operator can be accessible from outside and which or all
>>>>>>
>>>>>>> properties
>>>>>>>>>
>>>>>>>> can
>>>>>>
>>>>>>> be
>>>>>>>>
>>>>>>>> accessible.
>>>>>>>>>
>>>>>>>>>>             b. Provide methods in ModuleMeta or elsewhere to set
>>>>>>>>>>
>>>>>>>>>> property
>>>>>>>>>
>>>>>>>> values by specifying the operator name (friendly name) inside the
>>>>>>>>
>>>>>>>>> module
>>>>>>>>>
>>>>>>>> and property name. If this is allowed by a. above it is successful
>>>>>>>>
>>>>>>>>> else
>>>>>>>>>
>>>>>>>> it
>>>>>>>>
>>>>>>>> should fail.
>>>>>>>>>
>>>>>>>>>>             c. Allow a syntax in property files to specify the
>>>>>>>>>>
>>>>>>>>>> property
>>>>>>>>>
>>>>>>>> in
>>>>>>>
>>>>>>> b.
>>>>>>>>
>>>>>>>>> Example syntax
>>>>>>>>>>
>>>>>>>>>> dt.module.<modulename>.operator.<operatorname>.prop.<
>>>>>>>>>
>>>>>>>> propname>
>>>>>>
>>>>>>>        4. For attributes same mechanism as in 3 should apply for the
>>>>>>>>>> operators
>>>>>>>>>> that are exposed by the module.  For property file, example syntax
>>>>>>>>>> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
>>>>>>>>>>        5. Module developers in addition to 3. and 4. above may
>>>>>>>>>>
>>>>>>>>>> choose
>>>>>>>>>
>>>>>>>> to
>>>>>>
>>>>>> support module level properties and attributes. These should not
>>>>>>>
>>>>>>>> be
>>>>>>>>>
>>>>>>>> the
>>>>>>
>>>>>>> default when 3. and 4. are possible but complementary, in addition
>>>>>>>>
>>>>>>>>> to
>>>>>>>>>
>>>>>>>> them.
>>>>>>>
>>>>>>>> In this case for properties they can implement setters and getters
>>>>>>>>>>
>>>>>>>>>> in
>>>>>>>>>
>>>>>>>> the
>>>>>>>
>>>>>>>> module. For attributes the user should still be able to set the
>>>>>>>>> attributes
>>>>>>>>> using the dag setAttribute method. You could introduce a method in
>>>>>>>>> the
>>>>>>>>>
>>>>>>>> module to process attributes that can get called by the engine
>>>>>>>
>>>>>>>> once
>>>>>>>>>
>>>>>>>> everything is set.
>>>>>>
>>>>>>>        6. For 5. above setting global properties and attributes for
>>>>>>>>>>
>>>>>>>>>> module
>>>>>>>>>
>>>>>>>> is
>>>>>>>>
>>>>>>>>> akin to ideas that have been proposed for the application as
>>>>>>>>>>
>>>>>>>>>> well. A
>>>>>>>>>
>>>>>>>> consistent way must be possible for applications as well even if
>>>>>>
>>>>>>> it
>>>>>>>>>
>>>>>>>> is
>>>>>>
>>>>>> not
>>>>>>>
>>>>>>>> implemented now.
>>>>>>>>>
>>>>>>>>>>        7. For 5. or 6. above there should be a property file way
>>>>>>>>>> of
>>>>>>>>>> specifying
>>>>>>>>>> the global module properties and attributes. Example syntax
>>>>>>>>>> dt.module.<modulename>.prop.<propname>,
>>>>>>>>>> dt.module.<modulename>.attr.<attrname>.
>>>>>>>>>> Notice the difference with 3. c. and 4 above that there is no
>>>>>>>>>>
>>>>>>>>>> operator
>>>>>>>>>
>>>>>>>> keyword here.
>>>>>>>
>>>>>>>>        8. Partitioning needs to be consistent with what the user
>>>>>>>>>>
>>>>>>>>>> will
>>>>>>>>>
>>>>>>>> expect
>>>>>>
>>>>>>> when they see module as an entity. I will send an image of
>>>>>>>>> possible
>>>>>>>>>
>>>>>>>> examples of how the user will expect the physical plan to look in
>>>>>>
>>>>>>> certain
>>>>>>>>> cases.
>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>

Re: Modules

Posted by Pramod Immaneni <pr...@datatorrent.com>.

Classpath can be engineered

On Tue, Nov 3, 2015 at 10:44 AM, Vlad Rozov <v....@datatorrent.com> wrote:

> Even in case when operator and module are not instantiated in the same
> container there will be two different versions on the classpath, so it is
> undefined which one will be used.
>
> Thank you,
>
> Vlad
>
>
> On 11/3/15 10:29, Pramod Immaneni wrote:
>
>> Depends if the operator and module are container local or not.
>>
>> On Tue, Nov 3, 2015 at 10:26 AM, Vlad Rozov <v....@datatorrent.com>
>> wrote:
>>
>> I think that just build tool will not provide necessary functionality for
>>> the second option. Module will have to implement it's own class loader
>>> otherwise behavior will be undefined in case where application embeds
>>> module and independently different version of an operator included into
>>> the
>>> module.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>>
>>> On 11/3/15 10:10, Thomas Weise wrote:
>>>
>>> Both are valid options and to be solved through the build tool. In one
>>>> case
>>>> the dependency is declared, in the latter embedded.
>>>>
>>>> Thomas
>>>>
>>>> On Tue, Nov 3, 2015 at 10:07 AM, Vlad Rozov <v....@datatorrent.com>
>>>> wrote:
>>>>
>>>> This brings one more issue that we did not cover in the module design
>>>>
>>>>> discussions.  What happens when new versions of operators embedded
>>>>> into a
>>>>> module become available? I believe we all go with the assumption that
>>>>> module will pickup version available on the classpath at run-time and
>>>>> it
>>>>> is
>>>>> operator developer responsibility to provide full binary compatibility.
>>>>> Another possible behavior is to consider module as completely
>>>>> independent
>>>>> unit and package all necessary libraries along with the module.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Vlad
>>>>>
>>>>>
>>>>> On 11/3/15 09:40, Pramod Immaneni wrote:
>>>>>
>>>>> I am not suggesting we leak the internals or compromise access
>>>>> modifiers.
>>>>>
>>>>>> I
>>>>>> want the module developer to have the ability (not mandatory) to make
>>>>>> available all or a subset of the properties of an operator easily if
>>>>>> they
>>>>>> desire without having to create setter/getter for each of them. You
>>>>>> don't
>>>>>> have to expose the operator they belong to. My preference would also
>>>>>> be
>>>>>> to
>>>>>> preserve the namespace of the properties in some way for example by
>>>>>> grouping them by operator name. Think about scenario where people have
>>>>>> built modules using kafka input operator and there is a new kafka
>>>>>> connection property. Without having this ability the modules have to
>>>>>> be
>>>>>> changed to support this property. With this feature the module
>>>>>> developers
>>>>>> have a choice whether to keep the list of kafka properties fixed in
>>>>>> the
>>>>>> module or allow new properties.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Tue, Nov 3, 2015 at 9:31 AM, Thomas Weise <th...@datatorrent.com>
>>>>>> wrote:
>>>>>>
>>>>>> There is also the option to inherit common module properties through a
>>>>>>
>>>>>> base
>>>>>>> class.
>>>>>>>
>>>>>>> I don't see how this is any different from an operator. The developer
>>>>>>> decides what gets exposed and has the same options to control it.
>>>>>>>
>>>>>>> Encapsulation is good practice, by leaking the module internals the
>>>>>>> using
>>>>>>> code becomes brittle.
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>> On Tue, Nov 3, 2015 at 9:22 AM, Amol Kekre <am...@datatorrent.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> The same goes for a Java or C++ class that changes its api. In
>>>>>>> general
>>>>>>> this
>>>>>>>
>>>>>>> is left to the developer, and these languages have internals as
>>>>>>> private
>>>>>>>
>>>>>>>> by
>>>>>>>>
>>>>>>> default for precisely the same purpose. The module developer must
>>>>>>> have
>>>>>>>
>>>>>>>> the
>>>>>>>>
>>>>>>> right to change internals, keep api clean/constant and expect user
>>>>>>> code
>>>>>>>
>>>>>>>> to
>>>>>>>>
>>>>>>> not break.
>>>>>>>
>>>>>>>> Thks,
>>>>>>>> Amol
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Nov 3, 2015 at 9:19 AM, Pramod Immaneni <
>>>>>>>> pramod@datatorrent.com
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> For 3 and 4 can't we strike a balance between not having to expose
>>>>>>>> the
>>>>>>>>
>>>>>>>> operators underneath and at the same time not having to write
>>>>>>>>>
>>>>>>>>> boilerplate
>>>>>>>>>
>>>>>>>> code for all the properties that the module wants to make available
>>>>>>>>
>>>>>>>> outside. It can quickly become unmanageable. For example, an input
>>>>>>>>>
>>>>>>>>> operator
>>>>>>>>>
>>>>>>>> has a new connection property which can be used outside and now all
>>>>>>>>
>>>>>>>>> the
>>>>>>>>> modules using that operator, their code has to be modified to just
>>>>>>>>> add
>>>>>>>>>
>>>>>>>>> a
>>>>>>>>>
>>>>>>>> pass through setter/getter. How about treating the operator name as
>>>>>>>> a
>>>>>>>> group
>>>>>>>>
>>>>>>>> name and ability for module developers to easily make
>>>>>>>>
>>>>>>>>> available/specify
>>>>>>>>>
>>>>>>>>> all
>>>>>>>>>
>>>>>>>> or a subset of the properties of an operator to the user without
>>>>>>>>
>>>>>>>>> having
>>>>>>>>>
>>>>>>>>> to
>>>>>>>>>
>>>>>>>> explicitly make each of them a module property.
>>>>>>>>
>>>>>>>>> On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <am...@datatorrent.com>
>>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>> 3,4 should follow conventions where the creator decides the api
>>>>>>>>
>>>>>>>> (including
>>>>>>>>>
>>>>>>>>> accessibility). In general only those properties exposed by module
>>>>>>>>>
>>>>>>>>>> creator
>>>>>>>>>>
>>>>>>>>> should be settable. What the module internally does with them is
>>>>>>>>>
>>>>>>>>>> module
>>>>>>>>>>
>>>>>>>>> designer's call. Accessing internals of module from outside is
>>>>>>>>
>>>>>>>> uncommon.
>>>>>>>>> For exampe in Java (or C++) private fields/members are not to be
>>>>>>>>> accessed.
>>>>>>>>>
>>>>>>>>> Properties (setter and getter) are the api that module designer
>>>>>>>>> gives
>>>>>>>>>
>>>>>>>>>> to
>>>>>>>>>>
>>>>>>>>> the module user. It is dangerous and has unintended consequences if
>>>>>>>>> module
>>>>>>>>>
>>>>>>>>> user starts to access internals outside the api.
>>>>>>>>>
>>>>>>>>>> Partitioning should be next phase. As long as current design does
>>>>>>>>>> not
>>>>>>>>>>
>>>>>>>>>> halt
>>>>>>>>>>
>>>>>>>>> partitioning it should be ok (which I believe is true).
>>>>>>>>>
>>>>>>>>>> Thks,
>>>>>>>>>> Amol
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <
>>>>>>>>>> v.rozov@datatorrent.com
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> 1. +1, though passing original DAG to module's populateDAG() it is
>>>>>>>>>> not
>>>>>>>>>>
>>>>>>>>>> by
>>>>>>>>>
>>>>>>>>> design and is the current pull request implementation details.
>>>>>>>>>
>>>>>>>>>> 2. While I agree that both Module and StreamingApplication let's
>>>>>>>>>>> module/application designer to expose DAG design reuse pattern
>>>>>>>>>>> and
>>>>>>>>>>> StreamingApplication interface may be extending Module, it does
>>>>>>>>>>> not
>>>>>>>>>>>
>>>>>>>>>>> seem
>>>>>>>>>>>
>>>>>>>>>> to
>>>>>>>>>>
>>>>>>>>>> buy us much. Do we want to allow certain applications to be reused
>>>>>>>>>>
>>>>>>>>>>> as
>>>>>>>>>>>
>>>>>>>>>> Modules in other applications or should application package be
>>>>>>>>> different
>>>>>>>>>
>>>>>>>>>> from Module package? The current approach is to distribute Modules
>>>>>>>>>> as
>>>>>>>>>>
>>>>>>>>>> part
>>>>>>>>> of .jar for example as part of Malhar library without necessarily
>>>>>>>>>
>>>>>>>>>> providing
>>>>>>>>>>>
>>>>>>>>>> all necessary dependencies. Application package on other side must
>>>>>>>>>>
>>>>>>>>>>> include
>>>>>>>>>>>
>>>>>>>>>> all dependencies not provided by the platform.
>>>>>>>>>>
>>>>>>>>>>> 3, 4. While this will help Module designer, it may complicate
>>>>>>>>>>>
>>>>>>>>>>> Module
>>>>>>>>>>>
>>>>>>>>>> maintenance and how Modules are used. What if Module designer
>>>>>>>>> wants
>>>>>>>>> to
>>>>>>>>> change it's implementation and replace one operator implementation
>>>>>>>>>
>>>>>>>>> with
>>>>>>>>>>
>>>>>>>>>> another operator? Does StreamingApplication designer need to know
>>>>>>>>>
>>>>>>>>> internal
>>>>>>>>>>
>>>>>>>>>> structure of Modules? Should Module be considered as a black box
>>>>>>>>>>
>>>>>>>>>>> during
>>>>>>>>>>>
>>>>>>>>>> Application design time as it was initially planned?
>>>>>>>>>
>>>>>>>>> 5, 6, 7 +1. This is currently proposed behavior of Module
>>>>>>>>>>
>>>>>>>>>>> functionality
>>>>>>>>>>>
>>>>>>>>>> the way I understand it.
>>>>>>>>>
>>>>>>>>> 8. We need to see what Module designer can specify for
>>>>>>>>>>
>>>>>>>>>>> partitioning.
>>>>>>>>>>>
>>>>>>>>>> One
>>>>>>>>> of supported cases should be ability to specify cascading
>>>>>>>>>
>>>>>>>>>> partitioning
>>>>>>>>>>
>>>>>>>>>> scheme.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>>>
>>>>>>>>>>> Vlad
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 11/2/15 10:30, Pramod Immaneni wrote:
>>>>>>>>>>>
>>>>>>>>>>> I have some comments and suggestions on the module design. I
>>>>>>>>>>> think
>>>>>>>>>>> these
>>>>>>>>>>>
>>>>>>>>>>> need to be taken into account before we can merge the
>>>>>>>>>>
>>>>>>>>>> implementation
>>>>>>>>>>>
>>>>>>>>>>> provided below into the mainline code. I apologize if these
>>>>>>>>>> should
>>>>>>>>>>
>>>>>>>>> have
>>>>>>>>>
>>>>>>>>>> been brought up earlier as for some reason or the other I was out
>>>>>>>>>>
>>>>>>>>>> of
>>>>>>>>>>>
>>>>>>>>>>> loop
>>>>>>>>>>
>>>>>>>>> on this one
>>>>>>>>>
>>>>>>>>>>         https://github.com/apache/incubator-apex-core/pull/148
>>>>>>>>>>>> <
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>> https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963
>>>>>>>
>>>>>>>         1. DAG scoping currently in the implementation is global for
>>>>>>>
>>>>>>>> modules,
>>>>>>>>>
>>>>>>>>>> each module's populateDAG sees the entire DAG. It should be
>>>>>>>>>>> locally
>>>>>>>>>>>
>>>>>>>>>>> scoped
>>>>>>>>>>
>>>>>>>>> as one module does not and should not know about another.
>>>>>>>>>
>>>>>>>>>>         2. The module has a populateDAG method with exact same
>>>>>>>>>>>> syntax
>>>>>>>>>>>>
>>>>>>>>>>>> as
>>>>>>>>>>>>
>>>>>>>>>>> in
>>>>>>>>>>
>>>>>>>>> StreamingApplication. Is StreamingApplication also a module,
>>>>>>>>>
>>>>>>>>>> should
>>>>>>>>>>>
>>>>>>>>>>> it
>>>>>>>>>>
>>>>>>>>> extend that interface.
>>>>>>>>
>>>>>>>>>         3. Setting properties for modules is too verbose. Module
>>>>>>>>>>
>>>>>>>>>>> developer
>>>>>>>>>>>>
>>>>>>>>>>> needs to repeat every property they want exposed with a setter
>>>>>>>>>> and
>>>>>>>>>>
>>>>>>>>>> getter
>>>>>>>>>>> in JAVA. I don't disagree that module developer should be able to
>>>>>>>>>>> choose
>>>>>>>>>>>
>>>>>>>>>>> which properties from which operators need to be exposed but the
>>>>>>>>>>
>>>>>>>>>> current
>>>>>>>>>>>
>>>>>>>>>>> way seems to duplicate code. Here is a suggestion.
>>>>>>>>>>
>>>>>>>>>>              a. Allow modules to specify which operators and
>>>>>>>>>>>
>>>>>>>>>>>> properties
>>>>>>>>>>>>
>>>>>>>>>>> can
>>>>>>>>>>
>>>>>>>>> be
>>>>>>>>>
>>>>>>>>>> accessible from outside. One way is in the "populateDAG" method of
>>>>>>>>>>>
>>>>>>>>>>>> the
>>>>>>>>>>>>
>>>>>>>>>>> module when adding the operator have the ability to specify if
>>>>>>>>>> this
>>>>>>>>>> operator can be accessible from outside and which or all
>>>>>>>>>>
>>>>>>>>> properties
>>>>>>>>>
>>>>>>>>>> can
>>>>>>>>>>
>>>>>>>>> be
>>>>>>>>>
>>>>>>>>>> accessible.
>>>>>>>>>>
>>>>>>>>>>>              b. Provide methods in ModuleMeta or elsewhere to set
>>>>>>>>>>>>
>>>>>>>>>>>> property
>>>>>>>>>>>>
>>>>>>>>>>> values by specifying the operator name (friendly name) inside the
>>>>>>>>>>
>>>>>>>>>> module
>>>>>>>>>>>
>>>>>>>>>>> and property name. If this is allowed by a. above it is
>>>>>>>>>> successful
>>>>>>>>>>
>>>>>>>>>> else
>>>>>>>>>>>
>>>>>>>>>>> it
>>>>>>>>>>
>>>>>>>>>> should fail.
>>>>>>>>>>
>>>>>>>>>>>              c. Allow a syntax in property files to specify the
>>>>>>>>>>>>
>>>>>>>>>>>> property
>>>>>>>>>>>>
>>>>>>>>>>> in
>>>>>>>>>>
>>>>>>>>> b.
>>>>>>>>>
>>>>>>>>>> Example syntax
>>>>>>>>>>>
>>>>>>>>>>>> dt.module.<modulename>.operator.<operatorname>.prop.<
>>>>>>>>>>>>
>>>>>>>>>>> propname>
>>>>>>>>>>
>>>>>>>>>         4. For attributes same mechanism as in 3 should apply for
>>>>>>>>> the
>>>>>>>>>
>>>>>>>>>> operators
>>>>>>>>>>>> that are exposed by the module.  For property file, example
>>>>>>>>>>>> syntax
>>>>>>>>>>>> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
>>>>>>>>>>>>         5. Module developers in addition to 3. and 4. above may
>>>>>>>>>>>>
>>>>>>>>>>>> choose
>>>>>>>>>>>>
>>>>>>>>>>> to
>>>>>>>>>>
>>>>>>>>> support module level properties and attributes. These should not
>>>>>>>>
>>>>>>>>> be
>>>>>>>>>> the
>>>>>>>>>>
>>>>>>>>> default when 3. and 4. are possible but complementary, in addition
>>>>>>>>>
>>>>>>>>>> to
>>>>>>>>>>>
>>>>>>>>>>> them.
>>>>>>>>>> In this case for properties they can implement setters and getters
>>>>>>>>>>
>>>>>>>>>>> in
>>>>>>>>>>>>
>>>>>>>>>>> the
>>>>>>>>>> module. For attributes the user should still be able to set the
>>>>>>>>>>
>>>>>>>>>>> attributes
>>>>>>>>>>> using the dag setAttribute method. You could introduce a method
>>>>>>>>>>> in
>>>>>>>>>>> the
>>>>>>>>>>>
>>>>>>>>>>> module to process attributes that can get called by the engine
>>>>>>>>>> once
>>>>>>>>>> everything is set.
>>>>>>>>>>
>>>>>>>>>         6. For 5. above setting global properties and attributes
>>>>>>>>> for
>>>>>>>>>
>>>>>>>>>> module
>>>>>>>>>>>>
>>>>>>>>>>> is
>>>>>>>>>>
>>>>>>>>>> akin to ideas that have been proposed for the application as
>>>>>>>>>>>
>>>>>>>>>>>> well. A
>>>>>>>>>>>>
>>>>>>>>>>> consistent way must be possible for applications as well even if
>>>>>>>>>>
>>>>>>>>> it
>>>>>>>>>
>>>>>>>>>> is
>>>>>>>>>>
>>>>>>>>> not
>>>>>>>>
>>>>>>>>> implemented now.
>>>>>>>>>>
>>>>>>>>>>>         7. For 5. or 6. above there should be a property file way
>>>>>>>>>>>> of
>>>>>>>>>>>> specifying
>>>>>>>>>>>> the global module properties and attributes. Example syntax
>>>>>>>>>>>> dt.module.<modulename>.prop.<propname>,
>>>>>>>>>>>> dt.module.<modulename>.attr.<attrname>.
>>>>>>>>>>>> Notice the difference with 3. c. and 4 above that there is no
>>>>>>>>>>>>
>>>>>>>>>>>> operator
>>>>>>>>>>>>
>>>>>>>>>>> keyword here.
>>>>>>>>>>         8. Partitioning needs to be consistent with what the user
>>>>>>>>>>
>>>>>>>>>>> will
>>>>>>>>>>>>
>>>>>>>>>>> expect
>>>>>>>>>>
>>>>>>>>> when they see module as an entity. I will send an image of
>>>>>>>>>
>>>>>>>>>> possible
>>>>>>>>>>>
>>>>>>>>>>> examples of how the user will expect the physical plan to look in
>>>>>>>>>>
>>>>>>>>> certain
>>>>>>>>>
>>>>>>>>>> cases.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>

Re: Modules

Posted by Vlad Rozov <v....@datatorrent.com>.

Even in case when operator and module are not instantiated in the same 
container there will be two different versions on the classpath, so it 
is undefined which one will be used.

Thank you,

Vlad

On 11/3/15 10:29, Pramod Immaneni wrote:
> Depends if the operator and module are container local or not.
>
> On Tue, Nov 3, 2015 at 10:26 AM, Vlad Rozov <v....@datatorrent.com> wrote:
>
>> I think that just build tool will not provide necessary functionality for
>> the second option. Module will have to implement it's own class loader
>> otherwise behavior will be undefined in case where application embeds
>> module and independently different version of an operator included into the
>> module.
>>
>> Thank you,
>>
>> Vlad
>>
>>
>> On 11/3/15 10:10, Thomas Weise wrote:
>>
>>> Both are valid options and to be solved through the build tool. In one
>>> case
>>> the dependency is declared, in the latter embedded.
>>>
>>> Thomas
>>>
>>> On Tue, Nov 3, 2015 at 10:07 AM, Vlad Rozov <v....@datatorrent.com>
>>> wrote:
>>>
>>> This brings one more issue that we did not cover in the module design
>>>> discussions.  What happens when new versions of operators embedded into a
>>>> module become available? I believe we all go with the assumption that
>>>> module will pickup version available on the classpath at run-time and it
>>>> is
>>>> operator developer responsibility to provide full binary compatibility.
>>>> Another possible behavior is to consider module as completely independent
>>>> unit and package all necessary libraries along with the module.
>>>>
>>>> Thank you,
>>>>
>>>> Vlad
>>>>
>>>>
>>>> On 11/3/15 09:40, Pramod Immaneni wrote:
>>>>
>>>> I am not suggesting we leak the internals or compromise access modifiers.
>>>>> I
>>>>> want the module developer to have the ability (not mandatory) to make
>>>>> available all or a subset of the properties of an operator easily if
>>>>> they
>>>>> desire without having to create setter/getter for each of them. You
>>>>> don't
>>>>> have to expose the operator they belong to. My preference would also be
>>>>> to
>>>>> preserve the namespace of the properties in some way for example by
>>>>> grouping them by operator name. Think about scenario where people have
>>>>> built modules using kafka input operator and there is a new kafka
>>>>> connection property. Without having this ability the modules have to be
>>>>> changed to support this property. With this feature the module
>>>>> developers
>>>>> have a choice whether to keep the list of kafka properties fixed in the
>>>>> module or allow new properties.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Tue, Nov 3, 2015 at 9:31 AM, Thomas Weise <th...@datatorrent.com>
>>>>> wrote:
>>>>>
>>>>> There is also the option to inherit common module properties through a
>>>>>
>>>>>> base
>>>>>> class.
>>>>>>
>>>>>> I don't see how this is any different from an operator. The developer
>>>>>> decides what gets exposed and has the same options to control it.
>>>>>>
>>>>>> Encapsulation is good practice, by leaking the module internals the
>>>>>> using
>>>>>> code becomes brittle.
>>>>>>
>>>>>> Thomas
>>>>>>
>>>>>> On Tue, Nov 3, 2015 at 9:22 AM, Amol Kekre <am...@datatorrent.com>
>>>>>> wrote:
>>>>>>
>>>>>> The same goes for a Java or C++ class that changes its api. In general
>>>>>> this
>>>>>>
>>>>>> is left to the developer, and these languages have internals as private
>>>>>>> by
>>>>>> default for precisely the same purpose. The module developer must have
>>>>>>> the
>>>>>> right to change internals, keep api clean/constant and expect user code
>>>>>>> to
>>>>>> not break.
>>>>>>> Thks,
>>>>>>> Amol
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Nov 3, 2015 at 9:19 AM, Pramod Immaneni <
>>>>>>> pramod@datatorrent.com
>>>>>>> wrote:
>>>>>>>
>>>>>>> For 3 and 4 can't we strike a balance between not having to expose the
>>>>>>>
>>>>>>>> operators underneath and at the same time not having to write
>>>>>>>>
>>>>>>>> boilerplate
>>>>>>> code for all the properties that the module wants to make available
>>>>>>>
>>>>>>>> outside. It can quickly become unmanageable. For example, an input
>>>>>>>>
>>>>>>>> operator
>>>>>>> has a new connection property which can be used outside and now all
>>>>>>>> the
>>>>>>>> modules using that operator, their code has to be modified to just
>>>>>>>> add
>>>>>>>>
>>>>>>>> a
>>>>>>> pass through setter/getter. How about treating the operator name as a
>>>>>>> group
>>>>>>>
>>>>>>> name and ability for module developers to easily make
>>>>>>>> available/specify
>>>>>>>>
>>>>>>>> all
>>>>>>> or a subset of the properties of an operator to the user without
>>>>>>>> having
>>>>>>>>
>>>>>>>> to
>>>>>>> explicitly make each of them a module property.
>>>>>>>> On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <am...@datatorrent.com>
>>>>>>>>
>>>>>>>> wrote:
>>>>>>> 3,4 should follow conventions where the creator decides the api
>>>>>>>
>>>>>>>> (including
>>>>>>>>
>>>>>>>> accessibility). In general only those properties exposed by module
>>>>>>>>> creator
>>>>>>>> should be settable. What the module internally does with them is
>>>>>>>>> module
>>>>>>> designer's call. Accessing internals of module from outside is
>>>>>>>
>>>>>>>> uncommon.
>>>>>>>> For exampe in Java (or C++) private fields/members are not to be
>>>>>>>> accessed.
>>>>>>>>
>>>>>>>> Properties (setter and getter) are the api that module designer gives
>>>>>>>>> to
>>>>>>>> the module user. It is dangerous and has unintended consequences if
>>>>>>>> module
>>>>>>>>
>>>>>>>> user starts to access internals outside the api.
>>>>>>>>> Partitioning should be next phase. As long as current design does
>>>>>>>>> not
>>>>>>>>>
>>>>>>>>> halt
>>>>>>>> partitioning it should be ok (which I believe is true).
>>>>>>>>> Thks,
>>>>>>>>> Amol
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <v.rozov@datatorrent.com
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> 1. +1, though passing original DAG to module's populateDAG() it is
>>>>>>>>> not
>>>>>>>>>
>>>>>>>> by
>>>>>>>>
>>>>>>>> design and is the current pull request implementation details.
>>>>>>>>>> 2. While I agree that both Module and StreamingApplication let's
>>>>>>>>>> module/application designer to expose DAG design reuse pattern and
>>>>>>>>>> StreamingApplication interface may be extending Module, it does not
>>>>>>>>>>
>>>>>>>>>> seem
>>>>>>>>> to
>>>>>>>>>
>>>>>>>>> buy us much. Do we want to allow certain applications to be reused
>>>>>>>>>> as
>>>>>>>> Modules in other applications or should application package be
>>>>>>>> different
>>>>>>>>> from Module package? The current approach is to distribute Modules
>>>>>>>>> as
>>>>>>>>>
>>>>>>>> part
>>>>>>>> of .jar for example as part of Malhar library without necessarily
>>>>>>>>>> providing
>>>>>>>>> all necessary dependencies. Application package on other side must
>>>>>>>>>> include
>>>>>>>>> all dependencies not provided by the platform.
>>>>>>>>>> 3, 4. While this will help Module designer, it may complicate
>>>>>>>>>>
>>>>>>>>>> Module
>>>>>>>> maintenance and how Modules are used. What if Module designer wants
>>>>>>>> to
>>>>>>>> change it's implementation and replace one operator implementation
>>>>>>>>
>>>>>>>>> with
>>>>>>>>>
>>>>>>>> another operator? Does StreamingApplication designer need to know
>>>>>>>>
>>>>>>>>> internal
>>>>>>>>>
>>>>>>>>> structure of Modules? Should Module be considered as a black box
>>>>>>>>>> during
>>>>>>>> Application design time as it was initially planned?
>>>>>>>>
>>>>>>>>> 5, 6, 7 +1. This is currently proposed behavior of Module
>>>>>>>>>> functionality
>>>>>>>> the way I understand it.
>>>>>>>>
>>>>>>>>> 8. We need to see what Module designer can specify for
>>>>>>>>>> partitioning.
>>>>>>>> One
>>>>>>>> of supported cases should be ability to specify cascading
>>>>>>>>> partitioning
>>>>>>>>>
>>>>>>>> scheme.
>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>>> Vlad
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/2/15 10:30, Pramod Immaneni wrote:
>>>>>>>>>>
>>>>>>>>>> I have some comments and suggestions on the module design. I think
>>>>>>>>>> these
>>>>>>>>>>
>>>>>>>>> need to be taken into account before we can merge the
>>>>>>>>>
>>>>>>>>>> implementation
>>>>>>>>>>
>>>>>>>>> provided below into the mainline code. I apologize if these should
>>>>>>>> have
>>>>>>>>> been brought up earlier as for some reason or the other I was out
>>>>>>>>>
>>>>>>>>>> of
>>>>>>>>>>
>>>>>>>>> loop
>>>>>>>> on this one
>>>>>>>>>>>         https://github.com/apache/incubator-apex-core/pull/148
>>>>>>>>>>> <
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963
>>>>>>
>>>>>>         1. DAG scoping currently in the implementation is global for
>>>>>>>> modules,
>>>>>>>>>> each module's populateDAG sees the entire DAG. It should be
>>>>>>>>>> locally
>>>>>>>>>>
>>>>>>>>> scoped
>>>>>>>> as one module does not and should not know about another.
>>>>>>>>>>>         2. The module has a populateDAG method with exact same
>>>>>>>>>>> syntax
>>>>>>>>>>>
>>>>>>>>>>> as
>>>>>>>>> in
>>>>>>>> StreamingApplication. Is StreamingApplication also a module,
>>>>>>>>>> should
>>>>>>>>>>
>>>>>>>>> it
>>>>>>> extend that interface.
>>>>>>>>>         3. Setting properties for modules is too verbose. Module
>>>>>>>>>>> developer
>>>>>>>>> needs to repeat every property they want exposed with a setter and
>>>>>>>>>
>>>>>>>>>> getter
>>>>>>>>>> in JAVA. I don't disagree that module developer should be able to
>>>>>>>>>> choose
>>>>>>>>>>
>>>>>>>>> which properties from which operators need to be exposed but the
>>>>>>>>>
>>>>>>>>>> current
>>>>>>>>>>
>>>>>>>>> way seems to duplicate code. Here is a suggestion.
>>>>>>>>>
>>>>>>>>>>              a. Allow modules to specify which operators and
>>>>>>>>>>> properties
>>>>>>>>> can
>>>>>>>> be
>>>>>>>>>> accessible from outside. One way is in the "populateDAG" method of
>>>>>>>>>>> the
>>>>>>>>> module when adding the operator have the ability to specify if
>>>>>>>>> this
>>>>>>>>> operator can be accessible from outside and which or all
>>>>>>>> properties
>>>>>>>>> can
>>>>>>>> be
>>>>>>>>> accessible.
>>>>>>>>>>>              b. Provide methods in ModuleMeta or elsewhere to set
>>>>>>>>>>>
>>>>>>>>>>> property
>>>>>>>>> values by specifying the operator name (friendly name) inside the
>>>>>>>>>
>>>>>>>>>> module
>>>>>>>>>>
>>>>>>>>> and property name. If this is allowed by a. above it is successful
>>>>>>>>>
>>>>>>>>>> else
>>>>>>>>>>
>>>>>>>>> it
>>>>>>>>>
>>>>>>>>> should fail.
>>>>>>>>>>>              c. Allow a syntax in property files to specify the
>>>>>>>>>>>
>>>>>>>>>>> property
>>>>>>>>> in
>>>>>>>> b.
>>>>>>>>>> Example syntax
>>>>>>>>>>> dt.module.<modulename>.operator.<operatorname>.prop.<
>>>>>>>>> propname>
>>>>>>>>         4. For attributes same mechanism as in 3 should apply for the
>>>>>>>>>>> operators
>>>>>>>>>>> that are exposed by the module.  For property file, example syntax
>>>>>>>>>>> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
>>>>>>>>>>>         5. Module developers in addition to 3. and 4. above may
>>>>>>>>>>>
>>>>>>>>>>> choose
>>>>>>>>> to
>>>>>>> support module level properties and attributes. These should not
>>>>>>>>> be
>>>>>>>>> the
>>>>>>>> default when 3. and 4. are possible but complementary, in addition
>>>>>>>>>> to
>>>>>>>>>>
>>>>>>>>> them.
>>>>>>>>> In this case for properties they can implement setters and getters
>>>>>>>>>>> in
>>>>>>>>> the
>>>>>>>>> module. For attributes the user should still be able to set the
>>>>>>>>>> attributes
>>>>>>>>>> using the dag setAttribute method. You could introduce a method in
>>>>>>>>>> the
>>>>>>>>>>
>>>>>>>>> module to process attributes that can get called by the engine
>>>>>>>>> once
>>>>>>>>> everything is set.
>>>>>>>>         6. For 5. above setting global properties and attributes for
>>>>>>>>>>> module
>>>>>>>>> is
>>>>>>>>>
>>>>>>>>>> akin to ideas that have been proposed for the application as
>>>>>>>>>>> well. A
>>>>>>>>> consistent way must be possible for applications as well even if
>>>>>>>> it
>>>>>>>>> is
>>>>>>> not
>>>>>>>>> implemented now.
>>>>>>>>>>>         7. For 5. or 6. above there should be a property file way
>>>>>>>>>>> of
>>>>>>>>>>> specifying
>>>>>>>>>>> the global module properties and attributes. Example syntax
>>>>>>>>>>> dt.module.<modulename>.prop.<propname>,
>>>>>>>>>>> dt.module.<modulename>.attr.<attrname>.
>>>>>>>>>>> Notice the difference with 3. c. and 4 above that there is no
>>>>>>>>>>>
>>>>>>>>>>> operator
>>>>>>>>> keyword here.
>>>>>>>>>         8. Partitioning needs to be consistent with what the user
>>>>>>>>>>> will
>>>>>>>>> expect
>>>>>>>> when they see module as an entity. I will send an image of
>>>>>>>>>> possible
>>>>>>>>>>
>>>>>>>>> examples of how the user will expect the physical plan to look in
>>>>>>>> certain
>>>>>>>>>> cases.
>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>

Re: Modules

Posted by Pramod Immaneni <pr...@datatorrent.com>.

Depends if the operator and module are container local or not.

On Tue, Nov 3, 2015 at 10:26 AM, Vlad Rozov <v....@datatorrent.com> wrote:

> I think that just build tool will not provide necessary functionality for
> the second option. Module will have to implement it's own class loader
> otherwise behavior will be undefined in case where application embeds
> module and independently different version of an operator included into the
> module.
>
> Thank you,
>
> Vlad
>
>
> On 11/3/15 10:10, Thomas Weise wrote:
>
>> Both are valid options and to be solved through the build tool. In one
>> case
>> the dependency is declared, in the latter embedded.
>>
>> Thomas
>>
>> On Tue, Nov 3, 2015 at 10:07 AM, Vlad Rozov <v....@datatorrent.com>
>> wrote:
>>
>> This brings one more issue that we did not cover in the module design
>>> discussions.  What happens when new versions of operators embedded into a
>>> module become available? I believe we all go with the assumption that
>>> module will pickup version available on the classpath at run-time and it
>>> is
>>> operator developer responsibility to provide full binary compatibility.
>>> Another possible behavior is to consider module as completely independent
>>> unit and package all necessary libraries along with the module.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>>
>>> On 11/3/15 09:40, Pramod Immaneni wrote:
>>>
>>> I am not suggesting we leak the internals or compromise access modifiers.
>>>> I
>>>> want the module developer to have the ability (not mandatory) to make
>>>> available all or a subset of the properties of an operator easily if
>>>> they
>>>> desire without having to create setter/getter for each of them. You
>>>> don't
>>>> have to expose the operator they belong to. My preference would also be
>>>> to
>>>> preserve the namespace of the properties in some way for example by
>>>> grouping them by operator name. Think about scenario where people have
>>>> built modules using kafka input operator and there is a new kafka
>>>> connection property. Without having this ability the modules have to be
>>>> changed to support this property. With this feature the module
>>>> developers
>>>> have a choice whether to keep the list of kafka properties fixed in the
>>>> module or allow new properties.
>>>>
>>>> Thanks
>>>>
>>>> On Tue, Nov 3, 2015 at 9:31 AM, Thomas Weise <th...@datatorrent.com>
>>>> wrote:
>>>>
>>>> There is also the option to inherit common module properties through a
>>>>
>>>>> base
>>>>> class.
>>>>>
>>>>> I don't see how this is any different from an operator. The developer
>>>>> decides what gets exposed and has the same options to control it.
>>>>>
>>>>> Encapsulation is good practice, by leaking the module internals the
>>>>> using
>>>>> code becomes brittle.
>>>>>
>>>>> Thomas
>>>>>
>>>>> On Tue, Nov 3, 2015 at 9:22 AM, Amol Kekre <am...@datatorrent.com>
>>>>> wrote:
>>>>>
>>>>> The same goes for a Java or C++ class that changes its api. In general
>>>>> this
>>>>>
>>>>> is left to the developer, and these languages have internals as private
>>>>>>
>>>>>> by
>>>>>
>>>>> default for precisely the same purpose. The module developer must have
>>>>>>
>>>>>> the
>>>>>
>>>>> right to change internals, keep api clean/constant and expect user code
>>>>>>
>>>>>> to
>>>>>
>>>>> not break.
>>>>>>
>>>>>> Thks,
>>>>>> Amol
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 3, 2015 at 9:19 AM, Pramod Immaneni <
>>>>>> pramod@datatorrent.com
>>>>>> wrote:
>>>>>>
>>>>>> For 3 and 4 can't we strike a balance between not having to expose the
>>>>>>
>>>>>>> operators underneath and at the same time not having to write
>>>>>>>
>>>>>>> boilerplate
>>>>>> code for all the properties that the module wants to make available
>>>>>>
>>>>>>> outside. It can quickly become unmanageable. For example, an input
>>>>>>>
>>>>>>> operator
>>>>>>
>>>>>> has a new connection property which can be used outside and now all
>>>>>>> the
>>>>>>> modules using that operator, their code has to be modified to just
>>>>>>> add
>>>>>>>
>>>>>>> a
>>>>>> pass through setter/getter. How about treating the operator name as a
>>>>>> group
>>>>>>
>>>>>> name and ability for module developers to easily make
>>>>>>> available/specify
>>>>>>>
>>>>>>> all
>>>>>>
>>>>>> or a subset of the properties of an operator to the user without
>>>>>>> having
>>>>>>>
>>>>>>> to
>>>>>>
>>>>>> explicitly make each of them a module property.
>>>>>>>
>>>>>>> On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <am...@datatorrent.com>
>>>>>>>
>>>>>>> wrote:
>>>>>> 3,4 should follow conventions where the creator decides the api
>>>>>>
>>>>>>> (including
>>>>>>>
>>>>>>> accessibility). In general only those properties exposed by module
>>>>>>>>
>>>>>>>> creator
>>>>>>>
>>>>>>> should be settable. What the module internally does with them is
>>>>>>>>
>>>>>>>> module
>>>>>>>
>>>>>> designer's call. Accessing internals of module from outside is
>>>>>>
>>>>>>> uncommon.
>>>>>>> For exampe in Java (or C++) private fields/members are not to be
>>>>>>> accessed.
>>>>>>>
>>>>>>> Properties (setter and getter) are the api that module designer gives
>>>>>>>>
>>>>>>>> to
>>>>>>> the module user. It is dangerous and has unintended consequences if
>>>>>>> module
>>>>>>>
>>>>>>> user starts to access internals outside the api.
>>>>>>>>
>>>>>>>> Partitioning should be next phase. As long as current design does
>>>>>>>> not
>>>>>>>>
>>>>>>>> halt
>>>>>>>
>>>>>>> partitioning it should be ok (which I believe is true).
>>>>>>>>
>>>>>>>> Thks,
>>>>>>>> Amol
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <v.rozov@datatorrent.com
>>>>>>>> >
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> 1. +1, though passing original DAG to module's populateDAG() it is
>>>>>>>> not
>>>>>>>>
>>>>>>> by
>>>>>>>
>>>>>>> design and is the current pull request implementation details.
>>>>>>>>
>>>>>>>>> 2. While I agree that both Module and StreamingApplication let's
>>>>>>>>> module/application designer to expose DAG design reuse pattern and
>>>>>>>>> StreamingApplication interface may be extending Module, it does not
>>>>>>>>>
>>>>>>>>> seem
>>>>>>>> to
>>>>>>>>
>>>>>>>> buy us much. Do we want to allow certain applications to be reused
>>>>>>>>>
>>>>>>>>> as
>>>>>>>>
>>>>>>> Modules in other applications or should application package be
>>>>>>
>>>>>>> different
>>>>>>>> from Module package? The current approach is to distribute Modules
>>>>>>>> as
>>>>>>>>
>>>>>>> part
>>>>>>
>>>>>>> of .jar for example as part of Malhar library without necessarily
>>>>>>>>>
>>>>>>>>> providing
>>>>>>>>
>>>>>>>> all necessary dependencies. Application package on other side must
>>>>>>>>>
>>>>>>>>> include
>>>>>>>>
>>>>>>>> all dependencies not provided by the platform.
>>>>>>>>>
>>>>>>>>> 3, 4. While this will help Module designer, it may complicate
>>>>>>>>>
>>>>>>>>> Module
>>>>>>>>
>>>>>>> maintenance and how Modules are used. What if Module designer wants
>>>>>>
>>>>>>> to
>>>>>>>>
>>>>>>> change it's implementation and replace one operator implementation
>>>>>>>
>>>>>>>> with
>>>>>>>>
>>>>>>> another operator? Does StreamingApplication designer need to know
>>>>>>>
>>>>>>>> internal
>>>>>>>>
>>>>>>>> structure of Modules? Should Module be considered as a black box
>>>>>>>>>
>>>>>>>>> during
>>>>>>>>
>>>>>>> Application design time as it was initially planned?
>>>>>>>
>>>>>>>> 5, 6, 7 +1. This is currently proposed behavior of Module
>>>>>>>>>
>>>>>>>>> functionality
>>>>>>>>
>>>>>>> the way I understand it.
>>>>>>>
>>>>>>>> 8. We need to see what Module designer can specify for
>>>>>>>>>
>>>>>>>>> partitioning.
>>>>>>>>
>>>>>>> One
>>>>>>
>>>>>>> of supported cases should be ability to specify cascading
>>>>>>>> partitioning
>>>>>>>>
>>>>>>> scheme.
>>>>>>>
>>>>>>>> Thank you,
>>>>>>>>>
>>>>>>>>> Vlad
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/2/15 10:30, Pramod Immaneni wrote:
>>>>>>>>>
>>>>>>>>> I have some comments and suggestions on the module design. I think
>>>>>>>>> these
>>>>>>>>>
>>>>>>>> need to be taken into account before we can merge the
>>>>>>>>
>>>>>>>>> implementation
>>>>>>>>>
>>>>>>>> provided below into the mainline code. I apologize if these should
>>>>>>
>>>>>>> have
>>>>>>>>>
>>>>>>>> been brought up earlier as for some reason or the other I was out
>>>>>>>>
>>>>>>>>> of
>>>>>>>>>
>>>>>>>> loop
>>>>>>
>>>>>>> on this one
>>>>>>>>>
>>>>>>>>>>        https://github.com/apache/incubator-apex-core/pull/148
>>>>>>>>>> <
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>> https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963
>>>>>
>>>>>        1. DAG scoping currently in the implementation is global for
>>>>>>
>>>>>>> modules,
>>>>>>>>> each module's populateDAG sees the entire DAG. It should be
>>>>>>>>> locally
>>>>>>>>>
>>>>>>>> scoped
>>>>>>
>>>>>>> as one module does not and should not know about another.
>>>>>>>>>
>>>>>>>>>>        2. The module has a populateDAG method with exact same
>>>>>>>>>> syntax
>>>>>>>>>>
>>>>>>>>>> as
>>>>>>>>>
>>>>>>>> in
>>>>>>>
>>>>>>> StreamingApplication. Is StreamingApplication also a module,
>>>>>>>>
>>>>>>>>> should
>>>>>>>>>
>>>>>>>> it
>>>>>>
>>>>>> extend that interface.
>>>>>>>
>>>>>>>>        3. Setting properties for modules is too verbose. Module
>>>>>>>>>>
>>>>>>>>>> developer
>>>>>>>>>
>>>>>>>> needs to repeat every property they want exposed with a setter and
>>>>>>>>
>>>>>>>>> getter
>>>>>>>>> in JAVA. I don't disagree that module developer should be able to
>>>>>>>>> choose
>>>>>>>>>
>>>>>>>> which properties from which operators need to be exposed but the
>>>>>>>>
>>>>>>>>> current
>>>>>>>>>
>>>>>>>> way seems to duplicate code. Here is a suggestion.
>>>>>>>>
>>>>>>>>>             a. Allow modules to specify which operators and
>>>>>>>>>>
>>>>>>>>>> properties
>>>>>>>>>
>>>>>>>> can
>>>>>>
>>>>>>> be
>>>>>>>>
>>>>>>>>> accessible from outside. One way is in the "populateDAG" method of
>>>>>>>>>>
>>>>>>>>>> the
>>>>>>>>>
>>>>>>>> module when adding the operator have the ability to specify if
>>>>>>>
>>>>>>>> this
>>>>>>>>>
>>>>>>>> operator can be accessible from outside and which or all
>>>>>>
>>>>>>> properties
>>>>>>>>>
>>>>>>>> can
>>>>>>
>>>>>>> be
>>>>>>>>
>>>>>>>> accessible.
>>>>>>>>>
>>>>>>>>>>             b. Provide methods in ModuleMeta or elsewhere to set
>>>>>>>>>>
>>>>>>>>>> property
>>>>>>>>>
>>>>>>>> values by specifying the operator name (friendly name) inside the
>>>>>>>>
>>>>>>>>> module
>>>>>>>>>
>>>>>>>> and property name. If this is allowed by a. above it is successful
>>>>>>>>
>>>>>>>>> else
>>>>>>>>>
>>>>>>>> it
>>>>>>>>
>>>>>>>> should fail.
>>>>>>>>>
>>>>>>>>>>             c. Allow a syntax in property files to specify the
>>>>>>>>>>
>>>>>>>>>> property
>>>>>>>>>
>>>>>>>> in
>>>>>>>
>>>>>>> b.
>>>>>>>>
>>>>>>>>> Example syntax
>>>>>>>>>>
>>>>>>>>>> dt.module.<modulename>.operator.<operatorname>.prop.<
>>>>>>>>>
>>>>>>>> propname>
>>>>>>
>>>>>>>        4. For attributes same mechanism as in 3 should apply for the
>>>>>>>>>> operators
>>>>>>>>>> that are exposed by the module.  For property file, example syntax
>>>>>>>>>> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
>>>>>>>>>>        5. Module developers in addition to 3. and 4. above may
>>>>>>>>>>
>>>>>>>>>> choose
>>>>>>>>>
>>>>>>>> to
>>>>>>
>>>>>> support module level properties and attributes. These should not
>>>>>>>
>>>>>>>> be
>>>>>>>>>
>>>>>>>> the
>>>>>>
>>>>>>> default when 3. and 4. are possible but complementary, in addition
>>>>>>>>
>>>>>>>>> to
>>>>>>>>>
>>>>>>>> them.
>>>>>>>
>>>>>>>> In this case for properties they can implement setters and getters
>>>>>>>>>>
>>>>>>>>>> in
>>>>>>>>>
>>>>>>>> the
>>>>>>>
>>>>>>>> module. For attributes the user should still be able to set the
>>>>>>>>> attributes
>>>>>>>>> using the dag setAttribute method. You could introduce a method in
>>>>>>>>> the
>>>>>>>>>
>>>>>>>> module to process attributes that can get called by the engine
>>>>>>>
>>>>>>>> once
>>>>>>>>>
>>>>>>>> everything is set.
>>>>>>
>>>>>>>        6. For 5. above setting global properties and attributes for
>>>>>>>>>>
>>>>>>>>>> module
>>>>>>>>>
>>>>>>>> is
>>>>>>>>
>>>>>>>>> akin to ideas that have been proposed for the application as
>>>>>>>>>>
>>>>>>>>>> well. A
>>>>>>>>>
>>>>>>>> consistent way must be possible for applications as well even if
>>>>>>
>>>>>>> it
>>>>>>>>>
>>>>>>>> is
>>>>>>
>>>>>> not
>>>>>>>
>>>>>>>> implemented now.
>>>>>>>>>
>>>>>>>>>>        7. For 5. or 6. above there should be a property file way
>>>>>>>>>> of
>>>>>>>>>> specifying
>>>>>>>>>> the global module properties and attributes. Example syntax
>>>>>>>>>> dt.module.<modulename>.prop.<propname>,
>>>>>>>>>> dt.module.<modulename>.attr.<attrname>.
>>>>>>>>>> Notice the difference with 3. c. and 4 above that there is no
>>>>>>>>>>
>>>>>>>>>> operator
>>>>>>>>>
>>>>>>>> keyword here.
>>>>>>>
>>>>>>>>        8. Partitioning needs to be consistent with what the user
>>>>>>>>>>
>>>>>>>>>> will
>>>>>>>>>
>>>>>>>> expect
>>>>>>
>>>>>>> when they see module as an entity. I will send an image of
>>>>>>>>> possible
>>>>>>>>>
>>>>>>>> examples of how the user will expect the physical plan to look in
>>>>>>
>>>>>>> certain
>>>>>>>>> cases.
>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>

Re: Modules

Posted by Vlad Rozov <v....@datatorrent.com>.

I think that just build tool will not provide necessary functionality 
for the second option. Module will have to implement it's own class 
loader otherwise behavior will be undefined in case where application 
embeds module and independently different version of an operator 
included into the module.

Thank you,

Vlad

On 11/3/15 10:10, Thomas Weise wrote:
> Both are valid options and to be solved through the build tool. In one case
> the dependency is declared, in the latter embedded.
>
> Thomas
>
> On Tue, Nov 3, 2015 at 10:07 AM, Vlad Rozov <v....@datatorrent.com> wrote:
>
>> This brings one more issue that we did not cover in the module design
>> discussions.  What happens when new versions of operators embedded into a
>> module become available? I believe we all go with the assumption that
>> module will pickup version available on the classpath at run-time and it is
>> operator developer responsibility to provide full binary compatibility.
>> Another possible behavior is to consider module as completely independent
>> unit and package all necessary libraries along with the module.
>>
>> Thank you,
>>
>> Vlad
>>
>>
>> On 11/3/15 09:40, Pramod Immaneni wrote:
>>
>>> I am not suggesting we leak the internals or compromise access modifiers.
>>> I
>>> want the module developer to have the ability (not mandatory) to make
>>> available all or a subset of the properties of an operator easily if they
>>> desire without having to create setter/getter for each of them. You don't
>>> have to expose the operator they belong to. My preference would also be to
>>> preserve the namespace of the properties in some way for example by
>>> grouping them by operator name. Think about scenario where people have
>>> built modules using kafka input operator and there is a new kafka
>>> connection property. Without having this ability the modules have to be
>>> changed to support this property. With this feature the module developers
>>> have a choice whether to keep the list of kafka properties fixed in the
>>> module or allow new properties.
>>>
>>> Thanks
>>>
>>> On Tue, Nov 3, 2015 at 9:31 AM, Thomas Weise <th...@datatorrent.com>
>>> wrote:
>>>
>>> There is also the option to inherit common module properties through a
>>>> base
>>>> class.
>>>>
>>>> I don't see how this is any different from an operator. The developer
>>>> decides what gets exposed and has the same options to control it.
>>>>
>>>> Encapsulation is good practice, by leaking the module internals the using
>>>> code becomes brittle.
>>>>
>>>> Thomas
>>>>
>>>> On Tue, Nov 3, 2015 at 9:22 AM, Amol Kekre <am...@datatorrent.com> wrote:
>>>>
>>>> The same goes for a Java or C++ class that changes its api. In general
>>>> this
>>>>
>>>>> is left to the developer, and these languages have internals as private
>>>>>
>>>> by
>>>>
>>>>> default for precisely the same purpose. The module developer must have
>>>>>
>>>> the
>>>>
>>>>> right to change internals, keep api clean/constant and expect user code
>>>>>
>>>> to
>>>>
>>>>> not break.
>>>>>
>>>>> Thks,
>>>>> Amol
>>>>>
>>>>>
>>>>> On Tue, Nov 3, 2015 at 9:19 AM, Pramod Immaneni <pramod@datatorrent.com
>>>>> wrote:
>>>>>
>>>>> For 3 and 4 can't we strike a balance between not having to expose the
>>>>>> operators underneath and at the same time not having to write
>>>>>>
>>>>> boilerplate
>>>>> code for all the properties that the module wants to make available
>>>>>> outside. It can quickly become unmanageable. For example, an input
>>>>>>
>>>>> operator
>>>>>
>>>>>> has a new connection property which can be used outside and now all the
>>>>>> modules using that operator, their code has to be modified to just add
>>>>>>
>>>>> a
>>>>> pass through setter/getter. How about treating the operator name as a
>>>>> group
>>>>>
>>>>>> name and ability for module developers to easily make available/specify
>>>>>>
>>>>> all
>>>>>
>>>>>> or a subset of the properties of an operator to the user without having
>>>>>>
>>>>> to
>>>>>
>>>>>> explicitly make each of them a module property.
>>>>>>
>>>>>> On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <am...@datatorrent.com>
>>>>>>
>>>>> wrote:
>>>>> 3,4 should follow conventions where the creator decides the api
>>>>>> (including
>>>>>>
>>>>>>> accessibility). In general only those properties exposed by module
>>>>>>>
>>>>>> creator
>>>>>>
>>>>>>> should be settable. What the module internally does with them is
>>>>>>>
>>>>>> module
>>>>> designer's call. Accessing internals of module from outside is
>>>>>> uncommon.
>>>>>> For exampe in Java (or C++) private fields/members are not to be
>>>>>> accessed.
>>>>>>
>>>>>>> Properties (setter and getter) are the api that module designer gives
>>>>>>>
>>>>>> to
>>>>>> the module user. It is dangerous and has unintended consequences if
>>>>>> module
>>>>>>
>>>>>>> user starts to access internals outside the api.
>>>>>>>
>>>>>>> Partitioning should be next phase. As long as current design does not
>>>>>>>
>>>>>> halt
>>>>>>
>>>>>>> partitioning it should be ok (which I believe is true).
>>>>>>>
>>>>>>> Thks,
>>>>>>> Amol
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <v....@datatorrent.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> 1. +1, though passing original DAG to module's populateDAG() it is
>>>>>>> not
>>>>>> by
>>>>>>
>>>>>>> design and is the current pull request implementation details.
>>>>>>>> 2. While I agree that both Module and StreamingApplication let's
>>>>>>>> module/application designer to expose DAG design reuse pattern and
>>>>>>>> StreamingApplication interface may be extending Module, it does not
>>>>>>>>
>>>>>>> seem
>>>>>>> to
>>>>>>>
>>>>>>>> buy us much. Do we want to allow certain applications to be reused
>>>>>>>>
>>>>>>> as
>>>>> Modules in other applications or should application package be
>>>>>>> different
>>>>>>> from Module package? The current approach is to distribute Modules
>>>>>>> as
>>>>> part
>>>>>>>> of .jar for example as part of Malhar library without necessarily
>>>>>>>>
>>>>>>> providing
>>>>>>>
>>>>>>>> all necessary dependencies. Application package on other side must
>>>>>>>>
>>>>>>> include
>>>>>>>
>>>>>>>> all dependencies not provided by the platform.
>>>>>>>>
>>>>>>>> 3, 4. While this will help Module designer, it may complicate
>>>>>>>>
>>>>>>> Module
>>>>> maintenance and how Modules are used. What if Module designer wants
>>>>>>> to
>>>>>> change it's implementation and replace one operator implementation
>>>>>>> with
>>>>>> another operator? Does StreamingApplication designer need to know
>>>>>>> internal
>>>>>>>
>>>>>>>> structure of Modules? Should Module be considered as a black box
>>>>>>>>
>>>>>>> during
>>>>>> Application design time as it was initially planned?
>>>>>>>> 5, 6, 7 +1. This is currently proposed behavior of Module
>>>>>>>>
>>>>>>> functionality
>>>>>> the way I understand it.
>>>>>>>> 8. We need to see what Module designer can specify for
>>>>>>>>
>>>>>>> partitioning.
>>>>> One
>>>>>>> of supported cases should be ability to specify cascading
>>>>>>> partitioning
>>>>>> scheme.
>>>>>>>> Thank you,
>>>>>>>>
>>>>>>>> Vlad
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/2/15 10:30, Pramod Immaneni wrote:
>>>>>>>>
>>>>>>>> I have some comments and suggestions on the module design. I think
>>>>>>>> these
>>>>>>> need to be taken into account before we can merge the
>>>>>>>> implementation
>>>>> provided below into the mainline code. I apologize if these should
>>>>>>>> have
>>>>>>> been brought up earlier as for some reason or the other I was out
>>>>>>>> of
>>>>> loop
>>>>>>>> on this one
>>>>>>>>>        https://github.com/apache/incubator-apex-core/pull/148
>>>>>>>>> <
>>>>>>>>>
>>>>>>>>>
>>>> https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963
>>>>
>>>>>        1. DAG scoping currently in the implementation is global for
>>>>>>>> modules,
>>>>>>>> each module's populateDAG sees the entire DAG. It should be
>>>>>>>> locally
>>>>> scoped
>>>>>>>> as one module does not and should not know about another.
>>>>>>>>>        2. The module has a populateDAG method with exact same syntax
>>>>>>>>>
>>>>>>>> as
>>>>>> in
>>>>>>
>>>>>>> StreamingApplication. Is StreamingApplication also a module,
>>>>>>>> should
>>>>> it
>>>>>
>>>>>> extend that interface.
>>>>>>>>>        3. Setting properties for modules is too verbose. Module
>>>>>>>>>
>>>>>>>> developer
>>>>>>> needs to repeat every property they want exposed with a setter and
>>>>>>>> getter
>>>>>>>> in JAVA. I don't disagree that module developer should be able to
>>>>>>>> choose
>>>>>>> which properties from which operators need to be exposed but the
>>>>>>>> current
>>>>>>> way seems to duplicate code. Here is a suggestion.
>>>>>>>>>             a. Allow modules to specify which operators and
>>>>>>>>>
>>>>>>>> properties
>>>>> can
>>>>>>> be
>>>>>>>>> accessible from outside. One way is in the "populateDAG" method of
>>>>>>>>>
>>>>>>>> the
>>>>>> module when adding the operator have the ability to specify if
>>>>>>>> this
>>>>> operator can be accessible from outside and which or all
>>>>>>>> properties
>>>>> can
>>>>>>> be
>>>>>>>
>>>>>>>> accessible.
>>>>>>>>>             b. Provide methods in ModuleMeta or elsewhere to set
>>>>>>>>>
>>>>>>>> property
>>>>>>> values by specifying the operator name (friendly name) inside the
>>>>>>>> module
>>>>>>> and property name. If this is allowed by a. above it is successful
>>>>>>>> else
>>>>>>> it
>>>>>>>
>>>>>>>> should fail.
>>>>>>>>>             c. Allow a syntax in property files to specify the
>>>>>>>>>
>>>>>>>> property
>>>>>> in
>>>>>>
>>>>>>> b.
>>>>>>>>> Example syntax
>>>>>>>>>
>>>>>>>> dt.module.<modulename>.operator.<operatorname>.prop.<
>>>>> propname>
>>>>>>>>>        4. For attributes same mechanism as in 3 should apply for the
>>>>>>>>> operators
>>>>>>>>> that are exposed by the module.  For property file, example syntax
>>>>>>>>> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
>>>>>>>>>        5. Module developers in addition to 3. and 4. above may
>>>>>>>>>
>>>>>>>> choose
>>>>> to
>>>>>
>>>>>> support module level properties and attributes. These should not
>>>>>>>> be
>>>>> the
>>>>>>> default when 3. and 4. are possible but complementary, in addition
>>>>>>>> to
>>>>>> them.
>>>>>>>>> In this case for properties they can implement setters and getters
>>>>>>>>>
>>>>>>>> in
>>>>>> the
>>>>>>>> module. For attributes the user should still be able to set the
>>>>>>>> attributes
>>>>>>>> using the dag setAttribute method. You could introduce a method in
>>>>>>>> the
>>>>>> module to process attributes that can get called by the engine
>>>>>>>> once
>>>>> everything is set.
>>>>>>>>>        6. For 5. above setting global properties and attributes for
>>>>>>>>>
>>>>>>>> module
>>>>>>> is
>>>>>>>>> akin to ideas that have been proposed for the application as
>>>>>>>>>
>>>>>>>> well. A
>>>>> consistent way must be possible for applications as well even if
>>>>>>>> it
>>>>> is
>>>>>
>>>>>> not
>>>>>>>> implemented now.
>>>>>>>>>        7. For 5. or 6. above there should be a property file way of
>>>>>>>>> specifying
>>>>>>>>> the global module properties and attributes. Example syntax
>>>>>>>>> dt.module.<modulename>.prop.<propname>,
>>>>>>>>> dt.module.<modulename>.attr.<attrname>.
>>>>>>>>> Notice the difference with 3. c. and 4 above that there is no
>>>>>>>>>
>>>>>>>> operator
>>>>>> keyword here.
>>>>>>>>>        8. Partitioning needs to be consistent with what the user
>>>>>>>>>
>>>>>>>> will
>>>>> expect
>>>>>>>> when they see module as an entity. I will send an image of
>>>>>>>> possible
>>>>> examples of how the user will expect the physical plan to look in
>>>>>>>> certain
>>>>>>>> cases.
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>

Re: Modules

Posted by Thomas Weise <th...@datatorrent.com>.

Both are valid options and to be solved through the build tool. In one case
the dependency is declared, in the latter embedded.

Thomas

On Tue, Nov 3, 2015 at 10:07 AM, Vlad Rozov <v....@datatorrent.com> wrote:

> This brings one more issue that we did not cover in the module design
> discussions.  What happens when new versions of operators embedded into a
> module become available? I believe we all go with the assumption that
> module will pickup version available on the classpath at run-time and it is
> operator developer responsibility to provide full binary compatibility.
> Another possible behavior is to consider module as completely independent
> unit and package all necessary libraries along with the module.
>
> Thank you,
>
> Vlad
>
>
> On 11/3/15 09:40, Pramod Immaneni wrote:
>
>> I am not suggesting we leak the internals or compromise access modifiers.
>> I
>> want the module developer to have the ability (not mandatory) to make
>> available all or a subset of the properties of an operator easily if they
>> desire without having to create setter/getter for each of them. You don't
>> have to expose the operator they belong to. My preference would also be to
>> preserve the namespace of the properties in some way for example by
>> grouping them by operator name. Think about scenario where people have
>> built modules using kafka input operator and there is a new kafka
>> connection property. Without having this ability the modules have to be
>> changed to support this property. With this feature the module developers
>> have a choice whether to keep the list of kafka properties fixed in the
>> module or allow new properties.
>>
>> Thanks
>>
>> On Tue, Nov 3, 2015 at 9:31 AM, Thomas Weise <th...@datatorrent.com>
>> wrote:
>>
>> There is also the option to inherit common module properties through a
>>> base
>>> class.
>>>
>>> I don't see how this is any different from an operator. The developer
>>> decides what gets exposed and has the same options to control it.
>>>
>>> Encapsulation is good practice, by leaking the module internals the using
>>> code becomes brittle.
>>>
>>> Thomas
>>>
>>> On Tue, Nov 3, 2015 at 9:22 AM, Amol Kekre <am...@datatorrent.com> wrote:
>>>
>>> The same goes for a Java or C++ class that changes its api. In general
>>>>
>>> this
>>>
>>>> is left to the developer, and these languages have internals as private
>>>>
>>> by
>>>
>>>> default for precisely the same purpose. The module developer must have
>>>>
>>> the
>>>
>>>> right to change internals, keep api clean/constant and expect user code
>>>>
>>> to
>>>
>>>> not break.
>>>>
>>>> Thks,
>>>> Amol
>>>>
>>>>
>>>> On Tue, Nov 3, 2015 at 9:19 AM, Pramod Immaneni <pramod@datatorrent.com
>>>> >
>>>> wrote:
>>>>
>>>> For 3 and 4 can't we strike a balance between not having to expose the
>>>>> operators underneath and at the same time not having to write
>>>>>
>>>> boilerplate
>>>
>>>> code for all the properties that the module wants to make available
>>>>> outside. It can quickly become unmanageable. For example, an input
>>>>>
>>>> operator
>>>>
>>>>> has a new connection property which can be used outside and now all the
>>>>> modules using that operator, their code has to be modified to just add
>>>>>
>>>> a
>>>
>>>> pass through setter/getter. How about treating the operator name as a
>>>>>
>>>> group
>>>>
>>>>> name and ability for module developers to easily make available/specify
>>>>>
>>>> all
>>>>
>>>>> or a subset of the properties of an operator to the user without having
>>>>>
>>>> to
>>>>
>>>>> explicitly make each of them a module property.
>>>>>
>>>>> On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <am...@datatorrent.com>
>>>>>
>>>> wrote:
>>>
>>>> 3,4 should follow conventions where the creator decides the api
>>>>>>
>>>>> (including
>>>>>
>>>>>> accessibility). In general only those properties exposed by module
>>>>>>
>>>>> creator
>>>>>
>>>>>> should be settable. What the module internally does with them is
>>>>>>
>>>>> module
>>>
>>>> designer's call. Accessing internals of module from outside is
>>>>>>
>>>>> uncommon.
>>>>
>>>>> For exampe in Java (or C++) private fields/members are not to be
>>>>>>
>>>>> accessed.
>>>>>
>>>>>> Properties (setter and getter) are the api that module designer gives
>>>>>>
>>>>> to
>>>>
>>>>> the module user. It is dangerous and has unintended consequences if
>>>>>>
>>>>> module
>>>>>
>>>>>> user starts to access internals outside the api.
>>>>>>
>>>>>> Partitioning should be next phase. As long as current design does not
>>>>>>
>>>>> halt
>>>>>
>>>>>> partitioning it should be ok (which I believe is true).
>>>>>>
>>>>>> Thks,
>>>>>> Amol
>>>>>>
>>>>>>
>>>>>> On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <v....@datatorrent.com>
>>>>>> wrote:
>>>>>>
>>>>>> 1. +1, though passing original DAG to module's populateDAG() it is
>>>>>>>
>>>>>> not
>>>>
>>>>> by
>>>>>
>>>>>> design and is the current pull request implementation details.
>>>>>>>
>>>>>>> 2. While I agree that both Module and StreamingApplication let's
>>>>>>> module/application designer to expose DAG design reuse pattern and
>>>>>>> StreamingApplication interface may be extending Module, it does not
>>>>>>>
>>>>>> seem
>>>>>
>>>>>> to
>>>>>>
>>>>>>> buy us much. Do we want to allow certain applications to be reused
>>>>>>>
>>>>>> as
>>>
>>>> Modules in other applications or should application package be
>>>>>>>
>>>>>> different
>>>>>
>>>>>> from Module package? The current approach is to distribute Modules
>>>>>>>
>>>>>> as
>>>
>>>> part
>>>>>>
>>>>>>> of .jar for example as part of Malhar library without necessarily
>>>>>>>
>>>>>> providing
>>>>>>
>>>>>>> all necessary dependencies. Application package on other side must
>>>>>>>
>>>>>> include
>>>>>>
>>>>>>> all dependencies not provided by the platform.
>>>>>>>
>>>>>>> 3, 4. While this will help Module designer, it may complicate
>>>>>>>
>>>>>> Module
>>>
>>>> maintenance and how Modules are used. What if Module designer wants
>>>>>>>
>>>>>> to
>>>>
>>>>> change it's implementation and replace one operator implementation
>>>>>>>
>>>>>> with
>>>>
>>>>> another operator? Does StreamingApplication designer need to know
>>>>>>>
>>>>>> internal
>>>>>>
>>>>>>> structure of Modules? Should Module be considered as a black box
>>>>>>>
>>>>>> during
>>>>
>>>>> Application design time as it was initially planned?
>>>>>>>
>>>>>>> 5, 6, 7 +1. This is currently proposed behavior of Module
>>>>>>>
>>>>>> functionality
>>>>
>>>>> the way I understand it.
>>>>>>>
>>>>>>> 8. We need to see what Module designer can specify for
>>>>>>>
>>>>>> partitioning.
>>>
>>>> One
>>>>>
>>>>>> of supported cases should be ability to specify cascading
>>>>>>>
>>>>>> partitioning
>>>>
>>>>> scheme.
>>>>>>>
>>>>>>> Thank you,
>>>>>>>
>>>>>>> Vlad
>>>>>>>
>>>>>>>
>>>>>>> On 11/2/15 10:30, Pramod Immaneni wrote:
>>>>>>>
>>>>>>> I have some comments and suggestions on the module design. I think
>>>>>>>>
>>>>>>> these
>>>>>
>>>>>> need to be taken into account before we can merge the
>>>>>>>>
>>>>>>> implementation
>>>
>>>> provided below into the mainline code. I apologize if these should
>>>>>>>>
>>>>>>> have
>>>>>
>>>>>> been brought up earlier as for some reason or the other I was out
>>>>>>>>
>>>>>>> of
>>>
>>>> loop
>>>>>>
>>>>>>> on this one
>>>>>>>>
>>>>>>>>       https://github.com/apache/incubator-apex-core/pull/148
>>>>>>>> <
>>>>>>>>
>>>>>>>>
>>> https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963
>>>
>>>>       1. DAG scoping currently in the implementation is global for
>>>>>>>>
>>>>>>> modules,
>>>>>>
>>>>>>> each module's populateDAG sees the entire DAG. It should be
>>>>>>>>
>>>>>>> locally
>>>
>>>> scoped
>>>>>>
>>>>>>> as one module does not and should not know about another.
>>>>>>>>       2. The module has a populateDAG method with exact same syntax
>>>>>>>>
>>>>>>> as
>>>>
>>>>> in
>>>>>
>>>>>> StreamingApplication. Is StreamingApplication also a module,
>>>>>>>>
>>>>>>> should
>>>
>>>> it
>>>>
>>>>> extend that interface.
>>>>>>>>       3. Setting properties for modules is too verbose. Module
>>>>>>>>
>>>>>>> developer
>>>>>
>>>>>> needs to repeat every property they want exposed with a setter and
>>>>>>>>
>>>>>>> getter
>>>>>>
>>>>>>> in JAVA. I don't disagree that module developer should be able to
>>>>>>>>
>>>>>>> choose
>>>>>
>>>>>> which properties from which operators need to be exposed but the
>>>>>>>>
>>>>>>> current
>>>>>
>>>>>> way seems to duplicate code. Here is a suggestion.
>>>>>>>>            a. Allow modules to specify which operators and
>>>>>>>>
>>>>>>> properties
>>>
>>>> can
>>>>>
>>>>>> be
>>>>>>>> accessible from outside. One way is in the "populateDAG" method of
>>>>>>>>
>>>>>>> the
>>>>
>>>>> module when adding the operator have the ability to specify if
>>>>>>>>
>>>>>>> this
>>>
>>>> operator can be accessible from outside and which or all
>>>>>>>>
>>>>>>> properties
>>>
>>>> can
>>>>>
>>>>>> be
>>>>>>
>>>>>>> accessible.
>>>>>>>>            b. Provide methods in ModuleMeta or elsewhere to set
>>>>>>>>
>>>>>>> property
>>>>>
>>>>>> values by specifying the operator name (friendly name) inside the
>>>>>>>>
>>>>>>> module
>>>>>
>>>>>> and property name. If this is allowed by a. above it is successful
>>>>>>>>
>>>>>>> else
>>>>>
>>>>>> it
>>>>>>
>>>>>>> should fail.
>>>>>>>>            c. Allow a syntax in property files to specify the
>>>>>>>>
>>>>>>> property
>>>>
>>>>> in
>>>>>
>>>>>> b.
>>>>>>>> Example syntax
>>>>>>>>
>>>>>>> dt.module.<modulename>.operator.<operatorname>.prop.<
>>>
>>>> propname>
>>>>>>>>       4. For attributes same mechanism as in 3 should apply for the
>>>>>>>> operators
>>>>>>>> that are exposed by the module.  For property file, example syntax
>>>>>>>> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
>>>>>>>>       5. Module developers in addition to 3. and 4. above may
>>>>>>>>
>>>>>>> choose
>>>
>>>> to
>>>>
>>>>> support module level properties and attributes. These should not
>>>>>>>>
>>>>>>> be
>>>
>>>> the
>>>>>
>>>>>> default when 3. and 4. are possible but complementary, in addition
>>>>>>>>
>>>>>>> to
>>>>
>>>>> them.
>>>>>>>> In this case for properties they can implement setters and getters
>>>>>>>>
>>>>>>> in
>>>>
>>>>> the
>>>>>>
>>>>>>> module. For attributes the user should still be able to set the
>>>>>>>>
>>>>>>> attributes
>>>>>>
>>>>>>> using the dag setAttribute method. You could introduce a method in
>>>>>>>>
>>>>>>> the
>>>>
>>>>> module to process attributes that can get called by the engine
>>>>>>>>
>>>>>>> once
>>>
>>>> everything is set.
>>>>>>>>       6. For 5. above setting global properties and attributes for
>>>>>>>>
>>>>>>> module
>>>>>
>>>>>> is
>>>>>>>> akin to ideas that have been proposed for the application as
>>>>>>>>
>>>>>>> well. A
>>>
>>>> consistent way must be possible for applications as well even if
>>>>>>>>
>>>>>>> it
>>>
>>>> is
>>>>
>>>>> not
>>>>>>
>>>>>>> implemented now.
>>>>>>>>       7. For 5. or 6. above there should be a property file way of
>>>>>>>> specifying
>>>>>>>> the global module properties and attributes. Example syntax
>>>>>>>> dt.module.<modulename>.prop.<propname>,
>>>>>>>> dt.module.<modulename>.attr.<attrname>.
>>>>>>>> Notice the difference with 3. c. and 4 above that there is no
>>>>>>>>
>>>>>>> operator
>>>>
>>>>> keyword here.
>>>>>>>>       8. Partitioning needs to be consistent with what the user
>>>>>>>>
>>>>>>> will
>>>
>>>> expect
>>>>>>
>>>>>>> when they see module as an entity. I will send an image of
>>>>>>>>
>>>>>>> possible
>>>
>>>> examples of how the user will expect the physical plan to look in
>>>>>>>>
>>>>>>> certain
>>>>>>
>>>>>>> cases.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>>
>

Re: Modules

Posted by Vlad Rozov <v....@datatorrent.com>.

This brings one more issue that we did not cover in the module design 
discussions.  What happens when new versions of operators embedded into 
a module become available? I believe we all go with the assumption that 
module will pickup version available on the classpath at run-time and it 
is operator developer responsibility to provide full binary 
compatibility. Another possible behavior is to consider module as 
completely independent unit and package all necessary libraries along 
with the module.

Thank you,

Vlad

On 11/3/15 09:40, Pramod Immaneni wrote:
> I am not suggesting we leak the internals or compromise access modifiers. I
> want the module developer to have the ability (not mandatory) to make
> available all or a subset of the properties of an operator easily if they
> desire without having to create setter/getter for each of them. You don't
> have to expose the operator they belong to. My preference would also be to
> preserve the namespace of the properties in some way for example by
> grouping them by operator name. Think about scenario where people have
> built modules using kafka input operator and there is a new kafka
> connection property. Without having this ability the modules have to be
> changed to support this property. With this feature the module developers
> have a choice whether to keep the list of kafka properties fixed in the
> module or allow new properties.
>
> Thanks
>
> On Tue, Nov 3, 2015 at 9:31 AM, Thomas Weise <th...@datatorrent.com> wrote:
>
>> There is also the option to inherit common module properties through a base
>> class.
>>
>> I don't see how this is any different from an operator. The developer
>> decides what gets exposed and has the same options to control it.
>>
>> Encapsulation is good practice, by leaking the module internals the using
>> code becomes brittle.
>>
>> Thomas
>>
>> On Tue, Nov 3, 2015 at 9:22 AM, Amol Kekre <am...@datatorrent.com> wrote:
>>
>>> The same goes for a Java or C++ class that changes its api. In general
>> this
>>> is left to the developer, and these languages have internals as private
>> by
>>> default for precisely the same purpose. The module developer must have
>> the
>>> right to change internals, keep api clean/constant and expect user code
>> to
>>> not break.
>>>
>>> Thks,
>>> Amol
>>>
>>>
>>> On Tue, Nov 3, 2015 at 9:19 AM, Pramod Immaneni <pr...@datatorrent.com>
>>> wrote:
>>>
>>>> For 3 and 4 can't we strike a balance between not having to expose the
>>>> operators underneath and at the same time not having to write
>> boilerplate
>>>> code for all the properties that the module wants to make available
>>>> outside. It can quickly become unmanageable. For example, an input
>>> operator
>>>> has a new connection property which can be used outside and now all the
>>>> modules using that operator, their code has to be modified to just add
>> a
>>>> pass through setter/getter. How about treating the operator name as a
>>> group
>>>> name and ability for module developers to easily make available/specify
>>> all
>>>> or a subset of the properties of an operator to the user without having
>>> to
>>>> explicitly make each of them a module property.
>>>>
>>>> On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <am...@datatorrent.com>
>> wrote:
>>>>> 3,4 should follow conventions where the creator decides the api
>>>> (including
>>>>> accessibility). In general only those properties exposed by module
>>>> creator
>>>>> should be settable. What the module internally does with them is
>> module
>>>>> designer's call. Accessing internals of module from outside is
>>> uncommon.
>>>>> For exampe in Java (or C++) private fields/members are not to be
>>>> accessed.
>>>>> Properties (setter and getter) are the api that module designer gives
>>> to
>>>>> the module user. It is dangerous and has unintended consequences if
>>>> module
>>>>> user starts to access internals outside the api.
>>>>>
>>>>> Partitioning should be next phase. As long as current design does not
>>>> halt
>>>>> partitioning it should be ok (which I believe is true).
>>>>>
>>>>> Thks,
>>>>> Amol
>>>>>
>>>>>
>>>>> On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <v....@datatorrent.com>
>>>>> wrote:
>>>>>
>>>>>> 1. +1, though passing original DAG to module's populateDAG() it is
>>> not
>>>> by
>>>>>> design and is the current pull request implementation details.
>>>>>>
>>>>>> 2. While I agree that both Module and StreamingApplication let's
>>>>>> module/application designer to expose DAG design reuse pattern and
>>>>>> StreamingApplication interface may be extending Module, it does not
>>>> seem
>>>>> to
>>>>>> buy us much. Do we want to allow certain applications to be reused
>> as
>>>>>> Modules in other applications or should application package be
>>>> different
>>>>>> from Module package? The current approach is to distribute Modules
>> as
>>>>> part
>>>>>> of .jar for example as part of Malhar library without necessarily
>>>>> providing
>>>>>> all necessary dependencies. Application package on other side must
>>>>> include
>>>>>> all dependencies not provided by the platform.
>>>>>>
>>>>>> 3, 4. While this will help Module designer, it may complicate
>> Module
>>>>>> maintenance and how Modules are used. What if Module designer wants
>>> to
>>>>>> change it's implementation and replace one operator implementation
>>> with
>>>>>> another operator? Does StreamingApplication designer need to know
>>>>> internal
>>>>>> structure of Modules? Should Module be considered as a black box
>>> during
>>>>>> Application design time as it was initially planned?
>>>>>>
>>>>>> 5, 6, 7 +1. This is currently proposed behavior of Module
>>> functionality
>>>>>> the way I understand it.
>>>>>>
>>>>>> 8. We need to see what Module designer can specify for
>> partitioning.
>>>> One
>>>>>> of supported cases should be ability to specify cascading
>>> partitioning
>>>>>> scheme.
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Vlad
>>>>>>
>>>>>>
>>>>>> On 11/2/15 10:30, Pramod Immaneni wrote:
>>>>>>
>>>>>>> I have some comments and suggestions on the module design. I think
>>>> these
>>>>>>> need to be taken into account before we can merge the
>> implementation
>>>>>>> provided below into the mainline code. I apologize if these should
>>>> have
>>>>>>> been brought up earlier as for some reason or the other I was out
>> of
>>>>> loop
>>>>>>> on this one
>>>>>>>
>>>>>>>       https://github.com/apache/incubator-apex-core/pull/148
>>>>>>> <
>>>>>>>
>> https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963
>>>>>>>       1. DAG scoping currently in the implementation is global for
>>>>> modules,
>>>>>>> each module's populateDAG sees the entire DAG. It should be
>> locally
>>>>> scoped
>>>>>>> as one module does not and should not know about another.
>>>>>>>       2. The module has a populateDAG method with exact same syntax
>>> as
>>>> in
>>>>>>> StreamingApplication. Is StreamingApplication also a module,
>> should
>>> it
>>>>>>> extend that interface.
>>>>>>>       3. Setting properties for modules is too verbose. Module
>>>> developer
>>>>>>> needs to repeat every property they want exposed with a setter and
>>>>> getter
>>>>>>> in JAVA. I don't disagree that module developer should be able to
>>>> choose
>>>>>>> which properties from which operators need to be exposed but the
>>>> current
>>>>>>> way seems to duplicate code. Here is a suggestion.
>>>>>>>            a. Allow modules to specify which operators and
>> properties
>>>> can
>>>>>>> be
>>>>>>> accessible from outside. One way is in the "populateDAG" method of
>>> the
>>>>>>> module when adding the operator have the ability to specify if
>> this
>>>>>>> operator can be accessible from outside and which or all
>> properties
>>>> can
>>>>> be
>>>>>>> accessible.
>>>>>>>            b. Provide methods in ModuleMeta or elsewhere to set
>>>> property
>>>>>>> values by specifying the operator name (friendly name) inside the
>>>> module
>>>>>>> and property name. If this is allowed by a. above it is successful
>>>> else
>>>>> it
>>>>>>> should fail.
>>>>>>>            c. Allow a syntax in property files to specify the
>>> property
>>>> in
>>>>>>> b.
>>>>>>> Example syntax
>> dt.module.<modulename>.operator.<operatorname>.prop.<
>>>>>>> propname>
>>>>>>>       4. For attributes same mechanism as in 3 should apply for the
>>>>>>> operators
>>>>>>> that are exposed by the module.  For property file, example syntax
>>>>>>> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
>>>>>>>       5. Module developers in addition to 3. and 4. above may
>> choose
>>> to
>>>>>>> support module level properties and attributes. These should not
>> be
>>>> the
>>>>>>> default when 3. and 4. are possible but complementary, in addition
>>> to
>>>>>>> them.
>>>>>>> In this case for properties they can implement setters and getters
>>> in
>>>>> the
>>>>>>> module. For attributes the user should still be able to set the
>>>>> attributes
>>>>>>> using the dag setAttribute method. You could introduce a method in
>>> the
>>>>>>> module to process attributes that can get called by the engine
>> once
>>>>>>> everything is set.
>>>>>>>       6. For 5. above setting global properties and attributes for
>>>> module
>>>>>>> is
>>>>>>> akin to ideas that have been proposed for the application as
>> well. A
>>>>>>> consistent way must be possible for applications as well even if
>> it
>>> is
>>>>> not
>>>>>>> implemented now.
>>>>>>>       7. For 5. or 6. above there should be a property file way of
>>>>>>> specifying
>>>>>>> the global module properties and attributes. Example syntax
>>>>>>> dt.module.<modulename>.prop.<propname>,
>>>>>>> dt.module.<modulename>.attr.<attrname>.
>>>>>>> Notice the difference with 3. c. and 4 above that there is no
>>> operator
>>>>>>> keyword here.
>>>>>>>       8. Partitioning needs to be consistent with what the user
>> will
>>>>> expect
>>>>>>> when they see module as an entity. I will send an image of
>> possible
>>>>>>> examples of how the user will expect the physical plan to look in
>>>>> certain
>>>>>>> cases.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>

Re: Modules

Posted by Pramod Immaneni <pr...@datatorrent.com>.

My responses inline

On Tue, Nov 3, 2015 at 9:49 AM, Thomas Weise <th...@datatorrent.com> wrote:

> You cannot hide and expose the operator inside the module at the same time.
> By suggesting grouping of properties by operator you are exposing the
> internals of the module.
>

You don't have to expose the operator by having something like a
getOperatorByName but you can have a setProperty(String groupName, String
propName, String propValue) that is supported at the ModuleMeta level or a
setProperty(String propName, String propValue) on the ModuleMeta if you
don't want to have a groupName and put all properties in the same
namespace. This is just an example and there may be better ways to do this
but again this saves the module developer from writing specific code for
every property.

>
> We have a number of use cases where we explicitly don't want that to
> happen.
>
> There seems to be an assumption that every property is a separate
> getter/setter. That's not the case. A property itself can be a complex
> object. You can change that object without changing the module.
>
> BTW that's the case for the Kafka operator connection properties.
>

Agreed kafka was probably not the best example.  There are many other
operators like jdbc for example where properties are not in a complex
object.

>
>
> On Tue, Nov 3, 2015 at 9:40 AM, Pramod Immaneni <pr...@datatorrent.com>
> wrote:
>
> > I am not suggesting we leak the internals or compromise access
> modifiers. I
> > want the module developer to have the ability (not mandatory) to make
> > available all or a subset of the properties of an operator easily if they
> > desire without having to create setter/getter for each of them. You don't
> > have to expose the operator they belong to. My preference would also be
> to
> > preserve the namespace of the properties in some way for example by
> > grouping them by operator name. Think about scenario where people have
> > built modules using kafka input operator and there is a new kafka
> > connection property. Without having this ability the modules have to be
> > changed to support this property. With this feature the module developers
> > have a choice whether to keep the list of kafka properties fixed in the
> > module or allow new properties.
> >
> > Thanks
> >
> > On Tue, Nov 3, 2015 at 9:31 AM, Thomas Weise <th...@datatorrent.com>
> > wrote:
> >
> > > There is also the option to inherit common module properties through a
> > base
> > > class.
> > >
> > > I don't see how this is any different from an operator. The developer
> > > decides what gets exposed and has the same options to control it.
> > >
> > > Encapsulation is good practice, by leaking the module internals the
> using
> > > code becomes brittle.
> > >
> > > Thomas
> > >
> > > On Tue, Nov 3, 2015 at 9:22 AM, Amol Kekre <am...@datatorrent.com>
> wrote:
> > >
> > > > The same goes for a Java or C++ class that changes its api. In
> general
> > > this
> > > > is left to the developer, and these languages have internals as
> private
> > > by
> > > > default for precisely the same purpose. The module developer must
> have
> > > the
> > > > right to change internals, keep api clean/constant and expect user
> code
> > > to
> > > > not break.
> > > >
> > > > Thks,
> > > > Amol
> > > >
> > > >
> > > > On Tue, Nov 3, 2015 at 9:19 AM, Pramod Immaneni <
> > pramod@datatorrent.com>
> > > > wrote:
> > > >
> > > > > For 3 and 4 can't we strike a balance between not having to expose
> > the
> > > > > operators underneath and at the same time not having to write
> > > boilerplate
> > > > > code for all the properties that the module wants to make available
> > > > > outside. It can quickly become unmanageable. For example, an input
> > > > operator
> > > > > has a new connection property which can be used outside and now all
> > the
> > > > > modules using that operator, their code has to be modified to just
> > add
> > > a
> > > > > pass through setter/getter. How about treating the operator name
> as a
> > > > group
> > > > > name and ability for module developers to easily make
> > available/specify
> > > > all
> > > > > or a subset of the properties of an operator to the user without
> > having
> > > > to
> > > > > explicitly make each of them a module property.
> > > > >
> > > > > On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <am...@datatorrent.com>
> > > wrote:
> > > > >
> > > > > > 3,4 should follow conventions where the creator decides the api
> > > > > (including
> > > > > > accessibility). In general only those properties exposed by
> module
> > > > > creator
> > > > > > should be settable. What the module internally does with them is
> > > module
> > > > > > designer's call. Accessing internals of module from outside is
> > > > uncommon.
> > > > > > For exampe in Java (or C++) private fields/members are not to be
> > > > > accessed.
> > > > > > Properties (setter and getter) are the api that module designer
> > gives
> > > > to
> > > > > > the module user. It is dangerous and has unintended consequences
> if
> > > > > module
> > > > > > user starts to access internals outside the api.
> > > > > >
> > > > > > Partitioning should be next phase. As long as current design does
> > not
> > > > > halt
> > > > > > partitioning it should be ok (which I believe is true).
> > > > > >
> > > > > > Thks,
> > > > > > Amol
> > > > > >
> > > > > >
> > > > > > On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <
> > v.rozov@datatorrent.com>
> > > > > > wrote:
> > > > > >
> > > > > > > 1. +1, though passing original DAG to module's populateDAG() it
> > is
> > > > not
> > > > > by
> > > > > > > design and is the current pull request implementation details.
> > > > > > >
> > > > > > > 2. While I agree that both Module and StreamingApplication
> let's
> > > > > > > module/application designer to expose DAG design reuse pattern
> > and
> > > > > > > StreamingApplication interface may be extending Module, it does
> > not
> > > > > seem
> > > > > > to
> > > > > > > buy us much. Do we want to allow certain applications to be
> > reused
> > > as
> > > > > > > Modules in other applications or should application package be
> > > > > different
> > > > > > > from Module package? The current approach is to distribute
> > Modules
> > > as
> > > > > > part
> > > > > > > of .jar for example as part of Malhar library without
> necessarily
> > > > > > providing
> > > > > > > all necessary dependencies. Application package on other side
> > must
> > > > > > include
> > > > > > > all dependencies not provided by the platform.
> > > > > > >
> > > > > > > 3, 4. While this will help Module designer, it may complicate
> > > Module
> > > > > > > maintenance and how Modules are used. What if Module designer
> > wants
> > > > to
> > > > > > > change it's implementation and replace one operator
> > implementation
> > > > with
> > > > > > > another operator? Does StreamingApplication designer need to
> know
> > > > > > internal
> > > > > > > structure of Modules? Should Module be considered as a black
> box
> > > > during
> > > > > > > Application design time as it was initially planned?
> > > > > > >
> > > > > > > 5, 6, 7 +1. This is currently proposed behavior of Module
> > > > functionality
> > > > > > > the way I understand it.
> > > > > > >
> > > > > > > 8. We need to see what Module designer can specify for
> > > partitioning.
> > > > > One
> > > > > > > of supported cases should be ability to specify cascading
> > > > partitioning
> > > > > > > scheme.
> > > > > > >
> > > > > > > Thank you,
> > > > > > >
> > > > > > > Vlad
> > > > > > >
> > > > > > >
> > > > > > > On 11/2/15 10:30, Pramod Immaneni wrote:
> > > > > > >
> > > > > > >> I have some comments and suggestions on the module design. I
> > think
> > > > > these
> > > > > > >> need to be taken into account before we can merge the
> > > implementation
> > > > > > >> provided below into the mainline code. I apologize if these
> > should
> > > > > have
> > > > > > >> been brought up earlier as for some reason or the other I was
> > out
> > > of
> > > > > > loop
> > > > > > >> on this one
> > > > > > >>
> > > > > > >>      https://github.com/apache/incubator-apex-core/pull/148
> > > > > > >> <
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963
> > > > > > >> >
> > > > > > >>
> > > > > > >>      1. DAG scoping currently in the implementation is global
> > for
> > > > > > modules,
> > > > > > >> each module's populateDAG sees the entire DAG. It should be
> > > locally
> > > > > > scoped
> > > > > > >> as one module does not and should not know about another.
> > > > > > >>      2. The module has a populateDAG method with exact same
> > syntax
> > > > as
> > > > > in
> > > > > > >> StreamingApplication. Is StreamingApplication also a module,
> > > should
> > > > it
> > > > > > >> extend that interface.
> > > > > > >>      3. Setting properties for modules is too verbose. Module
> > > > > developer
> > > > > > >> needs to repeat every property they want exposed with a setter
> > and
> > > > > > getter
> > > > > > >> in JAVA. I don't disagree that module developer should be able
> > to
> > > > > choose
> > > > > > >> which properties from which operators need to be exposed but
> the
> > > > > current
> > > > > > >> way seems to duplicate code. Here is a suggestion.
> > > > > > >>           a. Allow modules to specify which operators and
> > > properties
> > > > > can
> > > > > > >> be
> > > > > > >> accessible from outside. One way is in the "populateDAG"
> method
> > of
> > > > the
> > > > > > >> module when adding the operator have the ability to specify if
> > > this
> > > > > > >> operator can be accessible from outside and which or all
> > > properties
> > > > > can
> > > > > > be
> > > > > > >> accessible.
> > > > > > >>           b. Provide methods in ModuleMeta or elsewhere to set
> > > > > property
> > > > > > >> values by specifying the operator name (friendly name) inside
> > the
> > > > > module
> > > > > > >> and property name. If this is allowed by a. above it is
> > successful
> > > > > else
> > > > > > it
> > > > > > >> should fail.
> > > > > > >>           c. Allow a syntax in property files to specify the
> > > > property
> > > > > in
> > > > > > >> b.
> > > > > > >> Example syntax
> > > dt.module.<modulename>.operator.<operatorname>.prop.<
> > > > > > >> propname>
> > > > > > >>      4. For attributes same mechanism as in 3 should apply for
> > the
> > > > > > >> operators
> > > > > > >> that are exposed by the module.  For property file, example
> > syntax
> > > > > > >> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
> > > > > > >>      5. Module developers in addition to 3. and 4. above may
> > > choose
> > > > to
> > > > > > >> support module level properties and attributes. These should
> not
> > > be
> > > > > the
> > > > > > >> default when 3. and 4. are possible but complementary, in
> > addition
> > > > to
> > > > > > >> them.
> > > > > > >> In this case for properties they can implement setters and
> > getters
> > > > in
> > > > > > the
> > > > > > >> module. For attributes the user should still be able to set
> the
> > > > > > attributes
> > > > > > >> using the dag setAttribute method. You could introduce a
> method
> > in
> > > > the
> > > > > > >> module to process attributes that can get called by the engine
> > > once
> > > > > > >> everything is set.
> > > > > > >>      6. For 5. above setting global properties and attributes
> > for
> > > > > module
> > > > > > >> is
> > > > > > >> akin to ideas that have been proposed for the application as
> > > well. A
> > > > > > >> consistent way must be possible for applications as well even
> if
> > > it
> > > > is
> > > > > > not
> > > > > > >> implemented now.
> > > > > > >>      7. For 5. or 6. above there should be a property file way
> > of
> > > > > > >> specifying
> > > > > > >> the global module properties and attributes. Example syntax
> > > > > > >> dt.module.<modulename>.prop.<propname>,
> > > > > > >> dt.module.<modulename>.attr.<attrname>.
> > > > > > >> Notice the difference with 3. c. and 4 above that there is no
> > > > operator
> > > > > > >> keyword here.
> > > > > > >>      8. Partitioning needs to be consistent with what the user
> > > will
> > > > > > expect
> > > > > > >> when they see module as an entity. I will send an image of
> > > possible
> > > > > > >> examples of how the user will expect the physical plan to look
> > in
> > > > > > certain
> > > > > > >> cases.
> > > > > > >>
> > > > > > >> Thanks
> > > > > > >>
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Modules

Posted by Thomas Weise <th...@datatorrent.com>.

You cannot hide and expose the operator inside the module at the same time.
By suggesting grouping of properties by operator you are exposing the
internals of the module.

We have a number of use cases where we explicitly don't want that to happen.

There seems to be an assumption that every property is a separate
getter/setter. That's not the case. A property itself can be a complex
object. You can change that object without changing the module.

BTW that's the case for the Kafka operator connection properties.


On Tue, Nov 3, 2015 at 9:40 AM, Pramod Immaneni <pr...@datatorrent.com>
wrote:

> I am not suggesting we leak the internals or compromise access modifiers. I
> want the module developer to have the ability (not mandatory) to make
> available all or a subset of the properties of an operator easily if they
> desire without having to create setter/getter for each of them. You don't
> have to expose the operator they belong to. My preference would also be to
> preserve the namespace of the properties in some way for example by
> grouping them by operator name. Think about scenario where people have
> built modules using kafka input operator and there is a new kafka
> connection property. Without having this ability the modules have to be
> changed to support this property. With this feature the module developers
> have a choice whether to keep the list of kafka properties fixed in the
> module or allow new properties.
>
> Thanks
>
> On Tue, Nov 3, 2015 at 9:31 AM, Thomas Weise <th...@datatorrent.com>
> wrote:
>
> > There is also the option to inherit common module properties through a
> base
> > class.
> >
> > I don't see how this is any different from an operator. The developer
> > decides what gets exposed and has the same options to control it.
> >
> > Encapsulation is good practice, by leaking the module internals the using
> > code becomes brittle.
> >
> > Thomas
> >
> > On Tue, Nov 3, 2015 at 9:22 AM, Amol Kekre <am...@datatorrent.com> wrote:
> >
> > > The same goes for a Java or C++ class that changes its api. In general
> > this
> > > is left to the developer, and these languages have internals as private
> > by
> > > default for precisely the same purpose. The module developer must have
> > the
> > > right to change internals, keep api clean/constant and expect user code
> > to
> > > not break.
> > >
> > > Thks,
> > > Amol
> > >
> > >
> > > On Tue, Nov 3, 2015 at 9:19 AM, Pramod Immaneni <
> pramod@datatorrent.com>
> > > wrote:
> > >
> > > > For 3 and 4 can't we strike a balance between not having to expose
> the
> > > > operators underneath and at the same time not having to write
> > boilerplate
> > > > code for all the properties that the module wants to make available
> > > > outside. It can quickly become unmanageable. For example, an input
> > > operator
> > > > has a new connection property which can be used outside and now all
> the
> > > > modules using that operator, their code has to be modified to just
> add
> > a
> > > > pass through setter/getter. How about treating the operator name as a
> > > group
> > > > name and ability for module developers to easily make
> available/specify
> > > all
> > > > or a subset of the properties of an operator to the user without
> having
> > > to
> > > > explicitly make each of them a module property.
> > > >
> > > > On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <am...@datatorrent.com>
> > wrote:
> > > >
> > > > > 3,4 should follow conventions where the creator decides the api
> > > > (including
> > > > > accessibility). In general only those properties exposed by module
> > > > creator
> > > > > should be settable. What the module internally does with them is
> > module
> > > > > designer's call. Accessing internals of module from outside is
> > > uncommon.
> > > > > For exampe in Java (or C++) private fields/members are not to be
> > > > accessed.
> > > > > Properties (setter and getter) are the api that module designer
> gives
> > > to
> > > > > the module user. It is dangerous and has unintended consequences if
> > > > module
> > > > > user starts to access internals outside the api.
> > > > >
> > > > > Partitioning should be next phase. As long as current design does
> not
> > > > halt
> > > > > partitioning it should be ok (which I believe is true).
> > > > >
> > > > > Thks,
> > > > > Amol
> > > > >
> > > > >
> > > > > On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <
> v.rozov@datatorrent.com>
> > > > > wrote:
> > > > >
> > > > > > 1. +1, though passing original DAG to module's populateDAG() it
> is
> > > not
> > > > by
> > > > > > design and is the current pull request implementation details.
> > > > > >
> > > > > > 2. While I agree that both Module and StreamingApplication let's
> > > > > > module/application designer to expose DAG design reuse pattern
> and
> > > > > > StreamingApplication interface may be extending Module, it does
> not
> > > > seem
> > > > > to
> > > > > > buy us much. Do we want to allow certain applications to be
> reused
> > as
> > > > > > Modules in other applications or should application package be
> > > > different
> > > > > > from Module package? The current approach is to distribute
> Modules
> > as
> > > > > part
> > > > > > of .jar for example as part of Malhar library without necessarily
> > > > > providing
> > > > > > all necessary dependencies. Application package on other side
> must
> > > > > include
> > > > > > all dependencies not provided by the platform.
> > > > > >
> > > > > > 3, 4. While this will help Module designer, it may complicate
> > Module
> > > > > > maintenance and how Modules are used. What if Module designer
> wants
> > > to
> > > > > > change it's implementation and replace one operator
> implementation
> > > with
> > > > > > another operator? Does StreamingApplication designer need to know
> > > > > internal
> > > > > > structure of Modules? Should Module be considered as a black box
> > > during
> > > > > > Application design time as it was initially planned?
> > > > > >
> > > > > > 5, 6, 7 +1. This is currently proposed behavior of Module
> > > functionality
> > > > > > the way I understand it.
> > > > > >
> > > > > > 8. We need to see what Module designer can specify for
> > partitioning.
> > > > One
> > > > > > of supported cases should be ability to specify cascading
> > > partitioning
> > > > > > scheme.
> > > > > >
> > > > > > Thank you,
> > > > > >
> > > > > > Vlad
> > > > > >
> > > > > >
> > > > > > On 11/2/15 10:30, Pramod Immaneni wrote:
> > > > > >
> > > > > >> I have some comments and suggestions on the module design. I
> think
> > > > these
> > > > > >> need to be taken into account before we can merge the
> > implementation
> > > > > >> provided below into the mainline code. I apologize if these
> should
> > > > have
> > > > > >> been brought up earlier as for some reason or the other I was
> out
> > of
> > > > > loop
> > > > > >> on this one
> > > > > >>
> > > > > >>      https://github.com/apache/incubator-apex-core/pull/148
> > > > > >> <
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963
> > > > > >> >
> > > > > >>
> > > > > >>      1. DAG scoping currently in the implementation is global
> for
> > > > > modules,
> > > > > >> each module's populateDAG sees the entire DAG. It should be
> > locally
> > > > > scoped
> > > > > >> as one module does not and should not know about another.
> > > > > >>      2. The module has a populateDAG method with exact same
> syntax
> > > as
> > > > in
> > > > > >> StreamingApplication. Is StreamingApplication also a module,
> > should
> > > it
> > > > > >> extend that interface.
> > > > > >>      3. Setting properties for modules is too verbose. Module
> > > > developer
> > > > > >> needs to repeat every property they want exposed with a setter
> and
> > > > > getter
> > > > > >> in JAVA. I don't disagree that module developer should be able
> to
> > > > choose
> > > > > >> which properties from which operators need to be exposed but the
> > > > current
> > > > > >> way seems to duplicate code. Here is a suggestion.
> > > > > >>           a. Allow modules to specify which operators and
> > properties
> > > > can
> > > > > >> be
> > > > > >> accessible from outside. One way is in the "populateDAG" method
> of
> > > the
> > > > > >> module when adding the operator have the ability to specify if
> > this
> > > > > >> operator can be accessible from outside and which or all
> > properties
> > > > can
> > > > > be
> > > > > >> accessible.
> > > > > >>           b. Provide methods in ModuleMeta or elsewhere to set
> > > > property
> > > > > >> values by specifying the operator name (friendly name) inside
> the
> > > > module
> > > > > >> and property name. If this is allowed by a. above it is
> successful
> > > > else
> > > > > it
> > > > > >> should fail.
> > > > > >>           c. Allow a syntax in property files to specify the
> > > property
> > > > in
> > > > > >> b.
> > > > > >> Example syntax
> > dt.module.<modulename>.operator.<operatorname>.prop.<
> > > > > >> propname>
> > > > > >>      4. For attributes same mechanism as in 3 should apply for
> the
> > > > > >> operators
> > > > > >> that are exposed by the module.  For property file, example
> syntax
> > > > > >> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
> > > > > >>      5. Module developers in addition to 3. and 4. above may
> > choose
> > > to
> > > > > >> support module level properties and attributes. These should not
> > be
> > > > the
> > > > > >> default when 3. and 4. are possible but complementary, in
> addition
> > > to
> > > > > >> them.
> > > > > >> In this case for properties they can implement setters and
> getters
> > > in
> > > > > the
> > > > > >> module. For attributes the user should still be able to set the
> > > > > attributes
> > > > > >> using the dag setAttribute method. You could introduce a method
> in
> > > the
> > > > > >> module to process attributes that can get called by the engine
> > once
> > > > > >> everything is set.
> > > > > >>      6. For 5. above setting global properties and attributes
> for
> > > > module
> > > > > >> is
> > > > > >> akin to ideas that have been proposed for the application as
> > well. A
> > > > > >> consistent way must be possible for applications as well even if
> > it
> > > is
> > > > > not
> > > > > >> implemented now.
> > > > > >>      7. For 5. or 6. above there should be a property file way
> of
> > > > > >> specifying
> > > > > >> the global module properties and attributes. Example syntax
> > > > > >> dt.module.<modulename>.prop.<propname>,
> > > > > >> dt.module.<modulename>.attr.<attrname>.
> > > > > >> Notice the difference with 3. c. and 4 above that there is no
> > > operator
> > > > > >> keyword here.
> > > > > >>      8. Partitioning needs to be consistent with what the user
> > will
> > > > > expect
> > > > > >> when they see module as an entity. I will send an image of
> > possible
> > > > > >> examples of how the user will expect the physical plan to look
> in
> > > > > certain
> > > > > >> cases.
> > > > > >>
> > > > > >> Thanks
> > > > > >>
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Modules

Posted by Pramod Immaneni <pr...@datatorrent.com>.

I am not suggesting we leak the internals or compromise access modifiers. I
want the module developer to have the ability (not mandatory) to make
available all or a subset of the properties of an operator easily if they
desire without having to create setter/getter for each of them. You don't
have to expose the operator they belong to. My preference would also be to
preserve the namespace of the properties in some way for example by
grouping them by operator name. Think about scenario where people have
built modules using kafka input operator and there is a new kafka
connection property. Without having this ability the modules have to be
changed to support this property. With this feature the module developers
have a choice whether to keep the list of kafka properties fixed in the
module or allow new properties.

Thanks

On Tue, Nov 3, 2015 at 9:31 AM, Thomas Weise <th...@datatorrent.com> wrote:

> There is also the option to inherit common module properties through a base
> class.
>
> I don't see how this is any different from an operator. The developer
> decides what gets exposed and has the same options to control it.
>
> Encapsulation is good practice, by leaking the module internals the using
> code becomes brittle.
>
> Thomas
>
> On Tue, Nov 3, 2015 at 9:22 AM, Amol Kekre <am...@datatorrent.com> wrote:
>
> > The same goes for a Java or C++ class that changes its api. In general
> this
> > is left to the developer, and these languages have internals as private
> by
> > default for precisely the same purpose. The module developer must have
> the
> > right to change internals, keep api clean/constant and expect user code
> to
> > not break.
> >
> > Thks,
> > Amol
> >
> >
> > On Tue, Nov 3, 2015 at 9:19 AM, Pramod Immaneni <pr...@datatorrent.com>
> > wrote:
> >
> > > For 3 and 4 can't we strike a balance between not having to expose the
> > > operators underneath and at the same time not having to write
> boilerplate
> > > code for all the properties that the module wants to make available
> > > outside. It can quickly become unmanageable. For example, an input
> > operator
> > > has a new connection property which can be used outside and now all the
> > > modules using that operator, their code has to be modified to just add
> a
> > > pass through setter/getter. How about treating the operator name as a
> > group
> > > name and ability for module developers to easily make available/specify
> > all
> > > or a subset of the properties of an operator to the user without having
> > to
> > > explicitly make each of them a module property.
> > >
> > > On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <am...@datatorrent.com>
> wrote:
> > >
> > > > 3,4 should follow conventions where the creator decides the api
> > > (including
> > > > accessibility). In general only those properties exposed by module
> > > creator
> > > > should be settable. What the module internally does with them is
> module
> > > > designer's call. Accessing internals of module from outside is
> > uncommon.
> > > > For exampe in Java (or C++) private fields/members are not to be
> > > accessed.
> > > > Properties (setter and getter) are the api that module designer gives
> > to
> > > > the module user. It is dangerous and has unintended consequences if
> > > module
> > > > user starts to access internals outside the api.
> > > >
> > > > Partitioning should be next phase. As long as current design does not
> > > halt
> > > > partitioning it should be ok (which I believe is true).
> > > >
> > > > Thks,
> > > > Amol
> > > >
> > > >
> > > > On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <v....@datatorrent.com>
> > > > wrote:
> > > >
> > > > > 1. +1, though passing original DAG to module's populateDAG() it is
> > not
> > > by
> > > > > design and is the current pull request implementation details.
> > > > >
> > > > > 2. While I agree that both Module and StreamingApplication let's
> > > > > module/application designer to expose DAG design reuse pattern and
> > > > > StreamingApplication interface may be extending Module, it does not
> > > seem
> > > > to
> > > > > buy us much. Do we want to allow certain applications to be reused
> as
> > > > > Modules in other applications or should application package be
> > > different
> > > > > from Module package? The current approach is to distribute Modules
> as
> > > > part
> > > > > of .jar for example as part of Malhar library without necessarily
> > > > providing
> > > > > all necessary dependencies. Application package on other side must
> > > > include
> > > > > all dependencies not provided by the platform.
> > > > >
> > > > > 3, 4. While this will help Module designer, it may complicate
> Module
> > > > > maintenance and how Modules are used. What if Module designer wants
> > to
> > > > > change it's implementation and replace one operator implementation
> > with
> > > > > another operator? Does StreamingApplication designer need to know
> > > > internal
> > > > > structure of Modules? Should Module be considered as a black box
> > during
> > > > > Application design time as it was initially planned?
> > > > >
> > > > > 5, 6, 7 +1. This is currently proposed behavior of Module
> > functionality
> > > > > the way I understand it.
> > > > >
> > > > > 8. We need to see what Module designer can specify for
> partitioning.
> > > One
> > > > > of supported cases should be ability to specify cascading
> > partitioning
> > > > > scheme.
> > > > >
> > > > > Thank you,
> > > > >
> > > > > Vlad
> > > > >
> > > > >
> > > > > On 11/2/15 10:30, Pramod Immaneni wrote:
> > > > >
> > > > >> I have some comments and suggestions on the module design. I think
> > > these
> > > > >> need to be taken into account before we can merge the
> implementation
> > > > >> provided below into the mainline code. I apologize if these should
> > > have
> > > > >> been brought up earlier as for some reason or the other I was out
> of
> > > > loop
> > > > >> on this one
> > > > >>
> > > > >>      https://github.com/apache/incubator-apex-core/pull/148
> > > > >> <
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963
> > > > >> >
> > > > >>
> > > > >>      1. DAG scoping currently in the implementation is global for
> > > > modules,
> > > > >> each module's populateDAG sees the entire DAG. It should be
> locally
> > > > scoped
> > > > >> as one module does not and should not know about another.
> > > > >>      2. The module has a populateDAG method with exact same syntax
> > as
> > > in
> > > > >> StreamingApplication. Is StreamingApplication also a module,
> should
> > it
> > > > >> extend that interface.
> > > > >>      3. Setting properties for modules is too verbose. Module
> > > developer
> > > > >> needs to repeat every property they want exposed with a setter and
> > > > getter
> > > > >> in JAVA. I don't disagree that module developer should be able to
> > > choose
> > > > >> which properties from which operators need to be exposed but the
> > > current
> > > > >> way seems to duplicate code. Here is a suggestion.
> > > > >>           a. Allow modules to specify which operators and
> properties
> > > can
> > > > >> be
> > > > >> accessible from outside. One way is in the "populateDAG" method of
> > the
> > > > >> module when adding the operator have the ability to specify if
> this
> > > > >> operator can be accessible from outside and which or all
> properties
> > > can
> > > > be
> > > > >> accessible.
> > > > >>           b. Provide methods in ModuleMeta or elsewhere to set
> > > property
> > > > >> values by specifying the operator name (friendly name) inside the
> > > module
> > > > >> and property name. If this is allowed by a. above it is successful
> > > else
> > > > it
> > > > >> should fail.
> > > > >>           c. Allow a syntax in property files to specify the
> > property
> > > in
> > > > >> b.
> > > > >> Example syntax
> dt.module.<modulename>.operator.<operatorname>.prop.<
> > > > >> propname>
> > > > >>      4. For attributes same mechanism as in 3 should apply for the
> > > > >> operators
> > > > >> that are exposed by the module.  For property file, example syntax
> > > > >> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
> > > > >>      5. Module developers in addition to 3. and 4. above may
> choose
> > to
> > > > >> support module level properties and attributes. These should not
> be
> > > the
> > > > >> default when 3. and 4. are possible but complementary, in addition
> > to
> > > > >> them.
> > > > >> In this case for properties they can implement setters and getters
> > in
> > > > the
> > > > >> module. For attributes the user should still be able to set the
> > > > attributes
> > > > >> using the dag setAttribute method. You could introduce a method in
> > the
> > > > >> module to process attributes that can get called by the engine
> once
> > > > >> everything is set.
> > > > >>      6. For 5. above setting global properties and attributes for
> > > module
> > > > >> is
> > > > >> akin to ideas that have been proposed for the application as
> well. A
> > > > >> consistent way must be possible for applications as well even if
> it
> > is
> > > > not
> > > > >> implemented now.
> > > > >>      7. For 5. or 6. above there should be a property file way of
> > > > >> specifying
> > > > >> the global module properties and attributes. Example syntax
> > > > >> dt.module.<modulename>.prop.<propname>,
> > > > >> dt.module.<modulename>.attr.<attrname>.
> > > > >> Notice the difference with 3. c. and 4 above that there is no
> > operator
> > > > >> keyword here.
> > > > >>      8. Partitioning needs to be consistent with what the user
> will
> > > > expect
> > > > >> when they see module as an entity. I will send an image of
> possible
> > > > >> examples of how the user will expect the physical plan to look in
> > > > certain
> > > > >> cases.
> > > > >>
> > > > >> Thanks
> > > > >>
> > > > >>
> > > > >
> > > >
> > >
> >
>

Re: Modules

Posted by Thomas Weise <th...@datatorrent.com>.

There is also the option to inherit common module properties through a base
class.

I don't see how this is any different from an operator. The developer
decides what gets exposed and has the same options to control it.

Encapsulation is good practice, by leaking the module internals the using
code becomes brittle.

Thomas

On Tue, Nov 3, 2015 at 9:22 AM, Amol Kekre <am...@datatorrent.com> wrote:

> The same goes for a Java or C++ class that changes its api. In general this
> is left to the developer, and these languages have internals as private by
> default for precisely the same purpose. The module developer must have the
> right to change internals, keep api clean/constant and expect user code to
> not break.
>
> Thks,
> Amol
>
>
> On Tue, Nov 3, 2015 at 9:19 AM, Pramod Immaneni <pr...@datatorrent.com>
> wrote:
>
> > For 3 and 4 can't we strike a balance between not having to expose the
> > operators underneath and at the same time not having to write boilerplate
> > code for all the properties that the module wants to make available
> > outside. It can quickly become unmanageable. For example, an input
> operator
> > has a new connection property which can be used outside and now all the
> > modules using that operator, their code has to be modified to just add a
> > pass through setter/getter. How about treating the operator name as a
> group
> > name and ability for module developers to easily make available/specify
> all
> > or a subset of the properties of an operator to the user without having
> to
> > explicitly make each of them a module property.
> >
> > On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <am...@datatorrent.com> wrote:
> >
> > > 3,4 should follow conventions where the creator decides the api
> > (including
> > > accessibility). In general only those properties exposed by module
> > creator
> > > should be settable. What the module internally does with them is module
> > > designer's call. Accessing internals of module from outside is
> uncommon.
> > > For exampe in Java (or C++) private fields/members are not to be
> > accessed.
> > > Properties (setter and getter) are the api that module designer gives
> to
> > > the module user. It is dangerous and has unintended consequences if
> > module
> > > user starts to access internals outside the api.
> > >
> > > Partitioning should be next phase. As long as current design does not
> > halt
> > > partitioning it should be ok (which I believe is true).
> > >
> > > Thks,
> > > Amol
> > >
> > >
> > > On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <v....@datatorrent.com>
> > > wrote:
> > >
> > > > 1. +1, though passing original DAG to module's populateDAG() it is
> not
> > by
> > > > design and is the current pull request implementation details.
> > > >
> > > > 2. While I agree that both Module and StreamingApplication let's
> > > > module/application designer to expose DAG design reuse pattern and
> > > > StreamingApplication interface may be extending Module, it does not
> > seem
> > > to
> > > > buy us much. Do we want to allow certain applications to be reused as
> > > > Modules in other applications or should application package be
> > different
> > > > from Module package? The current approach is to distribute Modules as
> > > part
> > > > of .jar for example as part of Malhar library without necessarily
> > > providing
> > > > all necessary dependencies. Application package on other side must
> > > include
> > > > all dependencies not provided by the platform.
> > > >
> > > > 3, 4. While this will help Module designer, it may complicate Module
> > > > maintenance and how Modules are used. What if Module designer wants
> to
> > > > change it's implementation and replace one operator implementation
> with
> > > > another operator? Does StreamingApplication designer need to know
> > > internal
> > > > structure of Modules? Should Module be considered as a black box
> during
> > > > Application design time as it was initially planned?
> > > >
> > > > 5, 6, 7 +1. This is currently proposed behavior of Module
> functionality
> > > > the way I understand it.
> > > >
> > > > 8. We need to see what Module designer can specify for partitioning.
> > One
> > > > of supported cases should be ability to specify cascading
> partitioning
> > > > scheme.
> > > >
> > > > Thank you,
> > > >
> > > > Vlad
> > > >
> > > >
> > > > On 11/2/15 10:30, Pramod Immaneni wrote:
> > > >
> > > >> I have some comments and suggestions on the module design. I think
> > these
> > > >> need to be taken into account before we can merge the implementation
> > > >> provided below into the mainline code. I apologize if these should
> > have
> > > >> been brought up earlier as for some reason or the other I was out of
> > > loop
> > > >> on this one
> > > >>
> > > >>      https://github.com/apache/incubator-apex-core/pull/148
> > > >> <
> > > >>
> > >
> >
> https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963
> > > >> >
> > > >>
> > > >>      1. DAG scoping currently in the implementation is global for
> > > modules,
> > > >> each module's populateDAG sees the entire DAG. It should be locally
> > > scoped
> > > >> as one module does not and should not know about another.
> > > >>      2. The module has a populateDAG method with exact same syntax
> as
> > in
> > > >> StreamingApplication. Is StreamingApplication also a module, should
> it
> > > >> extend that interface.
> > > >>      3. Setting properties for modules is too verbose. Module
> > developer
> > > >> needs to repeat every property they want exposed with a setter and
> > > getter
> > > >> in JAVA. I don't disagree that module developer should be able to
> > choose
> > > >> which properties from which operators need to be exposed but the
> > current
> > > >> way seems to duplicate code. Here is a suggestion.
> > > >>           a. Allow modules to specify which operators and properties
> > can
> > > >> be
> > > >> accessible from outside. One way is in the "populateDAG" method of
> the
> > > >> module when adding the operator have the ability to specify if this
> > > >> operator can be accessible from outside and which or all properties
> > can
> > > be
> > > >> accessible.
> > > >>           b. Provide methods in ModuleMeta or elsewhere to set
> > property
> > > >> values by specifying the operator name (friendly name) inside the
> > module
> > > >> and property name. If this is allowed by a. above it is successful
> > else
> > > it
> > > >> should fail.
> > > >>           c. Allow a syntax in property files to specify the
> property
> > in
> > > >> b.
> > > >> Example syntax dt.module.<modulename>.operator.<operatorname>.prop.<
> > > >> propname>
> > > >>      4. For attributes same mechanism as in 3 should apply for the
> > > >> operators
> > > >> that are exposed by the module.  For property file, example syntax
> > > >> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
> > > >>      5. Module developers in addition to 3. and 4. above may choose
> to
> > > >> support module level properties and attributes. These should not be
> > the
> > > >> default when 3. and 4. are possible but complementary, in addition
> to
> > > >> them.
> > > >> In this case for properties they can implement setters and getters
> in
> > > the
> > > >> module. For attributes the user should still be able to set the
> > > attributes
> > > >> using the dag setAttribute method. You could introduce a method in
> the
> > > >> module to process attributes that can get called by the engine once
> > > >> everything is set.
> > > >>      6. For 5. above setting global properties and attributes for
> > module
> > > >> is
> > > >> akin to ideas that have been proposed for the application as well. A
> > > >> consistent way must be possible for applications as well even if it
> is
> > > not
> > > >> implemented now.
> > > >>      7. For 5. or 6. above there should be a property file way of
> > > >> specifying
> > > >> the global module properties and attributes. Example syntax
> > > >> dt.module.<modulename>.prop.<propname>,
> > > >> dt.module.<modulename>.attr.<attrname>.
> > > >> Notice the difference with 3. c. and 4 above that there is no
> operator
> > > >> keyword here.
> > > >>      8. Partitioning needs to be consistent with what the user will
> > > expect
> > > >> when they see module as an entity. I will send an image of possible
> > > >> examples of how the user will expect the physical plan to look in
> > > certain
> > > >> cases.
> > > >>
> > > >> Thanks
> > > >>
> > > >>
> > > >
> > >
> >
>

Re: Modules

Posted by Amol Kekre <am...@datatorrent.com>.

The same goes for a Java or C++ class that changes its api. In general this
is left to the developer, and these languages have internals as private by
default for precisely the same purpose. The module developer must have the
right to change internals, keep api clean/constant and expect user code to
not break.

Thks,
Amol


On Tue, Nov 3, 2015 at 9:19 AM, Pramod Immaneni <pr...@datatorrent.com>
wrote:

> For 3 and 4 can't we strike a balance between not having to expose the
> operators underneath and at the same time not having to write boilerplate
> code for all the properties that the module wants to make available
> outside. It can quickly become unmanageable. For example, an input operator
> has a new connection property which can be used outside and now all the
> modules using that operator, their code has to be modified to just add a
> pass through setter/getter. How about treating the operator name as a group
> name and ability for module developers to easily make available/specify all
> or a subset of the properties of an operator to the user without having to
> explicitly make each of them a module property.
>
> On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <am...@datatorrent.com> wrote:
>
> > 3,4 should follow conventions where the creator decides the api
> (including
> > accessibility). In general only those properties exposed by module
> creator
> > should be settable. What the module internally does with them is module
> > designer's call. Accessing internals of module from outside is uncommon.
> > For exampe in Java (or C++) private fields/members are not to be
> accessed.
> > Properties (setter and getter) are the api that module designer gives to
> > the module user. It is dangerous and has unintended consequences if
> module
> > user starts to access internals outside the api.
> >
> > Partitioning should be next phase. As long as current design does not
> halt
> > partitioning it should be ok (which I believe is true).
> >
> > Thks,
> > Amol
> >
> >
> > On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <v....@datatorrent.com>
> > wrote:
> >
> > > 1. +1, though passing original DAG to module's populateDAG() it is not
> by
> > > design and is the current pull request implementation details.
> > >
> > > 2. While I agree that both Module and StreamingApplication let's
> > > module/application designer to expose DAG design reuse pattern and
> > > StreamingApplication interface may be extending Module, it does not
> seem
> > to
> > > buy us much. Do we want to allow certain applications to be reused as
> > > Modules in other applications or should application package be
> different
> > > from Module package? The current approach is to distribute Modules as
> > part
> > > of .jar for example as part of Malhar library without necessarily
> > providing
> > > all necessary dependencies. Application package on other side must
> > include
> > > all dependencies not provided by the platform.
> > >
> > > 3, 4. While this will help Module designer, it may complicate Module
> > > maintenance and how Modules are used. What if Module designer wants to
> > > change it's implementation and replace one operator implementation with
> > > another operator? Does StreamingApplication designer need to know
> > internal
> > > structure of Modules? Should Module be considered as a black box during
> > > Application design time as it was initially planned?
> > >
> > > 5, 6, 7 +1. This is currently proposed behavior of Module functionality
> > > the way I understand it.
> > >
> > > 8. We need to see what Module designer can specify for partitioning.
> One
> > > of supported cases should be ability to specify cascading partitioning
> > > scheme.
> > >
> > > Thank you,
> > >
> > > Vlad
> > >
> > >
> > > On 11/2/15 10:30, Pramod Immaneni wrote:
> > >
> > >> I have some comments and suggestions on the module design. I think
> these
> > >> need to be taken into account before we can merge the implementation
> > >> provided below into the mainline code. I apologize if these should
> have
> > >> been brought up earlier as for some reason or the other I was out of
> > loop
> > >> on this one
> > >>
> > >>      https://github.com/apache/incubator-apex-core/pull/148
> > >> <
> > >>
> >
> https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963
> > >> >
> > >>
> > >>      1. DAG scoping currently in the implementation is global for
> > modules,
> > >> each module's populateDAG sees the entire DAG. It should be locally
> > scoped
> > >> as one module does not and should not know about another.
> > >>      2. The module has a populateDAG method with exact same syntax as
> in
> > >> StreamingApplication. Is StreamingApplication also a module, should it
> > >> extend that interface.
> > >>      3. Setting properties for modules is too verbose. Module
> developer
> > >> needs to repeat every property they want exposed with a setter and
> > getter
> > >> in JAVA. I don't disagree that module developer should be able to
> choose
> > >> which properties from which operators need to be exposed but the
> current
> > >> way seems to duplicate code. Here is a suggestion.
> > >>           a. Allow modules to specify which operators and properties
> can
> > >> be
> > >> accessible from outside. One way is in the "populateDAG" method of the
> > >> module when adding the operator have the ability to specify if this
> > >> operator can be accessible from outside and which or all properties
> can
> > be
> > >> accessible.
> > >>           b. Provide methods in ModuleMeta or elsewhere to set
> property
> > >> values by specifying the operator name (friendly name) inside the
> module
> > >> and property name. If this is allowed by a. above it is successful
> else
> > it
> > >> should fail.
> > >>           c. Allow a syntax in property files to specify the property
> in
> > >> b.
> > >> Example syntax dt.module.<modulename>.operator.<operatorname>.prop.<
> > >> propname>
> > >>      4. For attributes same mechanism as in 3 should apply for the
> > >> operators
> > >> that are exposed by the module.  For property file, example syntax
> > >> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
> > >>      5. Module developers in addition to 3. and 4. above may choose to
> > >> support module level properties and attributes. These should not be
> the
> > >> default when 3. and 4. are possible but complementary, in addition to
> > >> them.
> > >> In this case for properties they can implement setters and getters in
> > the
> > >> module. For attributes the user should still be able to set the
> > attributes
> > >> using the dag setAttribute method. You could introduce a method in the
> > >> module to process attributes that can get called by the engine once
> > >> everything is set.
> > >>      6. For 5. above setting global properties and attributes for
> module
> > >> is
> > >> akin to ideas that have been proposed for the application as well. A
> > >> consistent way must be possible for applications as well even if it is
> > not
> > >> implemented now.
> > >>      7. For 5. or 6. above there should be a property file way of
> > >> specifying
> > >> the global module properties and attributes. Example syntax
> > >> dt.module.<modulename>.prop.<propname>,
> > >> dt.module.<modulename>.attr.<attrname>.
> > >> Notice the difference with 3. c. and 4 above that there is no operator
> > >> keyword here.
> > >>      8. Partitioning needs to be consistent with what the user will
> > expect
> > >> when they see module as an entity. I will send an image of possible
> > >> examples of how the user will expect the physical plan to look in
> > certain
> > >> cases.
> > >>
> > >> Thanks
> > >>
> > >>
> > >
> >
>

Re: Modules

Posted by Pramod Immaneni <pr...@datatorrent.com>.

For 3 and 4 can't we strike a balance between not having to expose the
operators underneath and at the same time not having to write boilerplate
code for all the properties that the module wants to make available
outside. It can quickly become unmanageable. For example, an input operator
has a new connection property which can be used outside and now all the
modules using that operator, their code has to be modified to just add a
pass through setter/getter. How about treating the operator name as a group
name and ability for module developers to easily make available/specify all
or a subset of the properties of an operator to the user without having to
explicitly make each of them a module property.

On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <am...@datatorrent.com> wrote:

> 3,4 should follow conventions where the creator decides the api (including
> accessibility). In general only those properties exposed by module creator
> should be settable. What the module internally does with them is module
> designer's call. Accessing internals of module from outside is uncommon.
> For exampe in Java (or C++) private fields/members are not to be accessed.
> Properties (setter and getter) are the api that module designer gives to
> the module user. It is dangerous and has unintended consequences if module
> user starts to access internals outside the api.
>
> Partitioning should be next phase. As long as current design does not halt
> partitioning it should be ok (which I believe is true).
>
> Thks,
> Amol
>
>
> On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <v....@datatorrent.com>
> wrote:
>
> > 1. +1, though passing original DAG to module's populateDAG() it is not by
> > design and is the current pull request implementation details.
> >
> > 2. While I agree that both Module and StreamingApplication let's
> > module/application designer to expose DAG design reuse pattern and
> > StreamingApplication interface may be extending Module, it does not seem
> to
> > buy us much. Do we want to allow certain applications to be reused as
> > Modules in other applications or should application package be different
> > from Module package? The current approach is to distribute Modules as
> part
> > of .jar for example as part of Malhar library without necessarily
> providing
> > all necessary dependencies. Application package on other side must
> include
> > all dependencies not provided by the platform.
> >
> > 3, 4. While this will help Module designer, it may complicate Module
> > maintenance and how Modules are used. What if Module designer wants to
> > change it's implementation and replace one operator implementation with
> > another operator? Does StreamingApplication designer need to know
> internal
> > structure of Modules? Should Module be considered as a black box during
> > Application design time as it was initially planned?
> >
> > 5, 6, 7 +1. This is currently proposed behavior of Module functionality
> > the way I understand it.
> >
> > 8. We need to see what Module designer can specify for partitioning. One
> > of supported cases should be ability to specify cascading partitioning
> > scheme.
> >
> > Thank you,
> >
> > Vlad
> >
> >
> > On 11/2/15 10:30, Pramod Immaneni wrote:
> >
> >> I have some comments and suggestions on the module design. I think these
> >> need to be taken into account before we can merge the implementation
> >> provided below into the mainline code. I apologize if these should have
> >> been brought up earlier as for some reason or the other I was out of
> loop
> >> on this one
> >>
> >>      https://github.com/apache/incubator-apex-core/pull/148
> >> <
> >>
> https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963
> >> >
> >>
> >>      1. DAG scoping currently in the implementation is global for
> modules,
> >> each module's populateDAG sees the entire DAG. It should be locally
> scoped
> >> as one module does not and should not know about another.
> >>      2. The module has a populateDAG method with exact same syntax as in
> >> StreamingApplication. Is StreamingApplication also a module, should it
> >> extend that interface.
> >>      3. Setting properties for modules is too verbose. Module developer
> >> needs to repeat every property they want exposed with a setter and
> getter
> >> in JAVA. I don't disagree that module developer should be able to choose
> >> which properties from which operators need to be exposed but the current
> >> way seems to duplicate code. Here is a suggestion.
> >>           a. Allow modules to specify which operators and properties can
> >> be
> >> accessible from outside. One way is in the "populateDAG" method of the
> >> module when adding the operator have the ability to specify if this
> >> operator can be accessible from outside and which or all properties can
> be
> >> accessible.
> >>           b. Provide methods in ModuleMeta or elsewhere to set property
> >> values by specifying the operator name (friendly name) inside the module
> >> and property name. If this is allowed by a. above it is successful else
> it
> >> should fail.
> >>           c. Allow a syntax in property files to specify the property in
> >> b.
> >> Example syntax dt.module.<modulename>.operator.<operatorname>.prop.<
> >> propname>
> >>      4. For attributes same mechanism as in 3 should apply for the
> >> operators
> >> that are exposed by the module.  For property file, example syntax
> >> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
> >>      5. Module developers in addition to 3. and 4. above may choose to
> >> support module level properties and attributes. These should not be the
> >> default when 3. and 4. are possible but complementary, in addition to
> >> them.
> >> In this case for properties they can implement setters and getters in
> the
> >> module. For attributes the user should still be able to set the
> attributes
> >> using the dag setAttribute method. You could introduce a method in the
> >> module to process attributes that can get called by the engine once
> >> everything is set.
> >>      6. For 5. above setting global properties and attributes for module
> >> is
> >> akin to ideas that have been proposed for the application as well. A
> >> consistent way must be possible for applications as well even if it is
> not
> >> implemented now.
> >>      7. For 5. or 6. above there should be a property file way of
> >> specifying
> >> the global module properties and attributes. Example syntax
> >> dt.module.<modulename>.prop.<propname>,
> >> dt.module.<modulename>.attr.<attrname>.
> >> Notice the difference with 3. c. and 4 above that there is no operator
> >> keyword here.
> >>      8. Partitioning needs to be consistent with what the user will
> expect
> >> when they see module as an entity. I will send an image of possible
> >> examples of how the user will expect the physical plan to look in
> certain
> >> cases.
> >>
> >> Thanks
> >>
> >>
> >
>

Re: Modules

Posted by Amol Kekre <am...@datatorrent.com>.

3,4 should follow conventions where the creator decides the api (including
accessibility). In general only those properties exposed by module creator
should be settable. What the module internally does with them is module
designer's call. Accessing internals of module from outside is uncommon.
For exampe in Java (or C++) private fields/members are not to be accessed.
Properties (setter and getter) are the api that module designer gives to
the module user. It is dangerous and has unintended consequences if module
user starts to access internals outside the api.

Partitioning should be next phase. As long as current design does not halt
partitioning it should be ok (which I believe is true).

Thks,
Amol


On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <v....@datatorrent.com> wrote:

> 1. +1, though passing original DAG to module's populateDAG() it is not by
> design and is the current pull request implementation details.
>
> 2. While I agree that both Module and StreamingApplication let's
> module/application designer to expose DAG design reuse pattern and
> StreamingApplication interface may be extending Module, it does not seem to
> buy us much. Do we want to allow certain applications to be reused as
> Modules in other applications or should application package be different
> from Module package? The current approach is to distribute Modules as part
> of .jar for example as part of Malhar library without necessarily providing
> all necessary dependencies. Application package on other side must include
> all dependencies not provided by the platform.
>
> 3, 4. While this will help Module designer, it may complicate Module
> maintenance and how Modules are used. What if Module designer wants to
> change it's implementation and replace one operator implementation with
> another operator? Does StreamingApplication designer need to know internal
> structure of Modules? Should Module be considered as a black box during
> Application design time as it was initially planned?
>
> 5, 6, 7 +1. This is currently proposed behavior of Module functionality
> the way I understand it.
>
> 8. We need to see what Module designer can specify for partitioning. One
> of supported cases should be ability to specify cascading partitioning
> scheme.
>
> Thank you,
>
> Vlad
>
>
> On 11/2/15 10:30, Pramod Immaneni wrote:
>
>> I have some comments and suggestions on the module design. I think these
>> need to be taken into account before we can merge the implementation
>> provided below into the mainline code. I apologize if these should have
>> been brought up earlier as for some reason or the other I was out of loop
>> on this one
>>
>>      https://github.com/apache/incubator-apex-core/pull/148
>> <
>> https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963
>> >
>>
>>      1. DAG scoping currently in the implementation is global for modules,
>> each module's populateDAG sees the entire DAG. It should be locally scoped
>> as one module does not and should not know about another.
>>      2. The module has a populateDAG method with exact same syntax as in
>> StreamingApplication. Is StreamingApplication also a module, should it
>> extend that interface.
>>      3. Setting properties for modules is too verbose. Module developer
>> needs to repeat every property they want exposed with a setter and getter
>> in JAVA. I don't disagree that module developer should be able to choose
>> which properties from which operators need to be exposed but the current
>> way seems to duplicate code. Here is a suggestion.
>>           a. Allow modules to specify which operators and properties can
>> be
>> accessible from outside. One way is in the "populateDAG" method of the
>> module when adding the operator have the ability to specify if this
>> operator can be accessible from outside and which or all properties can be
>> accessible.
>>           b. Provide methods in ModuleMeta or elsewhere to set property
>> values by specifying the operator name (friendly name) inside the module
>> and property name. If this is allowed by a. above it is successful else it
>> should fail.
>>           c. Allow a syntax in property files to specify the property in
>> b.
>> Example syntax dt.module.<modulename>.operator.<operatorname>.prop.<
>> propname>
>>      4. For attributes same mechanism as in 3 should apply for the
>> operators
>> that are exposed by the module.  For property file, example syntax
>> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
>>      5. Module developers in addition to 3. and 4. above may choose to
>> support module level properties and attributes. These should not be the
>> default when 3. and 4. are possible but complementary, in addition to
>> them.
>> In this case for properties they can implement setters and getters in the
>> module. For attributes the user should still be able to set the attributes
>> using the dag setAttribute method. You could introduce a method in the
>> module to process attributes that can get called by the engine once
>> everything is set.
>>      6. For 5. above setting global properties and attributes for module
>> is
>> akin to ideas that have been proposed for the application as well. A
>> consistent way must be possible for applications as well even if it is not
>> implemented now.
>>      7. For 5. or 6. above there should be a property file way of
>> specifying
>> the global module properties and attributes. Example syntax
>> dt.module.<modulename>.prop.<propname>,
>> dt.module.<modulename>.attr.<attrname>.
>> Notice the difference with 3. c. and 4 above that there is no operator
>> keyword here.
>>      8. Partitioning needs to be consistent with what the user will expect
>> when they see module as an entity. I will send an image of possible
>> examples of how the user will expect the physical plan to look in certain
>> cases.
>>
>> Thanks
>>
>>
>

Re: Modules

Posted by Vlad Rozov <v....@datatorrent.com>.

1. +1, though passing original DAG to module's populateDAG() it is not 
by design and is the current pull request implementation details.

2. While I agree that both Module and StreamingApplication let's 
module/application designer to expose DAG design reuse pattern and 
StreamingApplication interface may be extending Module, it does not seem 
to buy us much. Do we want to allow certain applications to be reused as 
Modules in other applications or should application package be different 
from Module package? The current approach is to distribute Modules as 
part of .jar for example as part of Malhar library without necessarily 
providing all necessary dependencies. Application package on other side 
must include all dependencies not provided by the platform.

3, 4. While this will help Module designer, it may complicate Module 
maintenance and how Modules are used. What if Module designer wants to 
change it's implementation and replace one operator implementation with 
another operator? Does StreamingApplication designer need to know 
internal structure of Modules? Should Module be considered as a black 
box during Application design time as it was initially planned?

5, 6, 7 +1. This is currently proposed behavior of Module functionality 
the way I understand it.

8. We need to see what Module designer can specify for partitioning. One 
of supported cases should be ability to specify cascading partitioning 
scheme.

Thank you,

Vlad

On 11/2/15 10:30, Pramod Immaneni wrote:
> I have some comments and suggestions on the module design. I think these
> need to be taken into account before we can merge the implementation
> provided below into the mainline code. I apologize if these should have
> been brought up earlier as for some reason or the other I was out of loop
> on this one
>
>      https://github.com/apache/incubator-apex-core/pull/148
> <https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963>
>
>      1. DAG scoping currently in the implementation is global for modules,
> each module's populateDAG sees the entire DAG. It should be locally scoped
> as one module does not and should not know about another.
>      2. The module has a populateDAG method with exact same syntax as in
> StreamingApplication. Is StreamingApplication also a module, should it
> extend that interface.
>      3. Setting properties for modules is too verbose. Module developer
> needs to repeat every property they want exposed with a setter and getter
> in JAVA. I don't disagree that module developer should be able to choose
> which properties from which operators need to be exposed but the current
> way seems to duplicate code. Here is a suggestion.
>           a. Allow modules to specify which operators and properties can be
> accessible from outside. One way is in the "populateDAG" method of the
> module when adding the operator have the ability to specify if this
> operator can be accessible from outside and which or all properties can be
> accessible.
>           b. Provide methods in ModuleMeta or elsewhere to set property
> values by specifying the operator name (friendly name) inside the module
> and property name. If this is allowed by a. above it is successful else it
> should fail.
>           c. Allow a syntax in property files to specify the property in b.
> Example syntax dt.module.<modulename>.operator.<operatorname>.prop.<
> propname>
>      4. For attributes same mechanism as in 3 should apply for the operators
> that are exposed by the module.  For property file, example syntax
> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
>      5. Module developers in addition to 3. and 4. above may choose to
> support module level properties and attributes. These should not be the
> default when 3. and 4. are possible but complementary, in addition to them.
> In this case for properties they can implement setters and getters in the
> module. For attributes the user should still be able to set the attributes
> using the dag setAttribute method. You could introduce a method in the
> module to process attributes that can get called by the engine once
> everything is set.
>      6. For 5. above setting global properties and attributes for module is
> akin to ideas that have been proposed for the application as well. A
> consistent way must be possible for applications as well even if it is not
> implemented now.
>      7. For 5. or 6. above there should be a property file way of specifying
> the global module properties and attributes. Example syntax
> dt.module.<modulename>.prop.<propname>, dt.module.<modulename>.attr.<attrname>.
> Notice the difference with 3. c. and 4 above that there is no operator
> keyword here.
>      8. Partitioning needs to be consistent with what the user will expect
> when they see module as an entity. I will send an image of possible
> examples of how the user will expect the physical plan to look in certain
> cases.
>
> Thanks
>