You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@apex.apache.org by Chetan Narsude <ch...@datatorrent.com> on 2015/09/04 00:59:39 UTC

Re: Stray folders under Malhar/lib [was Re: Vagrant folders under Malhar/lib]

Just changing the subject here.

--
Chetan


On Thu, Sep 3, 2015 at 3:35 PM, Thomas Weise <th...@datatorrent.com> wrote:

> Ram,
>
> Are you referring to running your app from the dtcli?
>
> That's one more item to check to not end up with stray files. There is
> nothing stopping us from using a different default for that use case. My
> take is that we should structure this in a way where by default unit tests
> have minimum things to configure and run as fast as reasonably possible.
>
> Thomas
>
> On Thu, Sep 3, 2015 at 3:19 PM, Munagala Ramanath <ra...@datatorrent.com>
> wrote:
>
> > It's not just unit tests.
> >
> > An app developer is likely to run a random app in LM to uncover bugs
> before
> > hitting the cluster.
> > The closer the LM setup is to the cluster setup (i.e. running as much of
> > the same code as reasonably possible)
> > the higher the probability that bugs will be hit in LM.
> >
> > Ram
> >
> > On Thu, Sep 3, 2015 at 3:13 PM, Chandni Singh <ch...@datatorrent.com>
> > wrote:
> >
> > > Yes Chetan, I am claiming that :-)
> > >
> > > I still don't understand the need for having two step checkpointing in
> > > LocalMode by default.
> > >
> > > StramLocalCluster should simplify test execution environment as pointed
> > out
> > > by Thomas.
> > > Async checkpoint should have its own test cases using StramLocalCluster
> > > that should not break when new features are added to the platform.
> > > But by default I think StramLocalCluster should use synchronous
> > > checkpointing.
> > >
> > > -Chandni
> > >
> > > On Thu, Sep 3, 2015 at 2:43 PM, Chetan Narsude <chetan@datatorrent.com
> >
> > > wrote:
> > >
> > > > Changing the storage agent is one of the ways to address the symptoms
> > of
> > > > the problem. But it's not treating the problem.
> > > >
> > > > In this case - change the basePath to a location under target and all
> > the
> > > > opinions are moot. And someone is claiming that we should not do it.
> > Not
> > > > sure why. Or is anyone claiming that anymore?
> > > >
> > > > --
> > > > Chetan
> > > >
> > > > On Thu, Sep 3, 2015 at 1:43 PM, Thomas Weise <thomas@datatorrent.com
> >
> > > > wrote:
> > > >
> > > >> Good point regarding the coverage. These JUnit tests are supposed to
> > > test
> > > >> individual components and all the tests collectively should strive
> to
> > > >> achieve high coverage. There are tests in Apex to cover storage
> > agents,
> > > >> recovery semantics etc.  Components that fall outside of the test
> > scope
> > > >> are
> > > >> reduced as much as possible through mocks (even though there is room
> > for
> > > >> improvement).
> > > >>
> > > >> The tests in Malhar are for operators and applications, not for the
> > > >> engine.
> > > >> In those cases where LM is used, the intention is to test the
> > > application
> > > >> functionality. It is expected that certain configurations are
> adjusted
> > > for
> > > >> the test and dependencies mocked.
> > > >>
> > > >> For the local mode, it should not be an issue to use a different
> > storage
> > > >> agent when it simplifies the test execution. Specifically, in this
> > case,
> > > >> we
> > > >> don't want to go and change many tests to make something work that
> > isn't
> > > >> needed. LM is not "production", it is not using HDFS and there are a
> > > >> number
> > > >> of other important differences that make it possible to run within
> the
> > > >> IDE.
> > > >>
> > > >> Instead, focus should be on those things that help with app
> coverage.
> > > For
> > > >> example, in the past we had seen issues with serialization of
> > operators
> > > >> that were not uncovered in LM, until we made the serialization part
> of
> > > the
> > > >> execution, even when it was not needed for execution.
> > > >>
> > > >> Thomas
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Thu, Sep 3, 2015 at 11:21 AM, Chetan Narsude <
> > chetan@datatorrent.com
> > > >
> > > >> wrote:
> > > >>
> > > >> > I think Ram explained in a little more detail on what I am
> thinking.
> > > >> >
> > > >> > Tests are supposed to provide code coverage. Having localcluster
> is
> > > >> already
> > > >> > a variable, it's not what runs in production. Having a different
> > > storage
> > > >> > agent is another variable and it misses out on testing the
> > > asynchronous
> > > >> > flow. The gap keeps on increasing if we continue to do that.
> > > AsyncFSSA
> > > >> is
> > > >> > our default because it's supposed to do everything that
> > FSStorageAgent
> > > >> does
> > > >> > and some more. So not clear as to why the test which creates stray
> > > >> folders
> > > >> > is not configuring the storage agent properly instead of
> completely
> > > >> > changing it out which brings some other problems in as I just
> > > explained.
> > > >> >
> > > >> > If changing the storage agent is the only way to fix the problem
> > with
> > > >> > reasonable effort, then I would concede. I highly doubt that.
> > > >> >
> > > >> > --
> > > >> > Chetan
> > > >> >
> > > >> > On Thu, Sep 3, 2015 at 11:05 AM, Chandni Singh <
> > > chandni@datatorrent.com
> > > >> >
> > > >> > wrote:
> > > >> >
> > > >> > > The local mode was so far using FSStorageAgent which was used in
> > > >> > > production.
> > > >> > > In production using Async is needed because hdfs writes are slow
> > but
> > > >> is
> > > >> > > that the case with LocalMode?
> > > >> > >
> > > >> > > In local mode if we use Async we are creating checkpoints under
> > one
> > > >> local
> > > >> > > directory and then copying it to another local directory which
> > will
> > > >> not
> > > >> > > improve any performance.
> > > >> > >
> > > >> > > In my opinion StramLocalCluster use synchronous checkpointing as
> > > >> default.
> > > >> > >
> > > >> > > Chandni
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > On Thu, Sep 3, 2015 at 10:09 AM, Chetan Narsude <
> > > >> chetan@datatorrent.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > >> That sounds a lot like self contradicting reason; Let's make a
> > > change
> > > >> > >> because we don't want to make change. :-)
> > > >> > >>
> > > >> > >> The code is in certain state. This certain state is consistent
> > with
> > > >> how
> > > >> > >> things run in production. In test environment there is a
> problem
> > > that
> > > >> > stray
> > > >> > >> files are created. It's a small fix to relocate these files
> > > >> elsewhere.
> > > >> > What
> > > >> > >> I am trying to understand is that is not being done?
> > > >> > >>
> > > >> > >> --
> > > >> > >> Chetan
> > > >> > >>
> > > >> > >> On Thu, Sep 3, 2015 at 9:41 AM, Thomas Weise <
> > > thomas@datatorrent.com
> > > >> >
> > > >> > >> wrote:
> > > >> > >>
> > > >> > >>> There is no need to configure anything extra with the proposed
> > > >> change,
> > > >> > it
> > > >> > >>> just brings back LM to how it worked before.
> > > >> > >>>
> > > >> > >>> There is no point modifying n tests for extra setup with no
> > gain.
> > > >> > >>>
> > > >> > >>> Thomas
> > > >> > >>>
> > > >> > >>> On Thu, Sep 3, 2015 at 9:14 AM, Chetan Narsude <
> > > >> chetan@datatorrent.com
> > > >> > >
> > > >> > >>> wrote:
> > > >> > >>>
> > > >> > >>> > Why does it matter that AsyncFSStorageAgent is being used
> with
> > > >> > >>> > LocalCluster? It using the localfs and hence no gain is the
> > > >> > >>> implementation
> > > >> > >>> > detail that's abstracted out by FileSystem already.
> > > >> > >>> >
> > > >> > >>> > If there is a problem of random artifacts left behind after
> > the
> > > >> test,
> > > >> > >>> there
> > > >> > >>> > is a reason and most likely it's misconfiguration of the
> > > >> > StorageAgent.
> > > >> > >>> Why
> > > >> > >>> > wouldn't that be fixed.
> > > >> > >>> >
> > > >> > >>> > --
> > > >> > >>> > Chetan
> > > >> > >>> >
> > > >> > >>> >
> > > >> > >>> > On Thu, Sep 3, 2015 at 8:59 AM, Amol Kekre <
> > > amol@datatorrent.com>
> > > >> > >>> wrote:
> > > >> > >>> >
> > > >> > >>> > > Clean up container files left over should be a distributed
> > OS
> > > >> task.
> > > >> > >>> Clean
> > > >> > >>> > > up, back up, archive, ... all is for the OS (aka YARN). We
> > > must
> > > >> > >>> assume
> > > >> > >>> > kill
> > > >> > >>> > > -9.
> > > >> > >>> > >
> > > >> > >>> > > The only thing where the operator comes into play is
> > > >> "teardown()",
> > > >> > >>> which
> > > >> > >>> > is
> > > >> > >>> > > business logic (not Apex engine) issue. This could be db
> > > >> connection
> > > >> > >>> etc.
> > > >> > >>> > >
> > > >> > >>> > > Thks,
> > > >> > >>> > > Amol
> > > >> > >>> > >
> > > >> > >>> > > On Thu, Sep 3, 2015 at 8:52 AM, Thomas Weise <
> > > >> > thomas@datatorrent.com
> > > >> > >>> >
> > > >> > >>> > > wrote:
> > > >> > >>> > >
> > > >> > >>> > > > When the container gets killed, we should not assume
> > > anything
> > > >> > about
> > > >> > >>> > > > cleanup. It can be a kill -9. Any related "cleanup"
> falls
> > > >> under
> > > >> > >>> nice to
> > > >> > >>> > > > have, no guarantees.
> > > >> > >>> > > >
> > > >> > >>> > > > On Thu, Sep 3, 2015 at 8:49 AM, Chandni Singh <
> > > >> > >>> chandni@datatorrent.com
> > > >> > >>> > >
> > > >> > >>> > > > wrote:
> > > >> > >>> > > >
> > > >> > >>> > > > > I have a question regarding what Gaurav mentioned
> > > >> > >>> > > > > ----
> > > >> > >>> > > > > When container runs in cluster, "." specifies the
> > > containers
> > > >> > >>> local
> > > >> > >>> > path
> > > >> > >>> > > > on
> > > >> > >>> > > > > the node where container specific jars and other
> > resources
> > > >> > >>> resides.
> > > >> > >>> > It
> > > >> > >>> > > > > creates a folder under that which is live as long as
> > > >> container
> > > >> > >>> lives.
> > > >> > >>> > > So
> > > >> > >>> > > > > there are no vagrant folders anywhere
> > > >> > >>> > > > > ---
> > > >> > >>> > > > >
> > > >> > >>> > > > > When the container gets killed, do we cleanup the
> > folders
> > > >> > >>> created by
> > > >> > >>> > > > Async
> > > >> > >>> > > > > under the containers working dir?
> > > >> > >>> > > > >
> > > >> > >>> > > > > On Thu, Sep 3, 2015 at 8:42 AM, Thomas Weise <
> > > >> > >>> thomas@datatorrent.com
> > > >> > >>> > >
> > > >> > >>> > > > > wrote:
> > > >> > >>> > > > >
> > > >> > >>> > > > >> It makes sense to use the synchronous checkpointing
> for
> > > the
> > > >> > >>> local
> > > >> > >>> > > mode.
> > > >> > >>> > > > >> LM is meant to simplify dependencies and setup. The
> > > default
> > > >> > for
> > > >> > >>> > > > execution
> > > >> > >>> > > > >> on YARN remains async.
> > > >> > >>> > > > >>
> > > >> > >>> > > > >> Thomas
> > > >> > >>> > > > >>
> > > >> > >>> > > > >>
> > > >> > >>> > > > >> On Thu, Sep 3, 2015 at 8:34 AM, Chandni Singh <
> > > >> > >>> > > chandni@datatorrent.com>
> > > >> > >>> > > > >> wrote:
> > > >> > >>> > > > >>
> > > >> > >>> > > > >>> APPLICATION_PATH isn't related to local base dir of
> > > Async
> > > >> as
> > > >> > >>> far
> > > >> > >>> > as I
> > > >> > >>> > > > >>> know. StramLocalCluster sets the APP_PATH to
> > > "target/...".
> > > >> > >>> > > > >>> StramLocalCluster should use FSStorageAgent.
> > > >> > >>> > > > >>>
> > > >> > >>> > > > >>> - Chandni
> > > >> > >>> > > > >>>
> > > >> > >>> > > > >>> On Thu, Sep 3, 2015 at 8:20 AM, Gaurav Gupta <
> > > >> > >>> > gaurav@datatorrent.com
> > > >> > >>> > > >
> > > >> > >>> > > > >>> wrote:
> > > >> > >>> > > > >>>
> > > >> > >>> > > > >>>> As Thomas mentioned as default remains to be async.
> > You
> > > >> can
> > > >> > >>> either
> > > >> > >>> > > > >>>> change the storage agent or set the
> APPLICATION_PATH.
> > > >> > >>> > > > >>>>
> > > >> > >>> > > > >>>> When container runs in cluster, "." specifies the
> > > >> containers
> > > >> > >>> local
> > > >> > >>> > > > path
> > > >> > >>> > > > >>>> on the node where container specific jars and other
> > > >> > resources
> > > >> > >>> > > > resides. It
> > > >> > >>> > > > >>>> creates a folder under that which is live as long
> as
> > > >> > container
> > > >> > >>> > > lives.
> > > >> > >>> > > > So
> > > >> > >>> > > > >>>> there are no vagrant folders anywhere
> > > >> > >>> > > > >>>>
> > > >> > >>> > > > >>>> Thanks
> > > >> > >>> > > > >>>> -Gaurav
> > > >> > >>> > > > >>>>
> > > >> > >>> > > > >>>> On Wed, Sep 2, 2015 at 11:33 PM, Chandni Singh <
> > > >> > >>> > > > chandni@datatorrent.com
> > > >> > >>> > > > >>>> > wrote:
> > > >> > >>> > > > >>>>
> > > >> > >>> > > > >>>>> I think there is a problem in the default Async as
> > > >> well. It
> > > >> > >>> also
> > > >> > >>> > > uses
> > > >> > >>> > > > >>>>> the working directory as its local base path.
> > > >> > >>> > > > >>>>>
> > > >> > >>> > > > >>>>> In the Async -> copyToHdfs()  method, we delete
> the
> > > >> window
> > > >> > >>> files
> > > >> > >>> > > but
> > > >> > >>> > > > >>>>> the folder with the operator name never gets
> > deleted.
> > > >> > >>> > > > >>>>> So on the cluster there  will be such vagrant
> > folders
> > > in
> > > >> > the
> > > >> > >>> > > working
> > > >> > >>> > > > >>>>> directory?
> > > >> > >>> > > > >>>>>
> > > >> > >>> > > > >>>>> On Wed, Sep 2, 2015 at 11:17 PM, Thomas Weise <
> > > >> > >>> > > > thomas@datatorrent.com>
> > > >> > >>> > > > >>>>> wrote:
> > > >> > >>> > > > >>>>>
> > > >> > >>> > > > >>>>>> Chandni,
> > > >> > >>> > > > >>>>>>
> > > >> > >>> > > > >>>>>> Agreed. See whether the tests work with the
> > > synchronous
> > > >> > >>> storage
> > > >> > >>> > > > >>>>>> agent. If yes, change them. The default needs to
> > > remain
> > > >> > >>> async.
> > > >> > >>> > > > >>>>>>
> > > >> > >>> > > > >>>>>> Thomas
> > > >> > >>> > > > >>>>>>
> > > >> > >>> > > > >>>>>>
> > > >> > >>> > > > >>>>>> On Wed, Sep 2, 2015 at 11:05 PM, Chandni Singh <
> > > >> > >>> > > > >>>>>> chandni@datatorrent.com> wrote:
> > > >> > >>> > > > >>>>>>
> > > >> > >>> > > > >>>>>>> Hi,
> > > >> > >>> > > > >>>>>>>
> > > >> > >>> > > > >>>>>>> I would like to know what was the reason to use
> > > >> > >>> > > AsyncFSStorageAgent
> > > >> > >>> > > > >>>>>>> with StramLocalCluster?
> > > >> > >>> > > > >>>>>>> StramLocalCluster is mainly for testing in a
> > > >> > >>> non-distributed
> > > >> > >>> > mode
> > > >> > >>> > > > >>>>>>> and I am unclear how AsyncFSStorageAgent is
> > helpful
> > > in
> > > >> > this
> > > >> > >>> > mode.
> > > >> > >>> > > > >>>>>>>
> > > >> > >>> > > > >>>>>>> Thanks,
> > > >> > >>> > > > >>>>>>> Chandni
> > > >> > >>> > > > >>>>>>>
> > > >> > >>> > > > >>>>>>> On Wed, Sep 2, 2015 at 10:45 PM, Chandni Singh <
> > > >> > >>> > > > >>>>>>> chandni@datatorrent.com> wrote:
> > > >> > >>> > > > >>>>>>>
> > > >> > >>> > > > >>>>>>>> This is because of recent changes to
> > > >> StramLocalCluster
> > > >> > >>> where
> > > >> > >>> > > > >>>>>>>> AsyncFSStorageAgent is used for checkpointing
> > > >> > >>> > > > >>>>>>>>
> > > >> > >>> > > > >>>>>>>> dag.setAttribute(OperatorContext.STORAGE_AGENT,
> > new
> > > >> > >>> > > > AsyncFSStorageAgent(new Path(pathUri,
> > > >> > >>> > > > LogicalPlan.SUBDIR_CHECKPOINTS).toString(), null));
> > > >> > >>> > > > >>>>>>>>
> > > >> > >>> > > > >>>>>>>> The AsyncFSStorageAgent(String path,
> > Configuration
> > > >> conf)
> > > >> > >>> uses
> > > >> > >>> > > "."
> > > >> > >>> > > > as localBasePath and therefore creates sub-directories
> per
> > > >> > >>> operator in
> > > >> > >>> > > the
> > > >> > >>> > > > current working directory.
> > > >> > >>> > > > >>>>>>>>
> > > >> > >>> > > > >>>>>>>> I am going to create a ticket to address this
> and
> > > >> will
> > > >> > >>> fix it.
> > > >> > >>> > > > >>>>>>>>
> > > >> > >>> > > > >>>>>>>> -Chandni
> > > >> > >>> > > > >>>>>>>>
> > > >> > >>> > > > >>>>>>>>
> > > >> > >>> > > > >>>>>>>> On Wed, Sep 2, 2015 at 7:13 PM, Chandni Singh <
> > > >> > >>> > > > >>>>>>>> chandni@datatorrent.com> wrote:
> > > >> > >>> > > > >>>>>>>>
> > > >> > >>> > > > >>>>>>>>> Hi,
> > > >> > >>> > > > >>>>>>>>>
> > > >> > >>> > > > >>>>>>>>> I can see empty folders getting created under
> > > >> > Malhar/lib
> > > >> > >>> > called
> > > >> > >>> > > > >>>>>>>>> '1' and '2'.
> > > >> > >>> > > > >>>>>>>>> I think this is because of using LocalMode to
> > run
> > > a
> > > >> > test
> > > >> > >>> > > > >>>>>>>>> application.
> > > >> > >>> > > > >>>>>>>>>
> > > >> > >>> > > > >>>>>>>>>
> > > >> > >>> > > > >>>>>>>>> If anyone has checked in such cases please do
> > > check
> > > >> and
> > > >> > >>> let
> > > >> > >>> > us
> > > >> > >>> > > > >>>>>>>>> know.
> > > >> > >>> > > > >>>>>>>>>
> > > >> > >>> > > > >>>>>>>>> Thanks,
> > > >> > >>> > > > >>>>>>>>> Chandni
> > > >> > >>> > > > >>>>>>>>>
> > > >> > >>> > > > >>>>>>>>
> > > >> > >>> > > > >>>>>>>>
> > > >> > >>> > > > >>>>>>>
> > > >> > >>> > > > >>>>>>
> > > >> > >>> > > > >>>>>
> > > >> > >>> > > > >>>>
> > > >> > >>> > > > >>>
> > > >> > >>> > > > >>
> > > >> > >>> > > > >
> > > >> > >>> > > >
> > > >> > >>> > >
> > > >> > >>> >
> > > >> > >>>
> > > >> > >>
> > > >> > >>
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>