Posted to dev@apex.apache.org by Aniruddha Thombare <an...@datatorrent.com> on 2016/02/01 15:05:30 UTC

Re: Possibility of saving checkpoints on other distributed filesystems

Hi Community,

Or let me say, BigFoots: do you think this feature should be available?

The reason to bring this up was discussed in the start of this thread as:

This is with the intention to recover the applications faster and do away
> with HDFS's small files problem as described here:
> http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
>
> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
> http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> If we could save checkpoints in some other distributed file system (or
> even a HA NAS box) geared for small files, we could achieve -
>
>    - Better performance of NN & HDFS for the production usage (read:
>    production data I/O & not temp files)
>
>
>    - Faster application recovery in case of planned shutdown / unplanned
>    restarts
>
> If you feel the need for this feature, please share your opinions and ideas
> so that they can be converted into a JIRA.



Thanks,


Aniruddha
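To make the NameNode cost behind the small-files concern concrete: the Cloudera post linked above cites a rule of thumb of roughly 150 bytes of NameNode heap per namespace object (file or block). A back-of-the-envelope sketch (the per-object figure is that rule of thumb, not a measured value):

```python
def namenode_heap_estimate(num_files, bytes_per_object=150):
    """Estimate NameNode heap used by small files: each file occupies
    one file object plus (at least) one block object in NameNode memory,
    ~150 bytes apiece per the rule of thumb cited above."""
    objects_per_file = 2  # 1 file entry + 1 block entry (a small file fits in 1 block)
    return num_files * objects_per_file * bytes_per_object

# e.g. 1,000 operators x 10 leftover checkpoint files each
print(namenode_heap_estimate(10_000) / 1e6, "MB")
```

A few megabytes is negligible, but clusters with many killed apps accumulating checkpoint files can multiply this, which is the concern raised later in this thread.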

On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta <ga...@datatorrent.com>
wrote:

> Aniruddha,
>
> Currently we don't have any support for that.
>
> Thanks
> -Gaurav
>
> On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi <tu...@datatorrent.com>
> wrote:
>
> > The default FSStorageAgent can be used, as it can work with the local
> > filesystem, but as far as I know there is no support for specifying the
> > directory through an XML file; by default it uses the application
> > directory on HDFS.
> >
> > Not sure if we could specify a storage agent with its properties through
> > the configuration at the DAG level.
> >
> > - Tushar.
> >
> >
> > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <
> > aniruddha@datatorrent.com> wrote:
> >
> > > Hi,
> > >
> > > Do we have any storage agent which I can use readily, configurable
> > through
> > > dt-site.xml?
> > >
> > > I am looking for something which would save checkpoints in mounted file
> > > system [eg. HA-NAS] which is basically just another directory for Apex.
> > >
> > >
> > >
> > >
> > > Thanks,
> > >
> > >
> > > Aniruddha
> > >
> > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <
> sandesh@datatorrent.com>
> > > wrote:
> > >
> > > > It is already supported; refer to the following JIRA for more
> > > > information:
> > > >
> > > > https://issues.apache.org/jira/browse/APEXCORE-283
> > > >
> > > >
> > > >
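As a purely illustrative sketch of what such DAG-level configuration might look like in dt-site.xml (the property key below is hypothetical, not a documented Apex key; APEXCORE-283 above is the authoritative reference for the actual mechanism):

```xml
<!-- dt-site.xml sketch; the key name is hypothetical, see APEXCORE-283 -->
<property>
  <name>dt.application.MyFirstApplication.attr.STORAGE_AGENT_PATH</name>
  <value>file:///mnt/ha-nas/apex/checkpoints</value>
</property>
```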
> > > > On Tue, Jan 19, 2016 at 10:43 PM Aniruddha Thombare <
> > > > aniruddha@datatorrent.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Is it possible to save checkpoints in any other highly available
> > > > > distributed file systems (which maybe mounted directories across
> the
> > > > > cluster) other than HDFS?
> > > > > If yes, is it configurable?
> > > > >
> > > > > AFAIK, there is no configurable option available to achieve that.
> > > > > If that's the case, can we have that feature?
> > > > >
> > > > > This is with the intention to recover the applications faster and do
> > > > > away with HDFS's small files problem as described here:
> > > > >
> > > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> > > > >
> > > > > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
> > > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > > > >
> > > > > If we could save checkpoints in some other distributed file system
> > > > > (or even an HA NAS box) geared for small files, we could achieve -
> > > > >
> > > > >    - Better performance of NN & HDFS for production usage (read:
> > > > >    production data I/O & not temp files)
> > > > >    - Faster application recovery in case of planned shutdown /
> > > > >    unplanned restarts
> > > > >
> > > > > Please, send your comments, suggestions or ideas.
> > > > >
> > > > > Thanks,
> > > > >
> > > > >
> > > > > Aniruddha
> > > > >
> > > >
> > >
> >
>

Re: Possibility of saving checkpoints on other distributed filesystems

Posted by Amol Kekre <am...@datatorrent.com>.
This feature is OK as long as the flag is false (do not remove files) by
default. One issue is that the RM may force-kill the AM without giving it a
chance to clean up, so this feature may work only on graceful shutdown.

Thks
Amol


On Tue, Feb 2, 2016 at 8:38 AM, Yogi Devendra <yo...@apache.org>
wrote:

> I would prefer to have an additional argument during application launch on
> dtcli.
>
> Say, --preserve-kill-state true .
>
> Basically, the platform should be able to do the clean-up activity if the
> application is invoked with a certain flag.
>
> Test apps can set this flag to clear the data on kill. Production apps can
> set this flag to keep the data on kill.
>
> Shutdown should always preserve the state. But, for kill / forced-shutdown
> user might prefer to clear the state.
>
> ~ Yogi
>
> On 2 February 2016 at 21:53, Amol Kekre <am...@datatorrent.com> wrote:
>
>>
>> Can we include a script in our github (util?) that simply deletes these
>> files upon an application being killed, given an app-id? The admin would
>> need to run this script. Auto-deleting would be bad, as a lot of users,
>> including those in production today, need to restart using those files. The
>> knowledge/desire to restart post-failure is outside the app, and hence
>> technically the script should be explicitly user-invoked.
>>
>> Thks,
>> Amol
>>
>>
>> On Tue, Feb 2, 2016 at 6:12 AM, Pramod Immaneni <pr...@datatorrent.com>
>> wrote:
>>
>>> Hi Venkat,
>>>
>>> There are typically a small number of outstanding checkpoint files per
>>> operator; as newer checkpoints are created, old ones are automatically
>>> deleted by the application when it determines that state is no longer
>>> needed. When an application stops or is killed, the last checkpoints
>>> remain. There is also a benefit to that, since a new application can be
>>> restarted to continue from those checkpoints instead of starting all the
>>> way from the beginning, which is useful in some cases. But if you are
>>> always starting your application from scratch, then yes, you can delete
>>> the checkpoints of older applications that are no longer running.
>>>
>>> Thanks
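The retention behavior described above (newer checkpoints supersede older ones per operator; the last checkpoint survives a kill) can be sketched roughly as follows. The `<operatorId>-<windowId>` file naming is an illustrative assumption, not Apex's actual checkpoint layout:

```python
import os

def purge_old_checkpoints(ckpt_dir, keep=1):
    """Sketch of the purge behavior described above: for each operator,
    delete all but the newest `keep` checkpoint windows. Assumes files
    are named <operatorId>-<windowId> (illustrative, not Apex's layout)."""
    by_operator = {}
    for name in os.listdir(ckpt_dir):
        op, _, window = name.rpartition('-')
        by_operator.setdefault(op, []).append((int(window), name))
    for windows in by_operator.values():
        windows.sort()  # oldest window first
        for _, name in windows[:-keep]:
            os.remove(os.path.join(ckpt_dir, name))
    # return the surviving checkpoint names per operator
    return {op: [n for _, n in w[-keep:]] for op, w in by_operator.items()}
```

With `keep=1`, each operator keeps only its latest checkpoint, which is what remains on disk when the application is killed.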
>>>
>>> On Mon, Feb 1, 2016 at 10:19 PM, Kottapalli, Venkatesh <
>>> VKottapalli@directv.com> wrote:
>>>
>>> > Hi,
>>> >
>>> >         Now that this has been discussed, will the checkpointed data be
>>> > purged when we kill the application forcefully? In our current usage, we
>>> > forcefully kill the app after it processes a certain batch of data. I see
>>> > these small files created under the (user/datatorrent) directory and not
>>> > removed.
>>> >
>>> >         Another scenario: when some of the containers keep failing, we
>>> > have observed a state where the data is continuously checkpointed into
>>> > small files. When we kill the app, the data will still be there.
>>> >
>>> >         We have received concerns that this is impacting namenode
>>> > performance, since these small files are stored in HDFS. So we manually
>>> > remove this checkpointed data at regular intervals.
>>> >
>>> > -Venkatesh
>>> >
>>> > -----Original Message-----
>>> > From: Amol Kekre [mailto:amol@datatorrent.com]
>>> > Sent: Monday, February 01, 2016 7:49 AM
>>> > To: dev@apex.incubator.apache.org; users@apex.incubator.apache.org
>>> > Subject: Re: Possibility of saving checkpoints on other distributed
>>> > filesystems
>>> >
>>> > Aniruddha,
>>> > We have not heard this request from users yet. It may be because our
>>> > checkpointing has a purge, i.e. the small files are not left over. The
>>> > small-file problem has been there in Hadoop and relates to storing small
>>> > files in Hadoop for a longer time (more likely forever).
>>> >
>>> > Thks,
>>> > Amol

Re: Possibility of saving checkpoints on other distributed filesystems

Posted by Thomas Weise <th...@gmail.com>.
The functionality can additionally be made available as a CLI command. We just
need to ensure it works correctly with YARN application status and security.
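A minimal sketch of the kind of explicitly user-invoked cleanup utility proposed in this thread (script or CLI command). The directory layout `<base_dir>/<app_id>/checkpoints` and the default base path are assumptions for illustration, not Apex's actual structure, and the sketch deletes nothing unless the caller opts out of dry-run:

```python
import shutil
from pathlib import Path

def remove_app_checkpoints(base_dir, app_id, dry_run=True):
    """Delete the checkpoint directory for a killed application.
    Assumes checkpoints live under <base_dir>/<app_id>/checkpoints
    (illustrative layout). Dry-run by default, so nothing is removed
    unless the admin explicitly opts in. Returns True if the
    checkpoint directory was found."""
    target = Path(base_dir) / app_id / "checkpoints"
    if not target.is_dir():
        return False
    if not dry_run:
        shutil.rmtree(target)  # irreversible; only on explicit request
    return True
```

Keeping deletion behind an explicit flag matches the point made above: the desire to restart from leftover checkpoints lives outside the app, so removal must be a deliberate operator action.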

Re: Possibility of saving checkpoints on other distributed filesystems

Posted by Munagala Ramanath <ra...@datatorrent.com>.
GW = GateWay

On Tue, Feb 2, 2016 at 10:37 AM, Sandesh Hegde <sa...@datatorrent.com>
wrote:

> What is GW?
>
> On Tue, Feb 2, 2016 at 9:16 AM Pramod Immaneni <pr...@datatorrent.com>
> wrote:
>
>> Good idea to handle it in GW.
>>
>> On Tue, Feb 2, 2016 at 8:50 AM, Thomas Weise <th...@datatorrent.com>
>> wrote:
>>
>>> Exactly, this doesn't make sense. I filed an enhancement to have this in
>>> GW a while ago.
>>>
>>> On Tue, Feb 2, 2016 at 8:48 AM, Pramod Immaneni <pr...@datatorrent.com>
>>> wrote:
>>>
>>> > Yogi,
>>> >
>>> > kill is not an orderly shutdown, who will clean the state?

Re: Possibility of saving checkpoints on other distributed filesystems

Posted by Sandesh Hegde <sa...@datatorrent.com>.
What is GW?

On Tue, Feb 2, 2016 at 9:16 AM Pramod Immaneni <pr...@datatorrent.com>
wrote:

> Good idea to handle it in GW.
>> > >>> > > > > > >
>> > >>> > > > > > >
>> > >>> > > > > > > On Tue, Jan 19, 2016 at 10:43 PM Aniruddha Thombare <
>> > >>> > > > > > > aniruddha@datatorrent.com> wrote:
>> > >>> > > > > > >
>> > >>> > > > > > > > Hi,
>> > >>> > > > > > > >
>> > >>> > > > > > > > Is it possible to save checkpoints in any other highly
>> > >>> > > > > > > > available distributed file systems (which maybe
>> mounted
>> > >>> > > > > > > > directories across
>> > >>> > > > the
>> > >>> > > > > > > > cluster) other than HDFS?
>> > >>> > > > > > > > If yes, is it configurable?
>> > >>> > > > > > > >
>> > >>> > > > > > > > AFAIK, there is no configurable option available to
>> > achieve
>> > >>> > that.
>> > >>> > > > > > > > If that's the case, can we have that feature?
>> > >>> > > > > > > >
>> > >>> > > > > > > > This is with the intention to recover the applications
>> > >>> > > > > > > > faster and
>> > >>> > > > do
>> > >>> > > > > > away
>> > >>> > > > > > > > with HDFS's small files problem as described here:
>> > >>> > > > > > > >
>> > >>> > > > > > > >
>> > >>> http://blog.cloudera.com/blog/2009/02/the-small-files-proble
>> > >>> > > > > > > > m/
>> > >>> > > > > > > >
>> > >>> > > > > > > >
>> > >>> > > > > > >
>> > >>> > > > > >
>> > >>> > > > >
>> > >>> > > >
>> > >>> > >
>> > >>>
>> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
>> > >>> > > l-files-problem/
>> > >>> > > > > > > >
>> > >>> > > >
>> > >>> http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
>> > >>> > > > > > > >
>> > >>> > > > > > > > If we could save checkpoints in some other distributed
>> > file
>> > >>> > > system
>> > >>> > > > > (or
>> > >>> > > > > > > even
>> > >>> > > > > > > > a HA NAS box) geared for small files, we could
>> achieve -
>> > >>> > > > > > > >
>> > >>> > > > > > > >    - Better performance of NN & HDFS for the
>> production
>> > >>> > > > > > > > usage
>> > >>> > > > (read:
>> > >>> > > > > > > >    production data I/O & not temp files)
>> > >>> > > > > > > >    - Faster application recovery in case of planned
>> > >>> shutdown
>> > >>> > > > > > > > /
>> > >>> > > > > > unplanned
>> > >>> > > > > > > >    restarts
>> > >>> > > > > > > >
>> > >>> > > > > > > > Please, send your comments, suggestions or ideas.
>> > >>> > > > > > > >
>> > >>> > > > > > > > Thanks,
>> > >>> > > > > > > >
>> > >>> > > > > > > >
>> > >>> > > > > > > > Aniruddha
>> > >>> > > > > > > >
>> > >>> > > > > > >
>> > >>> > > > > >
>> > >>> > > > >
>> > >>> > > >
>> > >>> > >
>> > >>> >
>> > >>>
>> > >>
>> > >>
>> > >
>> >
>>
>
>

Re: Possibility of saving checkpoints on other distributed filesystems

Posted by Sandesh Hegde <sa...@datatorrent.com>.
What is GW?

On Tue, Feb 2, 2016 at 9:16 AM Pramod Immaneni <pr...@datatorrent.com>
wrote:

> Good idea to handle it in GW.
>
> On Tue, Feb 2, 2016 at 8:50 AM, Thomas Weise <th...@datatorrent.com>
> wrote:
>
>> Exactly, this doesn't make sense. I filed an enhancement to have this in
>> GW
>> a while ago.

Re: Possibility of saving checkpoints on other distributed filesystems

Posted by Thomas Weise <th...@gmail.com>.
The functionality can additionally be made available as a CLI command. We just need to ensure it works correctly with YARN application status and security.
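[Editor's note: the check Thomas describes — confirming with YARN that an application has actually terminated before removing its checkpoint state — can be sketched as below. This is a hypothetical illustration, not Apex or Gateway code: the function names and the dictionary of app states are assumptions; only the terminal state names (FINISHED, FAILED, KILLED) come from YARN's documented application states. A real command would query the ResourceManager and also enforce security, which is omitted here.]

```python
# Hypothetical sketch: decide which applications' checkpoint directories are
# safe to delete, based on the state YARN reports for each application.
# FINISHED / FAILED / KILLED are YARN's terminal application states; any
# other state means the application may still be running and must be skipped.

TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}

def is_safe_to_clean(yarn_state: str) -> bool:
    """Return True only when YARN reports the application as terminated."""
    return yarn_state.upper() in TERMINAL_STATES

def checkpoint_dirs_to_clean(app_states: dict) -> list:
    """Given {app_id: yarn_state}, list app ids whose checkpoints may be removed."""
    return sorted(app_id for app_id, state in app_states.items()
                  if is_safe_to_clean(state))

if __name__ == "__main__":
    # Example input; application ids and states are made up for illustration.
    apps = {
        "application_1454454000000_0001": "RUNNING",
        "application_1454454000000_0002": "KILLED",
        "application_1454454000000_0003": "FINISHED",
    }
    print(checkpoint_dirs_to_clean(apps))
```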

Re: Possibility of saving checkpoints on other distributed filesystems

Posted by Pramod Immaneni <pr...@datatorrent.com>.
Good idea to handle it in GW.

On Tue, Feb 2, 2016 at 8:50 AM, Thomas Weise <th...@datatorrent.com> wrote:

> Exactly, this doesn't make sense. I filed an enhancement to have this in GW
> a while ago.

Re: Possibility of saving checkpoints on other distributed filesystems

Posted by Pramod Immaneni <pr...@datatorrent.com>.
Good idea to handle it in GW.

On Tue, Feb 2, 2016 at 8:50 AM, Thomas Weise <th...@datatorrent.com> wrote:

> Exactly, this doesn't make sense. I filed an enhancement to have this in GW
> a while ago.
>
> On Tue, Feb 2, 2016 at 8:48 AM, Pramod Immaneni <pr...@datatorrent.com>
> wrote:
>
> > Yogi,
> >
> > kill is not an orderly shutdown, who will clean the state?
> >
> > On Tue, Feb 2, 2016 at 8:38 AM, Yogi Devendra <yo...@apache.org>
> > wrote:
> >
> > > I would prefer to have an additional argument during application launch
> > on
> > > dtcli.
> > >
> > > Say, --preserve-kill-state true .
> > >
> > > Basically, platform should be able to do the clean-up activity if the
> > > application is invoked with certain flag.
> > >
> > > Test apps can set this flag to clear the data on kill. Production apps
> > can
> > > set this flag to keep the data on kill.
> > >
> > > Shutdown should always preserve the state. But, for kill /
> > forced-shutdown
> > > user might prefer to clear the state.
> > >
> > > ~ Yogi
> > >
> > > On 2 February 2016 at 21:53, Amol Kekre <am...@datatorrent.com> wrote:
> > >
> > >>
> > >> Can we include a script in our github (util?) that simply deletes
> these
> > >> files upon application being killed, given an app-id. The admin will
> > need
> > >> to run this script. Auto-deleting will be bad as a lot of users,
> > including
> > >> those in production today need to restart using those files. The
> > >> knowledge/desire to restart post failure is outside the app and hence
> > >> technically the script should be explicitly user invoked
> > >>
> > >> Thks,
> > >> Amol
> > >>
> > >>
> > >> On Tue, Feb 2, 2016 at 6:12 AM, Pramod Immaneni <
> pramod@datatorrent.com
> > >
> > >> wrote:
> > >>
> > >>> Hi Venkat,
> > >>>
> > >>> There are typically a small number of outstanding checkpoint files
> per
> > >>> operator, as newer checkpoints are created old ones are automatically
> > >>> deleted by the application when it determines that state is no longer
> > >>> needed. When an application stops/killed the last checkpoints remain.
> > >>> There
> > >>> is also a benefit to that since a new application can be restarted to
> > >>> continue from those checkpoints instead of starting all the way from
> > the
> > >>> beginning and this is useful in some cases. But if you are always
> > >>> starting
> > >>> your application from scratch yes you can delete the checkpoints of
> > older
> > >>> applications that are no longer running.
> > >>>
> > >>> Thanks
> > >>>
> > >>> On Mon, Feb 1, 2016 at 10:19 PM, Kottapalli, Venkatesh <
> > >>> VKottapalli@directv.com> wrote:
> > >>>
> > >>> > Hi,
> > >>> >
> > >>> >         Now that this has been discussed, Will the checkpointed
> data
> > be
> > >>> > purged when we kill the application forcefully?  In our current
> > usage,
> > >>> we
> > >>> > forcefully kill the app after it processes a certain batch of
> data. I
> > >>> see
> > >>> > these small files are created under (user/datatorrent) directory
> and
> > >>> not
> > >>> > removed.
> > >>> >
> > >>> >         Another scenario, when some of the containers keep failing,
> > we
> > >>> > have observed this state where the data is continuously
> checkpointed
> > >>> into
> > >>> > small files. When we kill the app, the data will be there.
> > >>> >
> > >>> >         We have received concerns saying this is impacting namenode
> > >>> > performance since these small files are stored in HDFS. So we
> > manually
> > >>> > remove these checkpointed data at regular intervals.
> > >>> >
> > >>> > -Venkatesh
> > >>> >
> > >>> > -----Original Message-----
> > >>> > From: Amol Kekre [mailto:amol@datatorrent.com]
> > >>> > Sent: Monday, February 01, 2016 7:49 AM
> > >>> > To: dev@apex.incubator.apache.org; users@apex.incubator.apache.org
> > >>> > Subject: Re: Possibility of saving checkpoints on other distributed
> > >>> > filesystems
> > >>> >
> > >>> > Aniruddha,
> > >>> > We have not heard this request from users yet. It may be because
> our
> > >>> > checkpointing has a purge, i.e. the small files are not left over.
> > >>> Small
> > >>> > file problem has been there in Hadoop and relates to storing small
> > >>> files in
> > >>> > Hadoop for a longer time (more likely forever).
> > >>> >
> > >>> > Thks,
> > >>> > Amol
> > >>> >
> > >>> >
> > >>> > On Mon, Feb 1, 2016 at 6:05 AM, Aniruddha Thombare <
> > >>> > aniruddha@datatorrent.com> wrote:
> > >>> >
> > >>> > > Hi Community,
> > >>> > >
> > >>> > > Or Let me say BigFoots, do you think this feature should be
> > >>> available?
> > >>> > >
> > >>> > > The reason to bring this up was discussed in the start of this
> > >>> thread as:
> > >>> > >
> > >>> > > This is with the intention to recover the applications faster and
> > do
> > >>> > > away
> > >>> > > > with HDFS's small files problem as described here:
> > >>> > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> > >>> > > >
> > >>> > > >
> > >>> > >
> > >>>
> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
> > >>> > > l-files-problem/
> > >>> > > >
> > >>> http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > >>> > > > If we could save checkpoints in some other distributed file
> > system
> > >>> > > > (or even a HA NAS box) geared for small files, we could
> achieve -
> > >>> > > >
> > >>> > > >    - Better performance of NN & HDFS for the production usage
> > >>> (read:
> > >>> > > >    production data I/O & not temp files)
> > >>> > > >
> > >>> > > >
> > >>> > > >    - Faster application recovery in case of planned shutdown /
> > >>> > unplanned
> > >>> > > >    restarts
> > >>> > > >
> > >>> > > > If you feel the need of this feature, please cast your opinions
> > and
> > >>> > > > ideas
> > >>> > > so that it can be converted in a jira.
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > > Thanks,
> > >>> > >
> > >>> > >
> > >>> > > Aniruddha
> > >>> > >
> > >>> > > On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta
> > >>> > > <ga...@datatorrent.com>
> > >>> > > wrote:
> > >>> > >
> > >>> > > > Aniruddha,
> > >>> > > >
> > >>> > > > Currently we don't have any support for that.
> > >>> > > >
> > >>> > > > Thanks
> > >>> > > > Gaurav
> > >>> > > >
> > >>> > > > Thanks
> > >>> > > > -Gaurav
> > >>> > > >
> > >>> > > > On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi
> > >>> > > > <tu...@datatorrent.com>
> > >>> > > > wrote:
> > >>> > > >
> > >>> > > > > Default FSStorageAgent can be used as it can work with local
> > >>> > > filesystem,
> > >>> > > > > but as far as I know there is no support for specifying the
> > >>> > > > > directory through an xml file. By default it uses the application
> > >>> > > > > directory on HDFS.
> > >>> > > > >
> > >>> > > > > Not sure if we could specify a storage agent with its properties
> > >>> > > > > through the configuration at the dag level.
> > >>> > > > >
> > >>> > > > > - Tushar.
> > >>> > > > >
> > >>> > > > >
> > >>> > > > > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <
> > >>> > > > > aniruddha@datatorrent.com> wrote:
> > >>> > > > >
> > >>> > > > > > Hi,
> > >>> > > > > >
> > >>> > > > > > Do we have any storage agent which I can use readily,
> > >>> > > > > > configurable
> > >>> > > > > through
> > >>> > > > > > dt-site.xml?
> > >>> > > > > >
> > >>> > > > > > I am looking for something which would save checkpoints in
> > >>> > > > > > mounted
> > >>> > > file
> > >>> > > > > > system [eg. HA-NAS] which is basically just another
> directory
> > >>> > > > > > for
> > >>> > > Apex.
> > >>> > > > > >
> > >>> > > > > >
> > >>> > > > > >
> > >>> > > > > >
> > >>> > > > > > Thanks,
> > >>> > > > > >
> > >>> > > > > >
> > >>> > > > > > Aniruddha
> > >>> > > > > >
> > >>> > > > > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <
> > >>> > > > sandesh@datatorrent.com>
> > >>> > > > > > wrote:
> > >>> > > > > >
> > >>> > > > > > > It is already supported; refer to the following jira for
> > >>> > > > > > > more information,
> > >>> > > > > > >
> > >>> > > > > > > https://issues.apache.org/jira/browse/APEXCORE-283
> > >>> > > > > > >
> > >>> > > > > > >
> > >>> > > > > > >
> > >>> > > > > > > On Tue, Jan 19, 2016 at 10:43 PM Aniruddha Thombare <
> > >>> > > > > > > aniruddha@datatorrent.com> wrote:
> > >>> > > > > > >
> > >>> > > > > > > > Hi,
> > >>> > > > > > > >
> > >>> > > > > > > > Is it possible to save checkpoints in any other highly
> > >>> > > > > > > > available distributed file systems (which maybe mounted
> > >>> > > > > > > > directories across
> > >>> > > > the
> > >>> > > > > > > > cluster) other than HDFS?
> > >>> > > > > > > > If yes, is it configurable?
> > >>> > > > > > > >
> > >>> > > > > > > > AFAIK, there is no configurable option available to
> > achieve
> > >>> > that.
> > >>> > > > > > > > If that's the case, can we have that feature?
> > >>> > > > > > > >
> > >>> > > > > > > > This is with the intention to recover the applications
> > >>> > > > > > > > faster and
> > >>> > > > do
> > >>> > > > > > away
> > >>> > > > > > > > with HDFS's small files problem as described here:
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> http://blog.cloudera.com/blog/2009/02/the-small-files-proble
> > >>> > > > > > > > m/
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>>
> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
> > >>> > > l-files-problem/
> > >>> > > > > > > >
> > >>> > > >
> > >>> http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > >>> > > > > > > >
> > >>> > > > > > > > If we could save checkpoints in some other distributed
> > file
> > >>> > > system
> > >>> > > > > (or
> > >>> > > > > > > even
> > >>> > > > > > > > a HA NAS box) geared for small files, we could achieve
> -
> > >>> > > > > > > >
> > >>> > > > > > > >    - Better performance of NN & HDFS for the production
> > >>> > > > > > > > usage
> > >>> > > > (read:
> > >>> > > > > > > >    production data I/O & not temp files)
> > >>> > > > > > > >    - Faster application recovery in case of planned
> > >>> shutdown
> > >>> > > > > > > > /
> > >>> > > > > > unplanned
> > >>> > > > > > > >    restarts
> > >>> > > > > > > >
> > >>> > > > > > > > Please, send your comments, suggestions or ideas.
> > >>> > > > > > > >
> > >>> > > > > > > > Thanks,
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > > Aniruddha
> > >>> > > > > > > >
> > >>> > > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>

Re: Possibility of saving checkpoints on other distributed filesystems

Posted by Thomas Weise <th...@datatorrent.com>.
Exactly, this doesn't make sense. I filed an enhancement to have this in GW
a while ago.

On Tue, Feb 2, 2016 at 8:48 AM, Pramod Immaneni <pr...@datatorrent.com>
wrote:

> Yogi,
>
> kill is not an orderly shutdown, who will clean the state?
>
> On Tue, Feb 2, 2016 at 8:38 AM, Yogi Devendra <yo...@apache.org>
> wrote:
>
> > I would prefer to have an additional argument during application launch
> on
> > dtcli.
> >
> > Say, --preserve-kill-state true .
> >
> > Basically, the platform should be able to do the clean-up activity if the
> > application is invoked with a certain flag.
> >
> > Test apps can set this flag to false to clear the data on kill. Production
> > apps can set it to true to keep the data on kill.
> >
> > Shutdown should always preserve the state. But for kill / forced shutdown,
> > the user might prefer to clear the state.
> >
> > ~ Yogi
> >
> > On 2 February 2016 at 21:53, Amol Kekre <am...@datatorrent.com> wrote:
> >
> >>
> >> Can we include a script in our github (util?) that simply deletes these
> >> files upon application being killed, given an app-id. The admin will
> need
> >> to run this script. Auto-deleting will be bad as a lot of users,
> including
> >> those in production today need to restart using those files. The
> >> knowledge/desire to restart post failure is outside the app and hence
> >> technically the script should be explicitly user-invoked.
> >>
> >> Thks,
> >> Amol
> >>
> >>
> >> On Tue, Feb 2, 2016 at 6:12 AM, Pramod Immaneni <pramod@datatorrent.com
> >
> >> wrote:
> >>
> >>> Hi Venkat,
> >>>
> >>> There are typically only a small number of outstanding checkpoint files
> >>> per operator; as newer checkpoints are created, older ones are
> >>> automatically deleted by the application once it determines that state is
> >>> no longer needed. When an application is stopped or killed, the last
> >>> checkpoints remain. There is also a benefit to that: a new application can
> >>> be restarted to continue from those checkpoints instead of starting all
> >>> the way from the beginning, which is useful in some cases. But if you
> >>> always start your application from scratch, then yes, you can delete the
> >>> checkpoints of older applications that are no longer running.
> >>>
> >>> Thanks
> >>>
> >>> On Mon, Feb 1, 2016 at 10:19 PM, Kottapalli, Venkatesh <
> >>> VKottapalli@directv.com> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> >         Now that this has been discussed, will the checkpointed data
> be
> >>> > purged when we kill the application forcefully?  In our current
> usage,
> >>> we
> >>> > forcefully kill the app after it processes a certain batch of data. I
> >>> see
> >>> > these small files are created under (user/datatorrent) directory and
> >>> not
> >>> > removed.
> >>> >
> >>> >         Another scenario, when some of the containers keep failing,
> we
> >>> > have observed this state where the data is continuously checkpointed
> >>> into
> >>> > small files. When we kill the app, the data will be there.
> >>> >
> >>> >         We have received concerns saying this is impacting namenode
> >>> > performance since these small files are stored in HDFS. So we
> manually
> >>> > remove these checkpointed data at regular intervals.
> >>> >
> >>> > -Venkatesh
> >>> >
> >>> > -----Original Message-----
> >>> > From: Amol Kekre [mailto:amol@datatorrent.com]
> >>> > Sent: Monday, February 01, 2016 7:49 AM
> >>> > To: dev@apex.incubator.apache.org; users@apex.incubator.apache.org
> >>> > Subject: Re: Possibility of saving checkpoints on other distributed
> >>> > filesystems
> >>> >
> >>> > Aniruddha,
> >>> > We have not heard this request from users yet. It may be because our
> >>> > checkpointing has a purge, i.e. the small files are not left over.
> >>> Small
> >>> > file problem has been there in Hadoop and relates to storing small
> >>> files in
> >>> > Hadoop for a longer time (more likely forever).
> >>> >
> >>> > Thks,
> >>> > Amol
> >>> >
> >>> >
> >>> > On Mon, Feb 1, 2016 at 6:05 AM, Aniruddha Thombare <
> >>> > aniruddha@datatorrent.com> wrote:
> >>> >
> >>> > > Hi Community,
> >>> > >
> >>> > > Or Let me say BigFoots, do you think this feature should be
> >>> available?
> >>> > >
> >>> > > The reason to bring this up was discussed in the start of this
> >>> thread as:
> >>> > >
> >>> > > This is with the intention to recover the applications faster and
> do
> >>> > > away
> >>> > > > with HDFS's small files problem as described here:
> >>> > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> >>> > > >
> >>> > > >
> >>> > >
> >>> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
> >>> > > l-files-problem/
> >>> > > >
> >>> http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> >>> > > > If we could save checkpoints in some other distributed file
> system
> >>> > > > (or even a HA NAS box) geared for small files, we could achieve -
> >>> > > >
> >>> > > >    - Better performance of NN & HDFS for the production usage
> >>> (read:
> >>> > > >    production data I/O & not temp files)
> >>> > > >
> >>> > > >
> >>> > > >    - Faster application recovery in case of planned shutdown /
> >>> > unplanned
> >>> > > >    restarts
> >>> > > >
> >>> > > > If you feel the need of this feature, please cast your opinions
> and
> >>> > > > ideas
> >>> > > so that it can be converted in a jira.
> >>> > >
> >>> > >
> >>> > >
> >>> > > Thanks,
> >>> > >
> >>> > >
> >>> > > Aniruddha
> >>> > >
> >>> > > On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta
> >>> > > <ga...@datatorrent.com>
> >>> > > wrote:
> >>> > >
> >>> > > > Aniruddha,
> >>> > > >
> >>> > > > Currently we don't have any support for that.
> >>> > > >
> >>> > > > Thanks
> >>> > > > Gaurav
> >>> > > >
> >>> > > > Thanks
> >>> > > > -Gaurav
> >>> > > >
> >>> > > > On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi
> >>> > > > <tu...@datatorrent.com>
> >>> > > > wrote:
> >>> > > >
> >>> > > > > Default FSStorageAgent can be used as it can work with local
> >>> > > filesystem,
> >>> > > > > but as far as I know there is no support for specifying the
> >>> > > > > directory through an xml file. By default it uses the application
> >>> > > > > directory on HDFS.
> >>> > > > >
> >>> > > > > Not sure if we could specify a storage agent with its properties
> >>> > > > > through the configuration at the dag level.
> >>> > > > >
> >>> > > > > - Tushar.
> >>> > > > >
> >>> > > > >
> >>> > > > > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <
> >>> > > > > aniruddha@datatorrent.com> wrote:
> >>> > > > >
> >>> > > > > > Hi,
> >>> > > > > >
> >>> > > > > > Do we have any storage agent which I can use readily,
> >>> > > > > > configurable
> >>> > > > > through
> >>> > > > > > dt-site.xml?
> >>> > > > > >
> >>> > > > > > I am looking for something which would save checkpoints in
> >>> > > > > > mounted
> >>> > > file
> >>> > > > > > system [eg. HA-NAS] which is basically just another directory
> >>> > > > > > for
> >>> > > Apex.
> >>> > > > > >
> >>> > > > > >
> >>> > > > > >
> >>> > > > > >
> >>> > > > > > Thanks,
> >>> > > > > >
> >>> > > > > >
> >>> > > > > > Aniruddha
> >>> > > > > >
> >>> > > > > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <
> >>> > > > sandesh@datatorrent.com>
> >>> > > > > > wrote:
> >>> > > > > >
> >>> > > > > > > It is already supported; refer to the following jira for
> >>> > > > > > > more information,
> >>> > > > > > >
> >>> > > > > > > https://issues.apache.org/jira/browse/APEXCORE-283
> >>> > > > > > >
> >>> > > > > > >
> >>> > > > > > >
> >>> > > > > > > On Tue, Jan 19, 2016 at 10:43 PM Aniruddha Thombare <
> >>> > > > > > > aniruddha@datatorrent.com> wrote:
> >>> > > > > > >
> >>> > > > > > > > Hi,
> >>> > > > > > > >
> >>> > > > > > > > Is it possible to save checkpoints in any other highly
> >>> > > > > > > > available distributed file systems (which maybe mounted
> >>> > > > > > > > directories across
> >>> > > > the
> >>> > > > > > > > cluster) other than HDFS?
> >>> > > > > > > > If yes, is it configurable?
> >>> > > > > > > >
> >>> > > > > > > > AFAIK, there is no configurable option available to
> achieve
> >>> > that.
> >>> > > > > > > > If that's the case, can we have that feature?
> >>> > > > > > > >
> >>> > > > > > > > This is with the intention to recover the applications
> >>> > > > > > > > faster and
> >>> > > > do
> >>> > > > > > away
> >>> > > > > > > > with HDFS's small files problem as described here:
> >>> > > > > > > >
> >>> > > > > > > >
> >>> http://blog.cloudera.com/blog/2009/02/the-small-files-proble
> >>> > > > > > > > m/
> >>> > > > > > > >
> >>> > > > > > > >
> >>> > > > > > >
> >>> > > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
> >>> > > l-files-problem/
> >>> > > > > > > >
> >>> > > >
> >>> http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> >>> > > > > > > >
> >>> > > > > > > > If we could save checkpoints in some other distributed
> file
> >>> > > system
> >>> > > > > (or
> >>> > > > > > > even
> >>> > > > > > > > a HA NAS box) geared for small files, we could achieve -
> >>> > > > > > > >
> >>> > > > > > > >    - Better performance of NN & HDFS for the production
> >>> > > > > > > > usage
> >>> > > > (read:
> >>> > > > > > > >    production data I/O & not temp files)
> >>> > > > > > > >    - Faster application recovery in case of planned
> >>> shutdown
> >>> > > > > > > > /
> >>> > > > > > unplanned
> >>> > > > > > > >    restarts
> >>> > > > > > > >
> >>> > > > > > > > Please, send your comments, suggestions or ideas.
> >>> > > > > > > >
> >>> > > > > > > > Thanks,
> >>> > > > > > > >
> >>> > > > > > > >
> >>> > > > > > > > Aniruddha
> >>> > > > > > > >
> >>> > > > > > >
> >>> > > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>

Re: Possibility of saving checkpoints on other distributed filesystems

Posted by Pramod Immaneni <pr...@datatorrent.com>.
Yogi,

A kill is not an orderly shutdown, so who will clean up the state?
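As a rough sketch of the admin-run cleanup script Amol suggested upthread — the checkpoint base path, the app-id check, and the commented-out HDFS command here are assumptions for illustration, not the actual layout:

```shell
#!/bin/sh
# Hypothetical admin-invoked cleanup for leftover checkpoints of a killed app.
# Dry run by default; BASE_DIR is an assumed location, not the documented one.

BASE_DIR="${BASE_DIR:-/user/datatorrent/checkpoints}"

cleanup_checkpoints() {
    app_id="$1"
    # Refuse anything that does not look like a YARN application id, so a
    # typo can never wipe out the whole base directory.
    case "$app_id" in
        application_*) ;;
        *) echo "error: '$app_id' is not a YARN application id" >&2; return 1 ;;
    esac
    echo "would remove: $BASE_DIR/$app_id"
    # Real deletion would be something like:
    # hdfs dfs -rm -r -skipTrash "$BASE_DIR/$app_id"
}

cleanup_checkpoints "application_1454000000000_0001"
# prints: would remove: /user/datatorrent/checkpoints/application_1454000000000_0001
```

The point is only the shape: explicitly user-invoked, given an app-id, and never auto-deleting state that someone may still want for a restart.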

On Tue, Feb 2, 2016 at 8:38 AM, Yogi Devendra <yo...@apache.org>
wrote:

> I would prefer to have an additional argument during application launch on
> dtcli.
>
> Say, --preserve-kill-state true .
>
> Basically, the platform should be able to do the clean-up activity if the
> application is invoked with a certain flag.
>
> Test apps can set this flag to false to clear the data on kill. Production
> apps can set it to true to keep the data on kill.
>
> Shutdown should always preserve the state. But for kill / forced shutdown,
> the user might prefer to clear the state.
>
> ~ Yogi
>
> On 2 February 2016 at 21:53, Amol Kekre <am...@datatorrent.com> wrote:
>
>>
>> Can we include a script in our github (util?) that simply deletes these
>> files upon application being killed, given an app-id. The admin will need
>> to run this script. Auto-deleting will be bad as a lot of users, including
>> those in production today need to restart using those files. The
>> knowledge/desire to restart post failure is outside the app and hence
>> technically the script should be explicitly user-invoked.
>>
>> Thks,
>> Amol
>>
>>
>> On Tue, Feb 2, 2016 at 6:12 AM, Pramod Immaneni <pr...@datatorrent.com>
>> wrote:
>>
>>> Hi Venkat,
>>>
>>> There are typically only a small number of outstanding checkpoint files
>>> per operator; as newer checkpoints are created, older ones are
>>> automatically deleted by the application once it determines that state is
>>> no longer needed. When an application is stopped or killed, the last
>>> checkpoints remain. There is also a benefit to that: a new application can
>>> be restarted to continue from those checkpoints instead of starting all
>>> the way from the beginning, which is useful in some cases. But if you
>>> always start your application from scratch, then yes, you can delete the
>>> checkpoints of older applications that are no longer running.
>>>
>>> Thanks
>>>
>>> On Mon, Feb 1, 2016 at 10:19 PM, Kottapalli, Venkatesh <
>>> VKottapalli@directv.com> wrote:
>>>
>>> > Hi,
>>> >
>>> >         Now that this has been discussed, will the checkpointed data be
>>> > purged when we kill the application forcefully?  In our current usage,
>>> we
>>> > forcefully kill the app after it processes a certain batch of data. I
>>> see
>>> > these small files are created under (user/datatorrent) directory and
>>> not
>>> > removed.
>>> >
>>> >         Another scenario, when some of the containers keep failing, we
>>> > have observed this state where the data is continuously checkpointed
>>> into
>>> > small files. When we kill the app, the data will be there.
>>> >
>>> >         We have received concerns saying this is impacting namenode
>>> > performance since these small files are stored in HDFS. So we manually
>>> > remove these checkpointed data at regular intervals.
>>> >
>>> > -Venkatesh
>>> >
>>> > -----Original Message-----
>>> > From: Amol Kekre [mailto:amol@datatorrent.com]
>>> > Sent: Monday, February 01, 2016 7:49 AM
>>> > To: dev@apex.incubator.apache.org; users@apex.incubator.apache.org
>>> > Subject: Re: Possibility of saving checkpoints on other distributed
>>> > filesystems
>>> >
>>> > Aniruddha,
>>> > We have not heard this request from users yet. It may be because our
>>> > checkpointing has a purge, i.e. the small files are not left over.
>>> Small
>>> > file problem has been there in Hadoop and relates to storing small
>>> files in
>>> > Hadoop for a longer time (more likely forever).
>>> >
>>> > Thks,
>>> > Amol
>>> >
>>> >
>>> > On Mon, Feb 1, 2016 at 6:05 AM, Aniruddha Thombare <
>>> > aniruddha@datatorrent.com> wrote:
>>> >
>>> > > Hi Community,
>>> > >
>>> > > Or Let me say BigFoots, do you think this feature should be
>>> available?
>>> > >
>>> > > The reason to bring this up was discussed in the start of this
>>> thread as:
>>> > >
>>> > > This is with the intention to recover the applications faster and do
>>> > > away
>>> > > > with HDFS's small files problem as described here:
>>> > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
>>> > > >
>>> > > >
>>> > >
>>> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
>>> > > l-files-problem/
>>> > > >
>>> http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
>>> > > > If we could save checkpoints in some other distributed file system
>>> > > > (or even a HA NAS box) geared for small files, we could achieve -
>>> > > >
>>> > > >    - Better performance of NN & HDFS for the production usage
>>> (read:
>>> > > >    production data I/O & not temp files)
>>> > > >
>>> > > >
>>> > > >    - Faster application recovery in case of planned shutdown /
>>> > unplanned
>>> > > >    restarts
>>> > > >
>>> > > > If you feel the need of this feature, please cast your opinions and
>>> > > > ideas
>>> > > so that it can be converted in a jira.
>>> > >
>>> > >
>>> > >
>>> > > Thanks,
>>> > >
>>> > >
>>> > > Aniruddha
>>> > >
>>> > > On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta
>>> > > <ga...@datatorrent.com>
>>> > > wrote:
>>> > >
>>> > > > Aniruddha,
>>> > > >
>>> > > > Currently we don't have any support for that.
>>> > > >
>>> > > > Thanks
>>> > > > Gaurav
>>> > > >
>>> > > > Thanks
>>> > > > -Gaurav
>>> > > >
>>> > > > On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi
>>> > > > <tu...@datatorrent.com>
>>> > > > wrote:
>>> > > >
>>> > > > > Default FSStorageAgent can be used as it can work with local
>>> > > filesystem,
>>> > > > > but as far as I know there is no support for specifying the
>>> > > > > directory through an xml file. By default it uses the application
>>> > > > > directory on HDFS.
>>> > > > >
>>> > > > > Not sure if we could specify a storage agent with its properties
>>> > > > > through the configuration at the dag level.
>>> > > > >
>>> > > > > - Tushar.
>>> > > > >
>>> > > > >
>>> > > > > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <
>>> > > > > aniruddha@datatorrent.com> wrote:
>>> > > > >
>>> > > > > > Hi,
>>> > > > > >
>>> > > > > > Do we have any storage agent which I can use readily,
>>> > > > > > configurable
>>> > > > > through
>>> > > > > > dt-site.xml?
>>> > > > > >
>>> > > > > > I am looking for something which would save checkpoints in
>>> > > > > > mounted
>>> > > file
>>> > > > > > system [eg. HA-NAS] which is basically just another directory
>>> > > > > > for
>>> > > Apex.
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > > Thanks,
>>> > > > > >
>>> > > > > >
>>> > > > > > Aniruddha
>>> > > > > >
>>> > > > > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <
>>> > > > sandesh@datatorrent.com>
>>> > > > > > wrote:
>>> > > > > >
>>> > > > > > > It is already supported; refer to the following jira for
>>> > > > > > > more information,
>>> > > > > > >
>>> > > > > > > https://issues.apache.org/jira/browse/APEXCORE-283
>>> > > > > > >
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > On Tue, Jan 19, 2016 at 10:43 PM Aniruddha Thombare <
>>> > > > > > > aniruddha@datatorrent.com> wrote:
>>> > > > > > >
>>> > > > > > > > Hi,
>>> > > > > > > >
>>> > > > > > > > Is it possible to save checkpoints in any other highly
>>> > > > > > > > available distributed file systems (which maybe mounted
>>> > > > > > > > directories across
>>> > > > the
>>> > > > > > > > cluster) other than HDFS?
>>> > > > > > > > If yes, is it configurable?
>>> > > > > > > >
>>> > > > > > > > AFAIK, there is no configurable option available to achieve
>>> > that.
>>> > > > > > > > If that's the case, can we have that feature?
>>> > > > > > > >
>>> > > > > > > > This is with the intention to recover the applications
>>> > > > > > > > faster and
>>> > > > do
>>> > > > > > away
>>> > > > > > > > with HDFS's small files problem as described here:
>>> > > > > > > >
>>> > > > > > > >
>>> http://blog.cloudera.com/blog/2009/02/the-small-files-proble
>>> > > > > > > > m/
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
>>> > > l-files-problem/
>>> > > > > > > >
>>> > > >
>>> http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
>>> > > > > > > >
>>> > > > > > > > If we could save checkpoints in some other distributed file
>>> > > system
>>> > > > > (or
>>> > > > > > > even
>>> > > > > > > > a HA NAS box) geared for small files, we could achieve -
>>> > > > > > > >
>>> > > > > > > >    - Better performance of NN & HDFS for the production
>>> > > > > > > > usage
>>> > > > (read:
>>> > > > > > > >    production data I/O & not temp files)
>>> > > > > > > >    - Faster application recovery in case of planned
>>> shutdown
>>> > > > > > > > /
>>> > > > > > unplanned
>>> > > > > > > >    restarts
>>> > > > > > > >
>>> > > > > > > > Please, send your comments, suggestions or ideas.
>>> > > > > > > >
>>> > > > > > > > Thanks,
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > Aniruddha
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>

Re: Possibility of saving checkpoints on other distributed filesystems

Posted by Amol Kekre <am...@datatorrent.com>.
This feature is OK as long as the flag defaults to false (do not remove
files). One issue is that the RM may force-kill the AM without giving it a
chance to clean up, so this feature may work only on graceful shutdown.
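The force-kill caveat is ordinary signal semantics, which a plain shell sketch (illustrative only, nothing Apex-specific) makes concrete: a TERM handler gets a chance to clean up, while KILL cannot be trapped at all.

```shell
#!/bin/sh
# Why cleanup can only be promised on graceful shutdown: a TERM handler
# gets to run, but KILL (-9) cannot be trapped, so no cleanup code executes.

worker() {
    trap 'echo "cleanup ran"; exit 0' TERM
    while :; do sleep 1; done
}

worker &
pid=$!
sleep 1
kill -TERM "$pid"   # graceful stop: the trap fires and cleanup runs
wait "$pid"
# Had this been `kill -KILL "$pid"`, the trap would never fire and any
# leftover checkpoint files would simply remain.
```

The same applies to an AM: only an orderly shutdown path can guarantee its own cleanup, hence the suggestion to keep deletion off by default.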

Thks
Amol


On Tue, Feb 2, 2016 at 8:38 AM, Yogi Devendra <yo...@apache.org>
wrote:

> I would prefer to have an additional argument during application launch on
> dtcli.
>
> Say, --preserve-kill-state true .
>
> Basically, the platform should be able to do the clean-up activity if the
> application is invoked with a certain flag.
>
> Test apps can set this flag to false to clear the data on kill. Production
> apps can set it to true to keep the data on kill.
>
> Shutdown should always preserve the state. But for kill / forced shutdown,
> the user might prefer to clear the state.
>
> ~ Yogi
>
> On 2 February 2016 at 21:53, Amol Kekre <am...@datatorrent.com> wrote:
>
>>
>> Can we include a script in our github (util?) that simply deletes these
>> files upon application being killed, given an app-id. The admin will need
>> to run this script. Auto-deleting will be bad as a lot of users, including
>> those in production today need to restart using those files. The
>> knowledge/desire to restart post failure is outside the app and hence
>> technically the script should be explicitly user-invoked.
>>
>> Thks,
>> Amol
>>
>>
>> On Tue, Feb 2, 2016 at 6:12 AM, Pramod Immaneni <pr...@datatorrent.com>
>> wrote:
>>
>>> Hi Venkat,
>>>
>>> There are typically only a small number of outstanding checkpoint files
>>> per operator; as newer checkpoints are created, older ones are
>>> automatically deleted by the application once it determines that state is
>>> no longer needed. When an application is stopped or killed, the last
>>> checkpoints remain. There is also a benefit to that: a new application can
>>> be restarted to continue from those checkpoints instead of starting all
>>> the way from the beginning, which is useful in some cases. But if you
>>> always start your application from scratch, then yes, you can delete the
>>> checkpoints of older applications that are no longer running.
>>>
>>> Thanks
>>>
>>> On Mon, Feb 1, 2016 at 10:19 PM, Kottapalli, Venkatesh <
>>> VKottapalli@directv.com> wrote:
>>>
>>> > Hi,
>>> >
>>> >         Now that this has been discussed, will the checkpointed data be
>>> > purged when we kill the application forcefully?  In our current usage,
>>> we
>>> > forcefully kill the app after it processes a certain batch of data. I
>>> see
>>> > these small files are created under (user/datatorrent) directory and
>>> not
>>> > removed.
>>> >
>>> >         Another scenario, when some of the containers keep failing, we
>>> > have observed this state where the data is continuously checkpointed
>>> into
>>> > small files. When we kill the app, the data will be there.
>>> >
>>> >         We have received concerns saying this is impacting namenode
>>> > performance since these small files are stored in HDFS. So we manually
>>> > remove these checkpointed data at regular intervals.
>>> >
>>> > -Venkatesh
>>> >
>>> > -----Original Message-----
>>> > From: Amol Kekre [mailto:amol@datatorrent.com]
>>> > Sent: Monday, February 01, 2016 7:49 AM
>>> > To: dev@apex.incubator.apache.org; users@apex.incubator.apache.org
>>> > Subject: Re: Possibility of saving checkpoints on other distributed
>>> > filesystems
>>> >
>>> > Aniruddha,
>>> > We have not heard this request from users yet. It may be because our
>>> > checkpointing has a purge, i.e. the small files are not left over.
>>> Small
>>> > file problem has been there in Hadoop and relates to storing small
>>> files in
>>> > Hadoop for a longer time (more likely forever).
>>> >
>>> > Thks,
>>> > Amol
>>> >
>>> >
>>> > On Mon, Feb 1, 2016 at 6:05 AM, Aniruddha Thombare <
>>> > aniruddha@datatorrent.com> wrote:
>>> >
>>> > > Hi Community,
>>> > >
>>> > > Or Let me say BigFoots, do you think this feature should be
>>> available?
>>> > >
>>> > > The reason to bring this up was discussed in the start of this
>>> thread as:
>>> > >
>>> > > This is with the intention to recover the applications faster and do
>>> > > away
>>> > > > with HDFS's small files problem as described here:
>>> > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
>>> > > >
>>> > > >
>>> > >
>>> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
>>> > > l-files-problem/
>>> > > >
>>> http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
>>> > > > If we could save checkpoints in some other distributed file system
>>> > > > (or even a HA NAS box) geared for small files, we could achieve -
>>> > > >
>>> > > >    - Better performance of NN & HDFS for the production usage
>>> (read:
>>> > > >    production data I/O & not temp files)
>>> > > >
>>> > > >
>>> > > >    - Faster application recovery in case of planned shutdown /
>>> > unplanned
>>> > > >    restarts
>>> > > >
>>> > > > If you feel the need of this feature, please cast your opinions and
>>> > > > ideas
>>> > > so that it can be converted in a jira.
>>> > >
>>> > >
>>> > >
>>> > > Thanks,
>>> > >
>>> > >
>>> > > Aniruddha
>>> > >
>>> > > On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta
>>> > > <ga...@datatorrent.com>
>>> > > wrote:
>>> > >
>>> > > > Aniruddha,
>>> > > >
>>> > > > Currently we don't have any support for that.
>>> > > >
>>> > > > Thanks
>>> > > > Gaurav
>>> > > >
>>> > > > Thanks
>>> > > > -Gaurav
>>> > > >
>>> > > > On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi
>>> > > > <tu...@datatorrent.com>
>>> > > > wrote:
>>> > > >
>>> > > > > Default FSStorageAgent can be used as it can work with local
>>> > > filesystem,
>>> > > > > but as far as I know there is no support for specifying the
>>> > > > > directory through an xml file. By default it uses the application
>>> > > > > directory on HDFS.
>>> > > > >
>>> > > > > Not sure if we could specify a storage agent with its properties
>>> > > > > through the configuration at the dag level.
>>> > > > >
>>> > > > > - Tushar.
>>> > > > >
>>> > > > >
>>> > > > > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <
>>> > > > > aniruddha@datatorrent.com> wrote:
>>> > > > >
>>> > > > > > Hi,
>>> > > > > >
>>> > > > > > Do we have any storage agent which I can use readily,
>>> > > > > > configurable
>>> > > > > through
>>> > > > > > dt-site.xml?
>>> > > > > >
>>> > > > > > I am looking for something which would save checkpoints in
>>> > > > > > mounted
>>> > > file
>>> > > > > > system [eg. HA-NAS] which is basically just another directory
>>> > > > > > for
>>> > > Apex.
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > > Thanks,
>>> > > > > >
>>> > > > > >
>>> > > > > > Aniruddha
>>> > > > > >
>>> > > > > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <
>>> > > > sandesh@datatorrent.com>
>>> > > > > > wrote:
>>> > > > > >
>>> > > > > > > It is already supported; refer to the following jira for
>>> > > > > > > more information,
>>> > > > > > >
>>> > > > > > > https://issues.apache.org/jira/browse/APEXCORE-283
>>> > > > > > >
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > On Tue, Jan 19, 2016 at 10:43 PM Aniruddha Thombare <
>>> > > > > > > aniruddha@datatorrent.com> wrote:
>>> > > > > > >
>>> > > > > > > > Hi,
>>> > > > > > > >
>>> > > > > > > > Is it possible to save checkpoints in any other highly
>>> > > > > > > > available distributed file systems (which maybe mounted
>>> > > > > > > > directories across
>>> > > > the
>>> > > > > > > > cluster) other than HDFS?
>>> > > > > > > > If yes, is it configurable?
>>> > > > > > > >
>>> > > > > > > > AFAIK, there is no configurable option available to achieve
>>> > that.
>>> > > > > > > > If that's the case, can we have that feature?
>>> > > > > > > >
>>> > > > > > > > This is with the intention to recover the applications
>>> > > > > > > > faster and
>>> > > > do
>>> > > > > > away
>>> > > > > > > > with HDFS's small files problem as described here:
>>> > > > > > > >
>>> > > > > > > >
>>> http://blog.cloudera.com/blog/2009/02/the-small-files-proble
>>> > > > > > > > m/
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
>>> > > l-files-problem/
>>> > > > > > > >
>>> > > >
>>> http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
>>> > > > > > > >
>>> > > > > > > > If we could save checkpoints in some other distributed file
>>> > > system
>>> > > > > (or
>>> > > > > > > even
>>> > > > > > > > a HA NAS box) geared for small files, we could achieve -
>>> > > > > > > >
>>> > > > > > > >    - Better performance of NN & HDFS for the production
>>> > > > > > > > usage
>>> > > > (read:
>>> > > > > > > >    production data I/O & not temp files)
>>> > > > > > > >    - Faster application recovery in case of planned
>>> shutdown
>>> > > > > > > > /
>>> > > > > > unplanned
>>> > > > > > > >    restarts
>>> > > > > > > >
>>> > > > > > > > Please, send your comments, suggestions or ideas.
>>> > > > > > > >
>>> > > > > > > > Thanks,
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > Aniruddha
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>

Re: Possibility of saving checkpoints on other distributed filesystems

Posted by Pramod Immaneni <pr...@datatorrent.com>.
Yogi,

Kill is not an orderly shutdown; who will clean up the state?

On Tue, Feb 2, 2016 at 8:38 AM, Yogi Devendra <yo...@apache.org>
wrote:

> I would prefer to have an additional argument during application launch on
> dtcli.
>
> Say, --preserve-kill-state true .
>
> Basically, platform should be able to do the clean-up activity if the
> application is invoked with certain flag.
>
> Test apps can set this flag to clear the data on kill. Production apps can
> set this flag to keep the data on kill.
>
> Shutdown should always preserve the state. But, for kill / forced-shutdown
> user might prefer to clear the state.
>
> ~ Yogi
>
> On 2 February 2016 at 21:53, Amol Kekre <am...@datatorrent.com> wrote:
>
>>
>> Can we include a script in our github (util?) that simply deletes these
>> files upon application being killed, given an app-id. The admin will need
>> to run this script. Auto-deleting will be bad as a lot of users, including
>> those in production today need to restart using those files. The
>> knowledge/desire to restart post failure is outside the app and hence
>> technically the script should be explicitly user invoked
>>
>> Thks,
>> Amol
>>
>>
>> On Tue, Feb 2, 2016 at 6:12 AM, Pramod Immaneni <pr...@datatorrent.com>
>> wrote:
>>
>>> Hi Venkat,
>>>
>>> There are typically a small number of outstanding checkpoint files per
>>> operator, as newer checkpoints are created old ones are automatically
>>> deleted by the application when it determines that state is no longer
>>> needed. When an application stops/killed the last checkpoints remain.
>>> There
>>> is also a benefit to that since a new application can be restarted to
>>> continue from those checkpoints instead of starting all the way from the
>>> beginning and this is useful in some cases. But if you are always
>>> starting
>>> your application from scratch yes you can delete the checkpoints of older
>>> applications that are no longer running.
>>>
>>> Thanks
>>>
>>> On Mon, Feb 1, 2016 at 10:19 PM, Kottapalli, Venkatesh <
>>> VKottapalli@directv.com> wrote:
>>>
>>> > Hi,
>>> >
>>> >         Now that this has been discussed, Will the checkpointed data be
>>> > purged when we kill the application forcefully?  In our current usage,
>>> we
>>> > forcefully kill the app after it processes a certain batch of data. I
>>> see
>>> > these small files are created under (user/datatorrent) directory and
>>> not
>>> > removed.
>>> >
>>> >         Another scenario, when some of the containers keep failing, we
>>> > have observed this state where the data is continuously checkpointed
>>> into
>>> > small files. When we kill the app, the data will be there.
>>> >
>>> >         We have received concerns saying this is impacting namenode
>>> > performance since these small files are stored in HDFS. So we manually
>>> > remove these checkpointed data at regular intervals.
>>> >
>>> > -Venkatesh
>>> >
>>> > -----Original Message-----
>>> > From: Amol Kekre [mailto:amol@datatorrent.com]
>>> > Sent: Monday, February 01, 2016 7:49 AM
>>> > To: dev@apex.incubator.apache.org; users@apex.incubator.apache.org
>>> > Subject: Re: Possibility of saving checkpoints on other distributed
>>> > filesystems
>>> >
>>> > Aniruddha,
>>> > We have not heard this request from users yet. It may be because our
>>> > checkpointing has a purge, i.e. the small files are not left over.
>>> Small
>>> > file problem has been there in Hadoop and relates to storing small
>>> files in
>>> > Hadoop for a longer time (more likely forever).
>>> >
>>> > Thks,
>>> > Amol
>>> >
>>> >
>>> > On Mon, Feb 1, 2016 at 6:05 AM, Aniruddha Thombare <
>>> > aniruddha@datatorrent.com> wrote:
>>> >
>>> > > Hi Community,
>>> > >
>>> > > Or Let me say BigFoots, do you think this feature should be
>>> available?
>>> > >
>>> > > The reason to bring this up was discussed in the start of this
>>> thread as:
>>> > >
>>> > > This is with the intention to recover the applications faster and do
>>> > > away
>>> > > > with HDFS's small files problem as described here:
>>> > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
>>> > > >
>>> > > >
>>> > >
>>> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
>>> > > l-files-problem/
>>> > > >
>>> http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
>>> > > > If we could save checkpoints in some other distributed file system
>>> > > > (or even a HA NAS box) geared for small files, we could achieve -
>>> > > >
>>> > > >    - Better performance of NN & HDFS for the production usage
>>> (read:
>>> > > >    production data I/O & not temp files)
>>> > > >
>>> > > >
>>> > > >    - Faster application recovery in case of planned shutdown /
>>> > unplanned
>>> > > >    restarts
>>> > > >
>>> > > > If you feel the need of this feature, please cast your opinions and
>>> > > > ideas
>>> > > so that it can be converted in a jira.
>>> > >
>>> > >
>>> > >
>>> > > Thanks,
>>> > >
>>> > >
>>> > > Aniruddha
>>> > >
>>> > > On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta
>>> > > <ga...@datatorrent.com>
>>> > > wrote:
>>> > >
>>> > > > Aniruddha,
>>> > > >
>>> > > > Currently we don't have any support for that.
>>> > > >
>>> > > > Thanks
>>> > > > Gaurav
>>> > > >
>>> > > > Thanks
>>> > > > -Gaurav
>>> > > >
>>> > > > On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi
>>> > > > <tu...@datatorrent.com>
>>> > > > wrote:
>>> > > >
>>> > > > > Default FSStorageAgent can be used as it can work with local
>>> > > filesystem,
>>> > > > > but I far as I know there is no support for specifying the
>>> > > > > directory through xml file. by default it use the application
>>> > directory on HDFS.
>>> > > > >
>>> > > > > Not sure If we could specify storage agent with its properties
>>> > > > > through
>>> > > > the
>>> > > > > configuration at dag level.
>>> > > > >
>>> > > > > - Tushar.
>>> > > > >
>>> > > > >
>>> > > > > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <
>>> > > > > aniruddha@datatorrent.com> wrote:
>>> > > > >
>>> > > > > > Hi,
>>> > > > > >
>>> > > > > > Do we have any storage agent which I can use readily,
>>> > > > > > configurable
>>> > > > > through
>>> > > > > > dt-site.xml?
>>> > > > > >
>>> > > > > > I am looking for something which would save checkpoints in
>>> > > > > > mounted
>>> > > file
>>> > > > > > system [eg. HA-NAS] which is basically just another directory
>>> > > > > > for
>>> > > Apex.
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > > Thanks,
>>> > > > > >
>>> > > > > >
>>> > > > > > Aniruddha
>>> > > > > >
>>> > > > > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <
>>> > > > sandesh@datatorrent.com>
>>> > > > > > wrote:
>>> > > > > >
>>> > > > > > > It is already supported refer the following jira for more
>>> > > > information,
>>> > > > > > >
>>> > > > > > > https://issues.apache.org/jira/browse/APEXCORE-283
>>> > > > > > >
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > On Tue, Jan 19, 2016 at 10:43 PM Aniruddha Thombare <
>>> > > > > > > aniruddha@datatorrent.com> wrote:
>>> > > > > > >
>>> > > > > > > > Hi,
>>> > > > > > > >
>>> > > > > > > > Is it possible to save checkpoints in any other highly
>>> > > > > > > > available distributed file systems (which maybe mounted
>>> > > > > > > > directories across
>>> > > > the
>>> > > > > > > > cluster) other than HDFS?
>>> > > > > > > > If yes, is it configurable?
>>> > > > > > > >
>>> > > > > > > > AFAIK, there is no configurable option available to achieve
>>> > that.
>>> > > > > > > > If that's the case, can we have that feature?
>>> > > > > > > >
>>> > > > > > > > This is with the intention to recover the applications
>>> > > > > > > > faster and
>>> > > > do
>>> > > > > > away
>>> > > > > > > > with HDFS's small files problem as described here:
>>> > > > > > > >
>>> > > > > > > >
>>> http://blog.cloudera.com/blog/2009/02/the-small-files-proble
>>> > > > > > > > m/
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
>>> > > l-files-problem/
>>> > > > > > > >
>>> > > >
>>> http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
>>> > > > > > > >
>>> > > > > > > > If we could save checkpoints in some other distributed file
>>> > > system
>>> > > > > (or
>>> > > > > > > even
>>> > > > > > > > a HA NAS box) geared for small files, we could achieve -
>>> > > > > > > >
>>> > > > > > > >    - Better performance of NN & HDFS for the production
>>> > > > > > > > usage
>>> > > > (read:
>>> > > > > > > >    production data I/O & not temp files)
>>> > > > > > > >    - Faster application recovery in case of planned
>>> shutdown
>>> > > > > > > > /
>>> > > > > > unplanned
>>> > > > > > > >    restarts
>>> > > > > > > >
>>> > > > > > > > Please, send your comments, suggestions or ideas.
>>> > > > > > > >
>>> > > > > > > > Thanks,
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > Aniruddha
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>

Re: Possibility of saving checkpoints on other distributed filesystems

Posted by Yogi Devendra <yo...@apache.org>.
I would prefer to have an additional argument during application launch on
dtcli.

Say, --preserve-kill-state true.

Basically, the platform should be able to do the clean-up activity if the
application is invoked with a certain flag.

Test apps can set this flag to clear the data on kill. Production apps can
set this flag to keep the data on kill.

Shutdown should always preserve the state. But for kill / forced shutdown,
the user might prefer to clear the state.

~ Yogi

On 2 February 2016 at 21:53, Amol Kekre <am...@datatorrent.com> wrote:

>
> Can we include a script in our github (util?) that simply deletes these
> files upon application being killed, given an app-id. The admin will need
> to run this script. Auto-deleting will be bad as a lot of users, including
> those in production today need to restart using those files. The
> knowledge/desire to restart post failure is outside the app and hence
> technically the script should be explicitly user invoked
>
> Thks,
> Amol
>
>
> On Tue, Feb 2, 2016 at 6:12 AM, Pramod Immaneni <pr...@datatorrent.com>
> wrote:
>
>> Hi Venkat,
>>
>> There are typically a small number of outstanding checkpoint files per
>> operator, as newer checkpoints are created old ones are automatically
>> deleted by the application when it determines that state is no longer
>> needed. When an application stops/killed the last checkpoints remain.
>> There
>> is also a benefit to that since a new application can be restarted to
>> continue from those checkpoints instead of starting all the way from the
>> beginning and this is useful in some cases. But if you are always starting
>> your application from scratch yes you can delete the checkpoints of older
>> applications that are no longer running.
>>
>> Thanks
>>
>> On Mon, Feb 1, 2016 at 10:19 PM, Kottapalli, Venkatesh <
>> VKottapalli@directv.com> wrote:
>>
>> > Hi,
>> >
>> >         Now that this has been discussed, Will the checkpointed data be
>> > purged when we kill the application forcefully?  In our current usage,
>> we
>> > forcefully kill the app after it processes a certain batch of data. I
>> see
>> > these small files are created under (user/datatorrent) directory and not
>> > removed.
>> >
>> >         Another scenario, when some of the containers keep failing, we
>> > have observed this state where the data is continuously checkpointed
>> into
>> > small files. When we kill the app, the data will be there.
>> >
>> >         We have received concerns saying this is impacting namenode
>> > performance since these small files are stored in HDFS. So we manually
>> > remove these checkpointed data at regular intervals.
>> >
>> > -Venkatesh
>> >
>> > -----Original Message-----
>> > From: Amol Kekre [mailto:amol@datatorrent.com]
>> > Sent: Monday, February 01, 2016 7:49 AM
>> > To: dev@apex.incubator.apache.org; users@apex.incubator.apache.org
>> > Subject: Re: Possibility of saving checkpoints on other distributed
>> > filesystems
>> >
>> > Aniruddha,
>> > We have not heard this request from users yet. It may be because our
>> > checkpointing has a purge, i.e. the small files are not left over. Small
>> > file problem has been there in Hadoop and relates to storing small
>> files in
>> > Hadoop for a longer time (more likely forever).
>> >
>> > Thks,
>> > Amol
>> >
>> >
>> > On Mon, Feb 1, 2016 at 6:05 AM, Aniruddha Thombare <
>> > aniruddha@datatorrent.com> wrote:
>> >
>> > > Hi Community,
>> > >
>> > > Or Let me say BigFoots, do you think this feature should be available?
>> > >
>> > > The reason to bring this up was discussed in the start of this thread
>> as:
>> > >
>> > > This is with the intention to recover the applications faster and do
>> > > away
>> > > > with HDFS's small files problem as described here:
>> > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
>> > > >
>> > > >
>> > >
>> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
>> > > l-files-problem/
>> > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
>> > > > If we could save checkpoints in some other distributed file system
>> > > > (or even a HA NAS box) geared for small files, we could achieve -
>> > > >
>> > > >    - Better performance of NN & HDFS for the production usage (read:
>> > > >    production data I/O & not temp files)
>> > > >
>> > > >
>> > > >    - Faster application recovery in case of planned shutdown /
>> > unplanned
>> > > >    restarts
>> > > >
>> > > > If you feel the need of this feature, please cast your opinions and
>> > > > ideas
>> > > so that it can be converted in a jira.
>> > >
>> > >
>> > >
>> > > Thanks,
>> > >
>> > >
>> > > Aniruddha
>> > >
>> > > On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta
>> > > <ga...@datatorrent.com>
>> > > wrote:
>> > >
>> > > > Aniruddha,
>> > > >
>> > > > Currently we don't have any support for that.
>> > > >
>> > > > Thanks
>> > > > Gaurav
>> > > >
>> > > > Thanks
>> > > > -Gaurav
>> > > >
>> > > > On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi
>> > > > <tu...@datatorrent.com>
>> > > > wrote:
>> > > >
>> > > > > Default FSStorageAgent can be used as it can work with local
>> > > filesystem,
>> > > > > but I far as I know there is no support for specifying the
>> > > > > directory through xml file. by default it use the application
>> > directory on HDFS.
>> > > > >
>> > > > > Not sure If we could specify storage agent with its properties
>> > > > > through
>> > > > the
>> > > > > configuration at dag level.
>> > > > >
>> > > > > - Tushar.
>> > > > >
>> > > > >
>> > > > > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <
>> > > > > aniruddha@datatorrent.com> wrote:
>> > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > Do we have any storage agent which I can use readily,
>> > > > > > configurable
>> > > > > through
>> > > > > > dt-site.xml?
>> > > > > >
>> > > > > > I am looking for something which would save checkpoints in
>> > > > > > mounted
>> > > file
>> > > > > > system [eg. HA-NAS] which is basically just another directory
>> > > > > > for
>> > > Apex.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > Thanks,
>> > > > > >
>> > > > > >
>> > > > > > Aniruddha
>> > > > > >
>> > > > > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <
>> > > > sandesh@datatorrent.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > It is already supported refer the following jira for more
>> > > > information,
>> > > > > > >
>> > > > > > > https://issues.apache.org/jira/browse/APEXCORE-283
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > On Tue, Jan 19, 2016 at 10:43 PM Aniruddha Thombare <
>> > > > > > > aniruddha@datatorrent.com> wrote:
>> > > > > > >
>> > > > > > > > Hi,
>> > > > > > > >
>> > > > > > > > Is it possible to save checkpoints in any other highly
>> > > > > > > > available distributed file systems (which maybe mounted
>> > > > > > > > directories across
>> > > > the
>> > > > > > > > cluster) other than HDFS?
>> > > > > > > > If yes, is it configurable?
>> > > > > > > >
>> > > > > > > > AFAIK, there is no configurable option available to achieve
>> > that.
>> > > > > > > > If that's the case, can we have that feature?
>> > > > > > > >
>> > > > > > > > This is with the intention to recover the applications
>> > > > > > > > faster and
>> > > > do
>> > > > > > away
>> > > > > > > > with HDFS's small files problem as described here:
>> > > > > > > >
>> > > > > > > >
>> http://blog.cloudera.com/blog/2009/02/the-small-files-proble
>> > > > > > > > m/
>> > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
>> > > l-files-problem/
>> > > > > > > >
>> > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
>> > > > > > > >
>> > > > > > > > If we could save checkpoints in some other distributed file
>> > > system
>> > > > > (or
>> > > > > > > even
>> > > > > > > > a HA NAS box) geared for small files, we could achieve -
>> > > > > > > >
>> > > > > > > >    - Better performance of NN & HDFS for the production
>> > > > > > > > usage
>> > > > (read:
>> > > > > > > >    production data I/O & not temp files)
>> > > > > > > >    - Faster application recovery in case of planned shutdown
>> > > > > > > > /
>> > > > > > unplanned
>> > > > > > > >    restarts
>> > > > > > > >
>> > > > > > > > Please, send your comments, suggestions or ideas.
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Aniruddha
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: Possibility of saving checkpoints on other distributed filesystems

Posted by Amol Kekre <am...@datatorrent.com>.
Can we include a script in our GitHub repo (util?) that simply deletes these
files for a killed application, given an app-id? The admin would need to run
this script. Auto-deleting would be bad, as many users, including those in
production today, need to restart using those files. The knowledge/desire to
restart after a failure lives outside the app, so technically the script
should be explicitly user-invoked.

Thks,
Amol
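
A minimal sketch of such a cleanup script follows. The checkpoint directory
layout (/user/<user>/datatorrent/apps/<app-id>) is an assumption about the
installation, not a documented path; adjust it to match your cluster. The
script only prints the removal command (dry run), leaving the actual delete
to the admin.

```shell
#!/bin/sh
# Sketch of the cleanup script proposed above: remove the leftover
# checkpoint files of a killed application, given its app-id.
# The directory layout below is an assumption; adjust to your setup.

# Return success only if the argument looks like a YARN application id,
# so a typo cannot target an unrelated directory.
is_app_id() {
  case "$1" in
    application_*_*) return 0 ;;
    *) return 1 ;;
  esac
}

# Build the (assumed) checkpoint directory for a user and an app-id.
checkpoint_dir() {
  printf '/user/%s/datatorrent/apps/%s' "$1" "$2"
}

# Dry run: print the command an admin would execute for an example id.
app_id="application_1454000000000_0001"
if is_app_id "$app_id"; then
  echo "hdfs dfs -rm -r -skipTrash $(checkpoint_dir "$(whoami)" "$app_id")"
fi
```

Running the printed `hdfs dfs -rm -r` command by hand keeps the delete an
explicit, user-invoked step, as suggested above.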


On Tue, Feb 2, 2016 at 6:12 AM, Pramod Immaneni <pr...@datatorrent.com>
wrote:

> Hi Venkat,
>
> There are typically a small number of outstanding checkpoint files per
> operator, as newer checkpoints are created old ones are automatically
> deleted by the application when it determines that state is no longer
> needed. When an application stops/killed the last checkpoints remain. There
> is also a benefit to that since a new application can be restarted to
> continue from those checkpoints instead of starting all the way from the
> beginning and this is useful in some cases. But if you are always starting
> your application from scratch yes you can delete the checkpoints of older
> applications that are no longer running.
>
> Thanks
>
> On Mon, Feb 1, 2016 at 10:19 PM, Kottapalli, Venkatesh <
> VKottapalli@directv.com> wrote:
>
> > Hi,
> >
> >         Now that this has been discussed, Will the checkpointed data be
> > purged when we kill the application forcefully?  In our current usage, we
> > forcefully kill the app after it processes a certain batch of data. I see
> > these small files are created under (user/datatorrent) directory and not
> > removed.
> >
> >         Another scenario, when some of the containers keep failing, we
> > have observed this state where the data is continuously checkpointed into
> > small files. When we kill the app, the data will be there.
> >
> >         We have received concerns saying this is impacting namenode
> > performance since these small files are stored in HDFS. So we manually
> > remove these checkpointed data at regular intervals.
> >
> > -Venkatesh
> >
> > -----Original Message-----
> > From: Amol Kekre [mailto:amol@datatorrent.com]
> > Sent: Monday, February 01, 2016 7:49 AM
> > To: dev@apex.incubator.apache.org; users@apex.incubator.apache.org
> > Subject: Re: Possibility of saving checkpoints on other distributed
> > filesystems
> >
> > Aniruddha,
> > We have not heard this request from users yet. It may be because our
> > checkpointing has a purge, i.e. the small files are not left over. Small
> > file problem has been there in Hadoop and relates to storing small files
> in
> > Hadoop for a longer time (more likely forever).
> >
> > Thks,
> > Amol
> >
> >
> > On Mon, Feb 1, 2016 at 6:05 AM, Aniruddha Thombare <
> > aniruddha@datatorrent.com> wrote:
> >
> > > Hi Community,
> > >
> > > Or Let me say BigFoots, do you think this feature should be available?
> > >
> > > The reason to bring this up was discussed in the start of this thread
> as:
> > >
> > > This is with the intention to recover the applications faster and do
> > > away
> > > > with HDFS's small files problem as described here:
> > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> > > >
> > > >
> > > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
> > > l-files-problem/
> > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > > > If we could save checkpoints in some other distributed file system
> > > > (or even a HA NAS box) geared for small files, we could achieve -
> > > >
> > > >    - Better performance of NN & HDFS for the production usage (read:
> > > >    production data I/O & not temp files)
> > > >
> > > >
> > > >    - Faster application recovery in case of planned shutdown /
> > unplanned
> > > >    restarts
> > > >
> > > > If you feel the need of this feature, please cast your opinions and
> > > > ideas
> > > so that it can be converted in a jira.
> > >
> > >
> > >
> > > Thanks,
> > >
> > >
> > > Aniruddha
> > >
> > > On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta
> > > <ga...@datatorrent.com>
> > > wrote:
> > >
> > > > Aniruddha,
> > > >
> > > > Currently we don't have any support for that.
> > > >
> > > > Thanks
> > > > Gaurav
> > > >
> > > > Thanks
> > > > -Gaurav
> > > >
> > > > On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi
> > > > <tu...@datatorrent.com>
> > > > wrote:
> > > >
> > > > > Default FSStorageAgent can be used as it can work with local
> > > filesystem,
> > > > > but I far as I know there is no support for specifying the
> > > > > directory through xml file. by default it use the application
> > directory on HDFS.
> > > > >
> > > > > Not sure If we could specify storage agent with its properties
> > > > > through
> > > > the
> > > > > configuration at dag level.
> > > > >
> > > > > - Tushar.
> > > > >
> > > > >
> > > > > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <
> > > > > aniruddha@datatorrent.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Do we have any storage agent which I can use readily,
> > > > > > configurable
> > > > > through
> > > > > > dt-site.xml?
> > > > > >
> > > > > > I am looking for something which would save checkpoints in
> > > > > > mounted
> > > file
> > > > > > system [eg. HA-NAS] which is basically just another directory
> > > > > > for
> > > Apex.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > >
> > > > > > Aniruddha
> > > > > >
> > > > > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <
> > > > sandesh@datatorrent.com>
> > > > > > wrote:
> > > > > >
> > > > > > > It is already supported refer the following jira for more
> > > > information,
> > > > > > >
> > > > > > > https://issues.apache.org/jira/browse/APEXCORE-283
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jan 19, 2016 at 10:43 PM Aniruddha Thombare <
> > > > > > > aniruddha@datatorrent.com> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Is it possible to save checkpoints in any other highly
> > > > > > > > available distributed file systems (which maybe mounted
> > > > > > > > directories across
> > > > the
> > > > > > > > cluster) other than HDFS?
> > > > > > > > If yes, is it configurable?
> > > > > > > >
> > > > > > > > AFAIK, there is no configurable option available to achieve
> > that.
> > > > > > > > If that's the case, can we have that feature?
> > > > > > > >
> > > > > > > > This is with the intention to recover the applications
> > > > > > > > faster and
> > > > do
> > > > > > away
> > > > > > > > with HDFS's small files problem as described here:
> > > > > > > >
> > > > > > > > http://blog.cloudera.com/blog/2009/02/the-small-files-proble
> > > > > > > > m/
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
> > > l-files-problem/
> > > > > > > >
> > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > > > > > > >
> > > > > > > > If we could save checkpoints in some other distributed file
> > > system
> > > > > (or
> > > > > > > even
> > > > > > > > a HA NAS box) geared for small files, we could achieve -
> > > > > > > >
> > > > > > > >    - Better performance of NN & HDFS for the production
> > > > > > > > usage
> > > > (read:
> > > > > > > >    production data I/O & not temp files)
> > > > > > > >    - Faster application recovery in case of planned shutdown
> > > > > > > > /
> > > > > > unplanned
> > > > > > > >    restarts
> > > > > > > >
> > > > > > > > Please, send your comments, suggestions or ideas.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > >
> > > > > > > > Aniruddha
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
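
If the DAG-level configuration Tushar wonders about were supported, it would presumably take the form of a Hadoop-style property in dt-site.xml pointing the checkpoint storage agent at an alternate filesystem (e.g. a mounted HA NAS). The property name and value below are purely illustrative assumptions, not a documented Apex option:

```xml
<!-- Hypothetical dt-site.xml fragment; the property name is an assumption,
     not a documented Apex configuration key. -->
<configuration>
  <property>
    <name>dt.attr.STORAGE_AGENT.path</name>
    <value>file:///mnt/ha-nas/apex-checkpoints</value>
  </property>
</configuration>
```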

Re: Possibility of saving checkpoints on other distributed filesystems

Posted by Amol Kekre <am...@datatorrent.com>.
Can we include a script in our GitHub repo (under util?) that simply deletes
these files for a given app-id once the application has been killed? The admin
would need to run this script. Auto-deleting would be bad, as many users,
including those in production today, need to restart using those files. The
knowledge of, and desire for, a restart after a failure lives outside the app,
so technically the script should be explicitly user-invoked.

Thks,
Amol
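
A minimal sketch of the cleanup script suggested above. The checkpoint path layout `<base>/<app-id>/checkpoints` is an assumption for illustration; on a real cluster the `rm -rf` would instead be an `hdfs dfs -rm -r -skipTrash` against the application directory:

```shell
# Sketch of the admin-invoked cleanup script (illustrative only).
# Assumes leftover checkpoints live under <base>/<app-id>/checkpoints;
# swap rm -rf for 'hdfs dfs -rm -r -skipTrash' when the base is on HDFS.
delete_app_checkpoints() {
  base="$1"
  app_id="$2"
  dir="$base/$app_id/checkpoints"
  if [ -d "$dir" ]; then
    rm -rf "$dir"
    echo "removed $dir"
  else
    echo "no checkpoint directory for $app_id" >&2
    return 1
  fi
}
```

The script is deliberately not wired into the platform: the admin runs it explicitly, e.g. `delete_app_checkpoints /user/datatorrent application_1454300000000_0001`, only after deciding the application will never be restarted from its saved state.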


On Tue, Feb 2, 2016 at 6:12 AM, Pramod Immaneni <pr...@datatorrent.com>
wrote:

> Hi Venkat,
>
> There are typically a small number of outstanding checkpoint files per
> operator, as newer checkpoints are created old ones are automatically
> deleted by the application when it determines that state is no longer
> needed. When an application stops/killed the last checkpoints remain. There
> is also a benefit to that since a new application can be restarted to
> continue from those checkpoints instead of starting all the way from the
> beginning and this is useful in some cases. But if you are always starting
> your application from scratch yes you can delete the checkpoints of older
> applications that are no longer running.
>
> Thanks
>
> On Mon, Feb 1, 2016 at 10:19 PM, Kottapalli, Venkatesh <
> VKottapalli@directv.com> wrote:
>
> > Hi,
> >
> >         Now that this has been discussed, Will the checkpointed data be
> > purged when we kill the application forcefully?  In our current usage, we
> > forcefully kill the app after it processes a certain batch of data. I see
> > these small files are created under (user/datatorrent) directory and not
> > removed.
> >
> >         Another scenario, when some of the containers keep failing, we
> > have observed this state where the data is continuously checkpointed into
> > small files. When we kill the app, the data will be there.
> >
> >         We have received concerns saying this is impacting namenode
> > performance since these small files are stored in HDFS. So we manually
> > remove these checkpointed data at regular intervals.
> >
> > -Venkatesh
> >
> > -----Original Message-----
> > From: Amol Kekre [mailto:amol@datatorrent.com]
> > Sent: Monday, February 01, 2016 7:49 AM
> > To: dev@apex.incubator.apache.org; users@apex.incubator.apache.org
> > Subject: Re: Possibility of saving checkpoints on other distributed
> > filesystems
> >
> > Aniruddha,
> > We have not heard this request from users yet. It may be because our
> > checkpointing has a purge, i.e. the small files are not left over. Small
> > file problem has been there in Hadoop and relates to storing small files
> in
> > Hadoop for a longer time (more likely forever).
> >
> > Thks,
> > Amol
> >
> >
> > On Mon, Feb 1, 2016 at 6:05 AM, Aniruddha Thombare <
> > aniruddha@datatorrent.com> wrote:
> >
> > > Hi Community,
> > >
> > > Or Let me say BigFoots, do you think this feature should be available?
> > >
> > > The reason to bring this up was discussed in the start of this thread
> as:
> > >
> > > This is with the intention to recover the applications faster and do
> > > away
> > > > with HDFS's small files problem as described here:
> > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> > > >
> > > >
> > > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
> > > l-files-problem/
> > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > > > If we could save checkpoints in some other distributed file system
> > > > (or even a HA NAS box) geared for small files, we could achieve -
> > > >
> > > >    - Better performance of NN & HDFS for the production usage (read:
> > > >    production data I/O & not temp files)
> > > >
> > > >
> > > >    - Faster application recovery in case of planned shutdown /
> > unplanned
> > > >    restarts
> > > >
> > > > If you feel the need of this feature, please cast your opinions and
> > > > ideas
> > > so that it can be converted in a jira.
> > >
> > >
> > >
> > > Thanks,
> > >
> > >
> > > Aniruddha
> > >
> > > On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta
> > > <ga...@datatorrent.com>
> > > wrote:
> > >
> > > > Aniruddha,
> > > >
> > > > Currently we don't have any support for that.
> > > >
> > > > Thanks
> > > > Gaurav
> > > >
> > > > Thanks
> > > > -Gaurav
> > > >
> > > > On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi
> > > > <tu...@datatorrent.com>
> > > > wrote:
> > > >
> > > > > Default FSStorageAgent can be used as it can work with local
> > > filesystem,
> > > > > but I far as I know there is no support for specifying the
> > > > > directory through xml file. by default it use the application
> > directory on HDFS.
> > > > >
> > > > > Not sure If we could specify storage agent with its properties
> > > > > through
> > > > the
> > > > > configuration at dag level.
> > > > >
> > > > > - Tushar.
> > > > >
> > > > >
> > > > > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <
> > > > > aniruddha@datatorrent.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Do we have any storage agent which I can use readily,
> > > > > > configurable
> > > > > through
> > > > > > dt-site.xml?
> > > > > >
> > > > > > I am looking for something which would save checkpoints in
> > > > > > mounted
> > > file
> > > > > > system [eg. HA-NAS] which is basically just another directory
> > > > > > for
> > > Apex.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > >
> > > > > > Aniruddha
> > > > > >
> > > > > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <
> > > > sandesh@datatorrent.com>
> > > > > > wrote:
> > > > > >
> > > > > > > It is already supported refer the following jira for more
> > > > information,
> > > > > > >
> > > > > > > https://issues.apache.org/jira/browse/APEXCORE-283
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jan 19, 2016 at 10:43 PM Aniruddha Thombare <
> > > > > > > aniruddha@datatorrent.com> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Is it possible to save checkpoints in any other highly
> > > > > > > > available distributed file systems (which maybe mounted
> > > > > > > > directories across
> > > > the
> > > > > > > > cluster) other than HDFS?
> > > > > > > > If yes, is it configurable?
> > > > > > > >
> > > > > > > > AFAIK, there is no configurable option available to achieve
> > that.
> > > > > > > > If that's the case, can we have that feature?
> > > > > > > >
> > > > > > > > This is with the intention to recover the applications
> > > > > > > > faster and
> > > > do
> > > > > > away
> > > > > > > > with HDFS's small files problem as described here:
> > > > > > > >
> > > > > > > > http://blog.cloudera.com/blog/2009/02/the-small-files-proble
> > > > > > > > m/
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
> > > l-files-problem/
> > > > > > > >
> > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > > > > > > >
> > > > > > > > If we could save checkpoints in some other distributed file
> > > system
> > > > > (or
> > > > > > > even
> > > > > > > > a HA NAS box) geared for small files, we could achieve -
> > > > > > > >
> > > > > > > >    - Better performance of NN & HDFS for the production
> > > > > > > > usage
> > > > (read:
> > > > > > > >    production data I/O & not temp files)
> > > > > > > >    - Faster application recovery in case of planned shutdown
> > > > > > > > /
> > > > > > unplanned
> > > > > > > >    restarts
> > > > > > > >
> > > > > > > > Please, send your comments, suggestions or ideas.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > >
> > > > > > > > Aniruddha
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Possibility of saving checkpoints on other distributed filesystems

Posted by Pramod Immaneni <pr...@datatorrent.com>.
Hi Venkat,

There are typically only a small number of outstanding checkpoint files per
operator: as newer checkpoints are created, older ones are automatically
deleted by the application once it determines that state is no longer needed.
When an application is stopped or killed, the last checkpoints remain. There
is also a benefit to that, since a new application can be restarted to
continue from those checkpoints instead of starting all the way from the
beginning, which is useful in some cases. But if you always start your
application from scratch, then yes, you can delete the checkpoints of older
applications that are no longer running.

Thanks

On Mon, Feb 1, 2016 at 10:19 PM, Kottapalli, Venkatesh <
VKottapalli@directv.com> wrote:

> Hi,
>
>         Now that this has been discussed, Will the checkpointed data be
> purged when we kill the application forcefully?  In our current usage, we
> forcefully kill the app after it processes a certain batch of data. I see
> these small files are created under (user/datatorrent) directory and not
> removed.
>
>         Another scenario, when some of the containers keep failing, we
> have observed this state where the data is continuously checkpointed into
> small files. When we kill the app, the data will be there.
>
>         We have received concerns saying this is impacting namenode
> performance since these small files are stored in HDFS. So we manually
> remove these checkpointed data at regular intervals.
>
> -Venkatesh
>
> -----Original Message-----
> From: Amol Kekre [mailto:amol@datatorrent.com]
> Sent: Monday, February 01, 2016 7:49 AM
> To: dev@apex.incubator.apache.org; users@apex.incubator.apache.org
> Subject: Re: Possibility of saving checkpoints on other distributed
> filesystems
>
> Aniruddha,
> We have not heard this request from users yet. It may be because our
> checkpointing has a purge, i.e. the small files are not left over. Small
> file problem has been there in Hadoop and relates to storing small files in
> Hadoop for a longer time (more likely forever).
>
> Thks,
> Amol
>
>
> On Mon, Feb 1, 2016 at 6:05 AM, Aniruddha Thombare <
> aniruddha@datatorrent.com> wrote:
>
> > Hi Community,
> >
> > Or Let me say BigFoots, do you think this feature should be available?
> >
> > The reason to bring this up was discussed in the start of this thread as:
> >
> > This is with the intention to recover the applications faster and do
> > away
> > > with HDFS's small files problem as described here:
> > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> > >
> > >
> > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
> > l-files-problem/
> > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > > If we could save checkpoints in some other distributed file system
> > > (or even a HA NAS box) geared for small files, we could achieve -
> > >
> > >    - Better performance of NN & HDFS for the production usage (read:
> > >    production data I/O & not temp files)
> > >
> > >
> > >    - Faster application recovery in case of planned shutdown /
> unplanned
> > >    restarts
> > >
> > > If you feel the need of this feature, please cast your opinions and
> > > ideas
> > so that it can be converted in a jira.
> >
> >
> >
> > Thanks,
> >
> >
> > Aniruddha
> >
> > On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta
> > <ga...@datatorrent.com>
> > wrote:
> >
> > > Aniruddha,
> > >
> > > Currently we don't have any support for that.
> > >
> > > Thanks
> > > Gaurav
> > >
> > > Thanks
> > > -Gaurav
> > >
> > > On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi
> > > <tu...@datatorrent.com>
> > > wrote:
> > >
> > > > Default FSStorageAgent can be used as it can work with local
> > filesystem,
> > > > but I far as I know there is no support for specifying the
> > > > directory through xml file. by default it use the application
> directory on HDFS.
> > > >
> > > > Not sure If we could specify storage agent with its properties
> > > > through
> > > the
> > > > configuration at dag level.
> > > >
> > > > - Tushar.
> > > >
> > > >
> > > > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <
> > > > aniruddha@datatorrent.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Do we have any storage agent which I can use readily,
> > > > > configurable
> > > > through
> > > > > dt-site.xml?
> > > > >
> > > > > I am looking for something which would save checkpoints in
> > > > > mounted
> > file
> > > > > system [eg. HA-NAS] which is basically just another directory
> > > > > for
> > Apex.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > >
> > > > > Aniruddha
> > > > >
> > > > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <
> > > sandesh@datatorrent.com>
> > > > > wrote:
> > > > >
> > > > > > It is already supported refer the following jira for more
> > > information,
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/APEXCORE-283
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Jan 19, 2016 at 10:43 PM Aniruddha Thombare <
> > > > > > aniruddha@datatorrent.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Is it possible to save checkpoints in any other highly
> > > > > > > available distributed file systems (which maybe mounted
> > > > > > > directories across
> > > the
> > > > > > > cluster) other than HDFS?
> > > > > > > If yes, is it configurable?
> > > > > > >
> > > > > > > AFAIK, there is no configurable option available to achieve
> that.
> > > > > > > If that's the case, can we have that feature?
> > > > > > >
> > > > > > > This is with the intention to recover the applications
> > > > > > > faster and
> > > do
> > > > > away
> > > > > > > with HDFS's small files problem as described here:
> > > > > > >
> > > > > > > http://blog.cloudera.com/blog/2009/02/the-small-files-proble
> > > > > > > m/
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
> > l-files-problem/
> > > > > > >
> > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > > > > > >
> > > > > > > If we could save checkpoints in some other distributed file
> > system
> > > > (or
> > > > > > even
> > > > > > > a HA NAS box) geared for small files, we could achieve -
> > > > > > >
> > > > > > >    - Better performance of NN & HDFS for the production
> > > > > > > usage
> > > (read:
> > > > > > >    production data I/O & not temp files)
> > > > > > >    - Faster application recovery in case of planned shutdown
> > > > > > > /
> > > > > unplanned
> > > > > > >    restarts
> > > > > > >
> > > > > > > Please, send your comments, suggestions or ideas.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > >
> > > > > > > Aniruddha
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

RE: Possibility of saving checkpoints on other distributed filesystems

Posted by "Kottapalli, Venkatesh" <VK...@DIRECTV.com>.
Hi,

	Now that this has been discussed, will the checkpointed data be purged when we kill the application forcefully? In our current usage, we forcefully kill the app after it processes a certain batch of data. I see that these small files are created under the (user/datatorrent) directory and are not removed.

	Another scenario: when some of the containers keep failing, we have observed a state where data is continuously checkpointed into small files. When we kill the app, that data is still there.

	We have received concerns that this is impacting NameNode performance, since these small files are stored in HDFS, so we manually remove this checkpointed data at regular intervals.

-Venkatesh

-----Original Message-----
From: Amol Kekre [mailto:amol@datatorrent.com] 
Sent: Monday, February 01, 2016 7:49 AM
To: dev@apex.incubator.apache.org; users@apex.incubator.apache.org
Subject: Re: Possibility of saving checkpoints on other distributed filesystems

Aniruddha,
We have not heard this request from users yet. It may be because our checkpointing has a purge, i.e. the small files are not left over. Small file problem has been there in Hadoop and relates to storing small files in Hadoop for a longer time (more likely forever).

Thks,
Amol


On Mon, Feb 1, 2016 at 6:05 AM, Aniruddha Thombare < aniruddha@datatorrent.com> wrote:

> Hi Community,
>
> Or Let me say BigFoots, do you think this feature should be available?
>
> The reason to bring this up was discussed in the start of this thread as:
>
> This is with the intention to recover the applications faster and do 
> away
> > with HDFS's small files problem as described here:
> > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> >
> >
> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
> l-files-problem/
> > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > If we could save checkpoints in some other distributed file system 
> > (or even a HA NAS box) geared for small files, we could achieve -
> >
> >    - Better performance of NN & HDFS for the production usage (read:
> >    production data I/O & not temp files)
> >
> >
> >    - Faster application recovery in case of planned shutdown / unplanned
> >    restarts
> >
> > If you feel the need of this feature, please cast your opinions and 
> > ideas
> so that it can be converted in a jira.
>
>
>
> Thanks,
>
>
> Aniruddha
>
> On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta 
> <ga...@datatorrent.com>
> wrote:
>
> > Aniruddha,
> >
> > Currently we don't have any support for that.
> >
> > Thanks
> > Gaurav
> >
> > Thanks
> > -Gaurav
> >
> > On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi 
> > <tu...@datatorrent.com>
> > wrote:
> >
> > > Default FSStorageAgent can be used as it can work with local
> filesystem,
> > > but I far as I know there is no support for specifying the 
> > > directory through xml file. by default it use the application directory on HDFS.
> > >
> > > Not sure If we could specify storage agent with its properties 
> > > through
> > the
> > > configuration at dag level.
> > >
> > > - Tushar.
> > >
> > >
> > > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare < 
> > > aniruddha@datatorrent.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Do we have any storage agent which I can use readily, 
> > > > configurable
> > > through
> > > > dt-site.xml?
> > > >
> > > > I am looking for something which would save checkpoints in 
> > > > mounted
> file
> > > > system [eg. HA-NAS] which is basically just another directory 
> > > > for
> Apex.
> > > >
> > > >
> > > >
> > > >
> > > > Thanks,
> > > >
> > > >
> > > > Aniruddha
> > > >
> > > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <
> > sandesh@datatorrent.com>
> > > > wrote:
> > > >
> > > > > It is already supported refer the following jira for more
> > information,
> > > > >
> > > > > https://issues.apache.org/jira/browse/APEXCORE-283
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Jan 19, 2016 at 10:43 PM Aniruddha Thombare < 
> > > > > aniruddha@datatorrent.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Is it possible to save checkpoints in any other highly 
> > > > > > available distributed file systems (which maybe mounted 
> > > > > > directories across
> > the
> > > > > > cluster) other than HDFS?
> > > > > > If yes, is it configurable?
> > > > > >
> > > > > > AFAIK, there is no configurable option available to achieve that.
> > > > > > If that's the case, can we have that feature?
> > > > > >
> > > > > > This is with the intention to recover the applications 
> > > > > > faster and
> > do
> > > > away
> > > > > > with HDFS's small files problem as described here:
> > > > > >
> > > > > > http://blog.cloudera.com/blog/2009/02/the-small-files-proble
> > > > > > m/
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-smal
> l-files-problem/
> > > > > >
> > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > > > > >
> > > > > > If we could save checkpoints in some other distributed file
> system
> > > (or
> > > > > even
> > > > > > a HA NAS box) geared for small files, we could achieve -
> > > > > >
> > > > > >    - Better performance of NN & HDFS for the production 
> > > > > > usage
> > (read:
> > > > > >    production data I/O & not temp files)
> > > > > >    - Faster application recovery in case of planned shutdown 
> > > > > > /
> > > > unplanned
> > > > > >    restarts
> > > > > >
> > > > > > Please, send your comments, suggestions or ideas.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > >
> > > > > > Aniruddha
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Possibility of saving checkpoints on other distributed filesystems

Posted by Amol Kekre <am...@datatorrent.com>.
Aniruddha,
We have not heard this request from users yet. It may be because our
checkpointing purges old state, i.e. the small files are not left over. The
small-files problem has long existed in Hadoop, and it relates to storing
small files in Hadoop for a long time (more likely forever).

Thanks,
Amol


On Mon, Feb 1, 2016 at 6:05 AM, Aniruddha Thombare <
aniruddha@datatorrent.com> wrote:

> Hi Community,
>
> Or let me say, BigFoots: do you think this feature should be available?
>
> The reason for bringing this up was discussed at the start of this thread:
>
> This is with the intention to recover the applications faster and do away
> > with HDFS's small files problem as described here:
> > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> >
> >
> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
> > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > If we could save checkpoints in some other distributed file system (or
> > even a HA NAS box) geared for small files, we could achieve -
> >
> >    - Better performance of NN & HDFS for the production usage (read:
> >    production data I/O & not temp files)
> >
> >
> >    - Faster application recovery in case of planned shutdown / unplanned
> >    restarts
> >
> > If you feel the need for this feature, please cast your opinions and ideas
> so that it can be converted into a JIRA.
>
>
>
> Thanks,
>
>
> Aniruddha
>
> On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta <ga...@datatorrent.com>
> wrote:
>
> > Aniruddha,
> >
> > Currently we don't have any support for that.
> >
> > Thanks
> > Gaurav
> >
> >
> > On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi <tu...@datatorrent.com>
> > wrote:
> >
> > > The default FSStorageAgent can be used, as it can work with the local
> > > filesystem, but as far as I know there is no support for specifying
> > > the directory through the xml file; by default it uses the application
> > > directory on HDFS.
> > >
> > > Not sure if we could specify a storage agent with its properties
> > > through the configuration at the DAG level.
> > >
> > > - Tushar.
> > >
> > >
> > > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <
> > > aniruddha@datatorrent.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Do we have any storage agent which I can use readily, configurable
> > > > through dt-site.xml?
> > > >
> > > > I am looking for something which would save checkpoints in a mounted
> > > > file system [e.g. HA-NAS], which is basically just another directory
> > > > for Apex.
> > > >
> > > >
> > > >
> > > >
> > > > Thanks,
> > > >
> > > >
> > > > Aniruddha
> > > >
> > > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <
> > sandesh@datatorrent.com>
> > > > wrote:
> > > >
> > > > > It is already supported; refer to the following JIRA for more
> > > > > information:
> > > > >
> > > > > https://issues.apache.org/jira/browse/APEXCORE-283
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Jan 19, 2016 at 10:43 PM Aniruddha Thombare <
> > > > > aniruddha@datatorrent.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Is it possible to save checkpoints in any other highly available
> > > > > > distributed file system (which may be mounted as directories
> > > > > > across the cluster) other than HDFS?
> > > > > > If yes, is it configurable?
> > > > > >
> > > > > > AFAIK, there is no configurable option available to achieve that.
> > > > > > If that's the case, can we have that feature?
> > > > > >
> > > > > > This is with the intention to recover the applications faster
> > > > > > and do away with HDFS's small files problem, as described here:
> > > > > >
> > > > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> > > > > >
> > > > > >
> > > > > > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
> > > > > >
> > > > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > > > > >
> > > > > > If we could save checkpoints in some other distributed file
> > > > > > system (or even an HA NAS box) geared for small files, we could
> > > > > > achieve -
> > > > > >
> > > > > >    - Better performance of NN & HDFS for the production usage
> > > > > >      (read: production data I/O & not temp files)
> > > > > >    - Faster application recovery in case of planned shutdown /
> > > > > >      unplanned restarts
> > > > > >
> > > > > > Please, send your comments, suggestions or ideas.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > >
> > > > > > Aniruddha
> > > > > >
> > > > >
> > > >
> > >
> >
>
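[Editor's note: Tushar's point above, that the storage agent can be set at the DAG level programmatically even if not through the xml file, could look roughly like the sketch below. This is a minimal, untested sketch, not a recipe from the thread: it assumes Apex's FSStorageAgent(String, Configuration) constructor and the OperatorContext.STORAGE_AGENT attribute, and the NAS mount path is hypothetical.]

```java
// Sketch only: assumes Apex's FSStorageAgent and the STORAGE_AGENT
// attribute are available; the mount path below is hypothetical.
import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;
import com.datatorrent.common.util.FSStorageAgent;
import org.apache.hadoop.conf.Configuration;

public class NasCheckpointApp implements StreamingApplication {
  @Override
  public void populateDAG(DAG dag, Configuration conf) {
    // Point checkpointing at a mounted HA-NAS directory instead of the
    // default application directory on HDFS (path is hypothetical).
    dag.setAttribute(OperatorContext.STORAGE_AGENT,
        new FSStorageAgent("file:///mnt/ha-nas/checkpoints", conf));
    // ... add operators and streams here ...
  }
}
```

Because FSStorageAgent goes through the Hadoop FileSystem API, a file:// URI pointing at a cluster-wide mount should behave like just another directory, which is what the HA-NAS question above is after.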
